WorldWideScience

Sample records for model regression analyses

  1. USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES

    Directory of Open Access Journals (Sweden)

    Constantin ANGHELACHE

    2011-10-01

    The article presents the fundamental aspects of linear regression as a toolbox that can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, and homoscedasticity and heteroskedasticity. The use of econometric instruments in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretations that can be drawn at this level.

  2. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  3. Alpins and thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

    Directory of Open Access Journals (Sweden)

    Giuliano de Oliveira Freitas

    2013-10-01

    PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one assigned to receive AcrySof® Toric intraocular lens (IOL) in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APV (APVratio) and Alpins percentage of success (%Success) was found (Spearman's ρ = -0.93); linear regression is given by the following equation: %Success = (-APVratio + 1.00) x 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
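The regression equation reported in this abstract can be evaluated directly. The sketch below is only an illustration of that arithmetic: the function name and the input values are hypothetical, not taken from the study.

```python
def percent_success(apv_pre: float, apv_post: float) -> float:
    """Predict Alpins %Success from the Thibos APV ratio (post / pre).

    Implements %Success = (-APVratio + 1.00) x 100 as quoted in the abstract.
    """
    if apv_pre <= 0:
        raise ValueError("preoperative APV must be positive")
    apv_ratio = apv_post / apv_pre
    return (-apv_ratio + 1.00) * 100.0


# Hypothetical eye: APV falls from 2.0 D before surgery to 0.5 D after.
print(percent_success(2.0, 0.5))
```

An APVratio of 1 (no change in astigmatic power) maps to 0 %Success, and a ratio of 0 maps to 100.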

  4. Analysing the forward premium anomaly using a Logistic Smooth Transition Regression model.

    OpenAIRE

    Sofiane Amri

    2008-01-01

    Several researchers have suggested that exchange rates may be characterized by nonlinear behaviour. This paper examines these nonlinearities and asymmetries and estimates a Logistic Smooth Transition Regression (LSTR) of the Fama regression with the risk-adjusted forward premia as transition variable. Results confirm the existence of nonlinear dynamics in the relationship between spot exchange rate differential and the forward premium for all the currencies of the sample and for all maturities (three and...
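As a rough illustration of what a logistic smooth transition regression looks like, the sketch below uses the textbook two-regime form, in which a logistic function of the transition variable weights two linear regimes. This is an assumption about the general model class, not the paper's exact specification; all function names and parameter values are hypothetical.

```python
import math


def logistic_transition(s: float, gamma: float, c: float) -> float:
    """Logistic weight G in (0, 1) placed on the second regime.

    gamma controls how sharp the switch is; c is the threshold location.
    """
    return 1.0 / (1.0 + math.exp(-gamma * (s - c)))


def lstr_fama(forward_premium, s, alpha1, beta1, alpha2, beta2, gamma, c):
    """Fitted spot-rate change from a two-regime LSTR Fama regression.

    Each regime is a linear Fama regression alpha + beta * forward_premium;
    the transition variable s decides how the two regimes are mixed.
    """
    g = logistic_transition(s, gamma, c)
    regime1 = alpha1 + beta1 * forward_premium
    regime2 = alpha2 + beta2 * forward_premium
    return (1.0 - g) * regime1 + g * regime2


# Hypothetical parameters: slope -1 in the lower regime, +1 in the upper one.
for s in (0.0, 0.5, 1.0):
    print(s, lstr_fama(0.02, s, 0.0, -1.0, 0.0, 1.0, gamma=10.0, c=0.5))
```

When the transition variable equals the threshold c, both regimes receive equal weight; far from c the model behaves like one of the two linear regressions.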

  5. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Directory of Open Access Journals (Sweden)

    Željko V. Račić

    2010-12-01

    This paper aims to present the specifics of the application of the multiple linear regression model. The economic (financial) crisis is analyzed in terms of gross domestic product, which is a function of the foreign trade balance (on one hand) and credit cards, i.e. the indebtedness of the population on this basis (on the other hand), in the USA (from 1999 to 2008). We used the extended application model, which shows how the analyst should run the whole development process of a regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of an appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.

  6. Flexible survival regression modelling

    DEFF Research Database (Denmark)

    Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben

    2009-01-01

    Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...

  7. Unitary Response Regression Models

    Science.gov (United States)

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  8. Systematic Selection of Key Logistic Regression Variables for Risk Prediction Analyses: A Five-Factor Maximum Model.

    Science.gov (United States)

    Hewett, Timothy E; Webster, Kate E; Hurd, Wendy J

    2017-08-16

    The evolution of clinical practice and medical technology has yielded an increasing number of clinical measures and tests to assess a patient's progression and return to sport readiness after injury. The plethora of available tests may be burdensome to clinicians in the absence of evidence that demonstrates the utility of a given measurement. Thus, there is a critical need to identify a discrete number of metrics to capture during clinical assessment to effectively and concisely guide patient care. The data sources included Pubmed and PMC Pubmed Central articles on the topic. Therefore, we present a systematic approach to injury risk analyses and how this concept may be used in algorithms for risk analyses for primary anterior cruciate ligament (ACL) injury in healthy athletes and patients after ACL reconstruction. In this article, we present the five-factor maximum model, which states that in any predictive model, a maximum of 5 variables will contribute in a meaningful manner to any risk factor analysis. We demonstrate how this model already exists for prevention of primary ACL injury, how this model may guide development of the second ACL injury risk analysis, and how the five-factor maximum model may be applied across the injury spectrum for development of the injury risk analysis.

  9. Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.

    Science.gov (United States)

    Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg

    2009-11-01

    G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.

  10. TWO REGRESSION CREDIBILITY MODELS

    Directory of Open Access Journals (Sweden)

    Constanţa-Nicoleta BODEA

    2010-03-01

    In this communication we will discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices, a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.

  11. Elaborate ligand-based modeling coupled with multiple linear regression and k nearest neighbor QSAR analyses unveiled new nanomolar mTOR inhibitors.

    Science.gov (United States)

    Khanfar, Mohammad A; Taha, Mutasem O

    2013-10-28

    The mammalian target of rapamycin (mTOR) has an important role in cell growth, proliferation, and survival. mTOR is frequently hyperactivated in cancer, and therefore, it is a clinically validated target for cancer therapy. In this study, we combined exhaustive pharmacophore modeling and quantitative structure-activity relationship (QSAR) analysis to explore the structural requirements for potent mTOR inhibitors employing 210 known mTOR ligands. Genetic function algorithm (GFA) coupled with k nearest neighbor (kNN) and multiple linear regression (MLR) analyses were employed to build self-consistent and predictive QSAR models based on optimal combinations of pharmacophores and physicochemical descriptors. Successful pharmacophores were complemented with exclusion spheres to optimize their receiver operating characteristic curve (ROC) profiles. Optimal QSAR models and their associated pharmacophore hypotheses were validated by identification and experimental evaluation of several new promising mTOR inhibitory leads retrieved from the National Cancer Institute (NCI) structural database. The most potent hit illustrated an IC50 value of 48 nM.

  12. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, the single equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the autocorrelation patterns of regression disturbance. It also includes six case studies.

  13. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study examines widely used indicators of predictive power for the Generalized Linear Model (GLM), which often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables with multicollinearity among them. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).

  14. Ridge Regression for Interactive Models.

    Science.gov (United States)

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
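The centering step mentioned in this abstract can be illustrated numerically. The sketch below, using made-up data and a hand-written Pearson correlation, compares how strongly a linear term correlates with its product term before and after mean-centering. It illustrates non-essential multicollinearity only; it is not the ridge procedure studied in the article.

```python
import math
from statistics import mean


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    u_bar, v_bar = mean(u), mean(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = math.sqrt(
        sum((a - u_bar) ** 2 for a in u) * sum((b - v_bar) ** 2 for b in v)
    )
    return num / den


# Hypothetical predictors with means far from zero.
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
z = [23.0, 20.0, 24.0, 21.0, 25.0, 22.0]

# Interaction term built from the raw predictors.
raw_product = [xi * zi for xi, zi in zip(x, z)]

# Interaction term built from mean-centered predictors.
x_mean, z_mean = mean(x), mean(z)
xc = [xi - x_mean for xi in x]
zc = [zi - z_mean for zi in z]
centered_product = [a * b for a, b in zip(xc, zc)]

r_raw = pearson(x, raw_product)
r_centered = pearson(xc, centered_product)
print(f"corr(x, x*z) raw: {r_raw:.3f}, centered: {r_centered:.3f}")
```

With these values the raw linear term is highly correlated with the product term, while the centered versions are nearly uncorrelated, which is the effect the centering assumption relies on.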

  15. APPROACH OF FIVE-YEAR-AVERAGE HAZARD RATES FOR THE BREAST CANCER PATIENTS AND ANALYSES OF PROGNOSTIC FACTORS-AN APPLICATION OF COX REGRESSION MODEL

    Institute of Scientific and Technical Information of China (English)

    Gai Xueliang; Fan Zhimin; Liu Guojin; Jacques Brisson

    1998-01-01

    Objective: To compare five-year survival after surgery between the 116 breast cancer patients treated at the First Teaching Hospital (FTH) and the 866 breast cancer patients treated at Hopital du Saint-Sacrement (HSS). Methods: Using the Cox regression model, after eliminating the confounders, five-year average hazard rates were compared between the two hospitals and among the levels of prognostic factors. Results: There was a significant difference between the two hospitals for older patients (50 years old or more). Conclusion: Tumor size at pathology and involvement of lymph nodes were important prognostic factors.

  16. The use of GLS regression in regional hydrologic analyses

    Science.gov (United States)

    Griffis, V. W.; Stedinger, J. R.

    2007-09-01

    Summary: To estimate flood quantiles and other statistics at ungauged sites, many organizations employ an iterative generalized least squares (GLS) regression procedure to estimate the parameters of a model of the statistic of interest as a function of basin characteristics. The GLS regression procedure accounts for differences in available record lengths and spatial correlation in concurrent events by using an estimator of the sampling covariance matrix of available flood quantiles. Previous studies by the US Geological Survey using the LP3 distribution have neglected the impact of uncertainty in the weighted skew on quantile precision. The needed relationship is developed here and its use is illustrated in a regional flood study with 162 sites from South Carolina. The performance of a pooled regression model is compared to separate models for each hydrologic region: statistical tests recommend an interesting hybrid of the two which is both surprising and hydrologically reasonable. The statistical analysis is augmented with new diagnostic metrics including a condition number to check for multicollinearity, a new pseudo-R² appropriate for use with GLS regression, and two error variance ratios. GLS regression for the standard deviation demonstrates that again a hybrid model is attractive, and that GLS rather than an OLS or WLS analysis is appropriate for the development of regional standard deviation models.

  17. Basic Diagnosis and Prediction of Persistent Contrail Occurrence using High-resolution Numerical Weather Analyses/Forecasts and Logistic Regression. Part II: Evaluation of Sample Models

    Science.gov (United States)

    Duda, David P.; Minnis, Patrick

    2009-01-01

    Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.

  18. Inferential Models for Linear Regression

    Directory of Open Access Journals (Sweden)

    Zuoyi Zhang

    2011-09-01

    Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs) in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis (including model checking, variable selection, and prediction) and for uncertain inference in general.

  19. Heteroscedasticity checks for regression models

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.

  20. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    Science.gov (United States)

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  1. Heteroscedasticity checks for regression models

    Institute of Scientific and Technical Information of China (English)

    Zhu, Lixing

    2001-01-01

  2. Area under the curve predictions of dalbavancin, a new lipoglycopeptide agent, using the end of intravenous infusion concentration data point by regression analyses such as linear, log-linear and power models.

    Science.gov (United States)

    Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally

    2017-03-14

    1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end of intravenous infusion concentration (i.e. Cmax), the Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUCinf were performed using published Cmax data by application of regression equations. The quotient of observed/predicted values rendered the fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The Cmax versus AUCinf relationship exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE … regression models, a single time point strategy of using Cmax (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUCinf of dalbavancin in patients.
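The assessment workflow described in this abstract (fit a regression of AUCinf on Cmax, predict, then summarise with fold difference, MAE and RMSE) can be sketched as follows. The data values are invented for illustration and are not the study's 21 subject pairs; fitting the power model on the log-log scale is an assumption about how such a model would typically be estimated.

```python
import math
from statistics import mean


def ols(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b * x_bar, b


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return mean([abs(o - p) for o, p in zip(obs, pred)])


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


# Hypothetical Cmax and AUCinf pairs (arbitrary units).
cmax = [100.0, 150.0, 200.0, 250.0, 300.0]
auc = [5200.0, 7400.0, 10100.0, 12300.0, 15200.0]

# Linear model: AUCinf = a + b * Cmax.
a, b = ols(cmax, auc)
linear_pred = [a + b * c for c in cmax]

# Power model: AUCinf = k * Cmax**p, fitted as a straight line in log-log space.
log_k, p = ols([math.log(c) for c in cmax], [math.log(v) for v in auc])
power_pred = [math.exp(log_k) * c ** p for c in cmax]

# Fold difference is the quotient of observed over predicted values.
fold = [o / q for o, q in zip(auc, linear_pred)]

print("linear MAE/RMSE:", mae(auc, linear_pred), rmse(auc, linear_pred))
print("power  MAE/RMSE:", mae(auc, power_pred), rmse(auc, power_pred))
print("fold difference range:", min(fold), max(fold))
```

Comparing MAE and RMSE across model forms on the same data is what allows the models to be described as comparable or not.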

  3. Analysing inequalities in Germany a structured additive distributional regression approach

    CERN Document Server

    Silbersdorff, Alexander

    2017-01-01

    This book seeks new perspectives on the growing inequalities that our societies face, putting forward Structured Additive Distributional Regression as a means of statistical analysis that circumvents the common problem of analytical reduction to simple point estimators. This new approach allows the observed discrepancy between the individuals’ realities and the abstract representation of those realities to be explicitly taken into consideration using the arithmetic mean alone. In turn, the method is applied to the question of economic inequality in Germany.

  4. Semiparametric Regression and Model Refining

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper presents a semiparametric adjustment method suitable for general cases. Assuming that the regularizer matrix is positive definite, the calculation method is discussed and the corresponding formulae are presented. Finally, a simulated adjustment problem is constructed to explain the method given in this paper. The results from the semiparametric model and G-M model are compared. The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimate method.

  5. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  6. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as its first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and is equivalent to the value of "Y" when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
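The quantities named in this abstract (intercept a, slope b, and R²) can be computed with ordinary least squares in a few lines. The following is a minimal sketch with made-up data; the function name is hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of Y = a + b*x; returns (a, b, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: change in Y per one-unit change in x
    a = y_bar - b * x_bar      # intercept: fitted Y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r2 = fit_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.4f}")
```

Here R² is computed as one minus the ratio of residual to total sum of squares, i.e. the share of the variation in Y that the fitted line accounts for.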

  7. Regression Model With Elliptically Contoured Errors

    CERN Document Server

    Arashi, M; Tabatabaey, S M M

    2012-01-01

    For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.

  8. The Infinite Hierarchical Factor Regression Model

    CERN Document Server

    Rai, Piyush

    2009-01-01

    We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.

  9. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  10. A new bivariate negative binomial regression model

    Science.gov (United States)

    Faroughi, Pouya; Ismail, Noriszura

    2014-12-01

    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.

  11. A Spline Regression Model for Latent Variables

    Science.gov (United States)

    Harring, Jeffrey R.

    2014-01-01

    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…

  12. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  13. Constrained regression models for optimization and forecasting

    Directory of Open Access Journals (Sweden)

    P.J.S. Bruwer

    2003-12-01

    Linear regression models and the interpretation of such models are investigated. In practice problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.

  14. A Skew-Normal Mixture Regression Model

    Science.gov (United States)

    Liu, Min; Lin, Tsung-I

    2014-01-01

    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  15. Modeling confounding by half-sibling regression

    DEFF Research Database (Denmark)

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun

    2016-01-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...

  16. Problems of correlations between explanatory variables in multiple regression analyses in the dental literature.

    Science.gov (United States)

    Tu, Y-K; Kellett, M; Clerehugh, V; Gilthorpe, M S

    2005-10-01

    Multivariable analysis is a widely used statistical methodology for investigating associations amongst clinical variables. However, the problems of collinearity and multicollinearity, which can give rise to spurious results, have in the past frequently been disregarded in dental research. This article illustrates and explains the problems which may be encountered, in the hope of increasing awareness and understanding of these issues, thereby improving the quality of the statistical analyses undertaken in dental research. Three examples from different clinical dental specialties are used to demonstrate how to diagnose the problem of collinearity/multicollinearity in multiple regression analyses and to illustrate how collinearity/multicollinearity can seriously distort the model development process. Lack of awareness of these problems can give rise to misleading results and erroneous interpretations. Multivariable analysis is a useful tool for dental research, though only if its users thoroughly understand the assumptions and limitations of these methods. It would benefit evidence-based dentistry enormously if researchers were more aware of both the complexities involved in multiple regression when using these methods and of the need for expert statistical consultation in developing study design and selecting appropriate statistical methodologies.
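
    One common way to diagnose collinearity is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between the predictors. The variable names and the numbers are hypothetical and are not taken from the record above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical measurements that move almost in lockstep.
probing_depth = [2.0, 3.0, 4.0, 5.0, 6.0]
attachment_loss = [2.1, 2.9, 4.2, 4.8, 6.1]
vif_collinear = vif_two_predictors(probing_depth, attachment_loss)

# Two predictors with zero linear correlation give a VIF of 1.
centered = [-2.0, -1.0, 0.0, 1.0, 2.0]
squared = [4.0, 1.0, 0.0, 1.0, 4.0]
vif_uncorrelated = vif_two_predictors(centered, squared)
```

    A frequently quoted rule of thumb treats a VIF above about 10 as a warning sign, although any such threshold is a convention rather than a test.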

  17. Bayesian multimodel inference for geostatistical regression models.

    Directory of Open Access Journals (Sweden)

    Devin S Johnson

    Full Text Available The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
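
    To show how unequal prior weights enter posterior model probabilities, here is a minimal sketch that applies Bayes' rule to a handful of candidate models. It assumes the log marginal likelihoods are already available (in the record above such quantities come out of the MCMC run) and it places the prior on whole models rather than on individual covariates; the function name and numbers are hypothetical.

```python
import math


def posterior_model_probs(log_marginals, priors):
    """p(M_k | y) is proportional to p(y | M_k) * p(M_k)."""
    log_post = [lm + math.log(p) for lm, p in zip(log_marginals, priors)]
    top = max(log_post)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for three candidate models.
log_marginals = [-120.4, -118.9, -119.6]

equal_prior = posterior_model_probs(log_marginals, [1 / 3, 1 / 3, 1 / 3])
unequal_prior = posterior_model_probs(log_marginals, [0.6, 0.2, 0.2])
```

    With equal priors the ranking follows the marginal likelihoods alone; shifting prior mass toward the first model raises its posterior probability without changing the data.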

  18. An Application on Multinomial Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Abdalla M El-Habil

    2012-03-01

    Full Text Available This study presents an application of the multinomial logistic regression model, which is one of the important methods for categorical data analysis. The model deals with one nominal or ordinal response variable that has more than two categories. It has been applied in data analysis in many areas, for example health, social, behavioral, and educational research. To illustrate the model in practice, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by the Palestinian Central Bureau of Statistics (PCBS). A segment of the population of children in the age group 10-14 years resident in Gaza governorate, of size 66,935, was selected, and the response variable consisted of four categories. Eighteen explanatory variables were used for building the primary multinomial logistic regression model. The model was tested through a set of statistical tests to ensure its appropriateness for the data. The model was also tested by randomly selecting two observations from the data and predicting the group into which each would be classified, given the values of the explanatory variables. We concluded that the multinomial logistic regression model lets us accurately define the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and predict the classification of any individual case.

  19. Regression Models for Count Data in R

    Directory of Open Access Journals (Sweden)

    Christian Kleiber

    2008-06-01

    Full Text Available The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses design and functionality of the basic R functions just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros (two problems that typically occur in count data sets in economics and the social sciences) better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.
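
    The difference between the two zero-augmented families can be seen from their probability mass functions. The sketch below is written in Python rather than R and does not reproduce the pscl interface; it uses a Poisson count component with fixed parameters (no covariates), and the function and parameter names are assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing the count k with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zero_inflated_pmf(k, lam, pi_zero):
    """Mixture: a structural zero with probability pi_zero, else Poisson."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """Two parts: P(Y = 0) = p_zero; positive counts are zero-truncated Poisson."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


# Both models put more mass at zero than a plain Poisson with the same lam.
plain_zero = poisson_pmf(0, 2.0)
zip_zero = zero_inflated_pmf(0, 2.0, 0.3)
hurdle_zero = hurdle_pmf(0, 2.0, 0.4)
```

    In the zero-inflated model zeros can come from either component, whereas in the hurdle model all zeros come from the binary part, which is why the positive counts are renormalized by 1 - exp(-lam).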

  20. Parametric Regression Models Using Reversed Hazard Rates

    Directory of Open Access Journals (Sweden)

    Asokan Mulayath Variyath

    2014-01-01

    Full Text Available Proportional hazard regression models are widely used in survival analysis to understand and exploit the relationship between survival time and covariates. For left censored survival times, reversed hazard rate functions are more appropriate. In this paper, we develop a parametric proportional hazard rates model using an inverted Weibull distribution. The estimation and construction of confidence intervals for the parameters are discussed. We assess the performance of the proposed procedure based on a large number of Monte Carlo simulations. We illustrate the proposed method using a real case example.
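
    The reversed hazard rate is the ratio of the density to the distribution function, r(t) = f(t) / F(t), which equals the derivative of log F(t). The sketch below assumes one common parametrization of the inverted Weibull distribution, F(t) = exp(-(scale / t)^shape), and a proportional reversed hazards form in which the baseline distribution function is raised to the power exp(linear predictor). Both choices are assumptions for illustration and may differ from the parametrization used in the record above.

```python
import math


def inverted_weibull_cdf(t, scale, shape):
    """Assumed parametrization: F(t) = exp(-(scale / t) ** shape), t > 0."""
    return math.exp(-((scale / t) ** shape))


def reversed_hazard(t, scale, shape):
    """r(t) = d/dt log F(t) = shape * scale**shape * t**(-shape - 1)."""
    return shape * scale ** shape * t ** (-shape - 1.0)


def proportional_reversed_hazard_cdf(t, scale, shape, linear_predictor):
    """F(t | x) = F0(t) ** exp(x'beta), so r(t | x) = exp(x'beta) * r0(t)."""
    return inverted_weibull_cdf(t, scale, shape) ** math.exp(linear_predictor)


# Numerical check of the closed form at one point (hypothetical values).
t, scale, shape, h = 2.0, 1.5, 2.0, 1e-6
numeric = (math.log(inverted_weibull_cdf(t + h, scale, shape))
           - math.log(inverted_weibull_cdf(t - h, scale, shape))) / (2 * h)
closed_form = reversed_hazard(t, scale, shape)
```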

  1. Bayesian model selection in Gaussian regression

    CERN Document Server

    Abramovich, Felix

    2009-01-01

    We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.

  2. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  3. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC) show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR) and robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  4. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  5. Hierarchical linear regression models for conditional quantiles

    Institute of Scientific and Technical Information of China (English)

    TIAN Maozai; CHEN Gemai

    2006-01-01

    The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with data that have a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable; it is a common phenomenon. To ignore this hierarchical data structure risks overlooking the importance of group effects, and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid. On the other hand, the hierarchical models take a hierarchical data structure into account and also have many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, the hierarchical models are virtually the mean regression; therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to an outlier observation on the dependent variable. In this article, a new approach, which is based on the Gauss-Seidel iteration and takes full advantage of the quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining simple conditions for n^(1/2)-convergence and asymptotic normality. We also illustrate the use of the technique with real educational data, which are hierarchical, and show how the results can be explained.

  6. Performance Evaluation of Button Bits in Coal Measure Rocks by Using Multiple Regression Analyses

    Science.gov (United States)

    Su, Okan

    2016-02-01

    Electro-hydraulic and jumbo drills are commonly used for underground coal mines and tunnel drives for the purpose of blasthole drilling and rock bolt installations. Not only machine parameters but also environmental conditions have significant effects on drilling. This study characterizes the performance of button bits during blasthole drilling in coal measure rocks by using multiple regression analyses. The penetration rate of jumbo and electro-hydraulic drills was measured in the field by employing bits in different diameters and the specific energy of the drilling was calculated at various locations, including highway tunnels and underground roadways of coal mines. Large block samples were collected from each location at which in situ drilling measurements were performed. Then, the effects of rock properties and machine parameters on the drilling performance were examined. Multiple regression models were developed for the prediction of the specific energy of the drilling and the penetration rate. The results revealed that hole area, impact (blow) energy, blows per minute of the piston within the drill, and some rock properties, such as the uniaxial compressive strength (UCS) and the drilling rate index (DRI), influence the drill performance.

  7. Regression Models For Saffron Yields in Iran

    Science.gov (United States)

    S. H, Sanaeinejad; S. N, Hosseini

    Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research we tried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. Climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysis indicated that among weather variables, temperature was the key parameter for variation of saffron yield. It was concluded that increasing temperature in spring was the main cause of declining saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.

  8. Symbolic regression of generative network models

    CERN Document Server

    Menezes, Telmo

    2014-01-01

    Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world netwo...

  9. The number of subjects per variable required in linear regression analyses

    NARCIS (Netherlands)

    P.C. Austin (Peter); E.W. Steyerberg (Ewout)

    2015-01-01

    Objectives: To determine the number of independent variables that can be included in a linear regression model. Study Design and Setting: We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression c

  10. Inferring gene regression networks with model trees

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesus S

    2010-10-01

    Full Text Available Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear

  11. The number of subjects per variable required in linear regression analyses.

    Science.gov (United States)

    Austin, Peter C; Steyerberg, Ewout W

    2015-06-01

    To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R² of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R², although adjusted R² estimates behaved well. The bias in estimating the model R² statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
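
    The contrast between R² and adjusted R² at low SPV can be made concrete with the standard adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of subjects and p the number of predictors. The sketch below only evaluates that formula for hypothetical inputs; it does not reproduce the simulations described in the record above.

```python
def adjusted_r2(r2, n_subjects, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


def subjects_per_variable(n_subjects, n_predictors):
    """SPV as used in the abstract: subjects divided by predictors."""
    return n_subjects / n_predictors


# Hypothetical fits that all report the same raw R^2 of 0.5 with 10 predictors.
low_spv = adjusted_r2(0.5, 20, 10)     # SPV = 2
mid_spv = adjusted_r2(0.5, 40, 10)     # SPV = 4
high_spv = adjusted_r2(0.5, 500, 10)   # SPV = 50
```

    The adjustment is large when SPV is small and shrinks toward zero as the sample grows, which is in line with the abstract's remark that raw R² needs many more subjects per variable than the coefficients do.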

  12. Fitting Additive Binomial Regression Models with the R Package blm

    Directory of Open Access Journals (Sweden)

    Stephanie Kovalchik

    2013-09-01

    Full Text Available The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit) model, which allows some covariates to have additive effects and other covariates to have logistic effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm package's methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.
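
    The distinction between the additive scale (risk differences) and the logistic scale (odds ratios) can be illustrated with an unadjusted two-group comparison. The sketch below is in Python, does not use or imitate the blm package, and the counts are hypothetical.

```python
def risk(events, total):
    """Observed event probability in one group."""
    return events / total


def risk_difference(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Additive-scale effect: what a binomial linear model coefficient estimates."""
    return risk(events_exposed, n_exposed) - risk(events_unexposed, n_unexposed)


def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Logistic-scale effect: exp(coefficient) in a logistic regression."""
    p1 = risk(events_exposed, n_exposed)
    p0 = risk(events_unexposed, n_unexposed)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed have the event.
rd = risk_difference(30, 100, 10, 100)
orr = odds_ratio(30, 100, 10, 100)
```

    The same data give an absolute excess risk of 20 percentage points and an odds ratio of about 3.9; the two summaries answer different questions, which is the motivation for modelling on the additive scale.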

  13. Quantile regression modeling for Malaysian automobile insurance premium data

    Science.gov (United States)

    Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz

    2015-09-01

    Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models such as the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of the quantile regression model with those of a Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = 0.5) is always smaller than that of Gamma regression for all risk factors.
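
    Quantile regression replaces the squared-error loss of mean regression with the check (pinball) loss. The minimal sketch below uses an intercept-only model, for which minimizing the check loss returns a sample quantile, and contrasts it with the mean on data containing one extreme value. The premium figures are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(values, q, tau):
    """Sum of check losses when every observation is predicted by q."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values, tau):
    """Search the observed values, where a minimizer is always attained."""
    return min(sorted(values), key=lambda q: total_loss(values, q, tau))


# Hypothetical premiums with one very large claim-driven value.
premiums = [100, 120, 130, 150, 900]
median_fit = best_constant(premiums, 0.5)
upper_fit = best_constant(premiums, 0.9)
mean_fit = sum(premiums) / len(premiums)
```

    The median fit stays at 130 while the mean is pulled to 280 by the single large value, which illustrates the robustness claim; a full quantile regression would minimize the same loss over regression coefficients instead of a constant.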

  14. Entrepreneurial intention modeling using hierarchical multiple regression

    Directory of Open Access Journals (Sweden)

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  15. Boosted Regression Tree Models to Explain Watershed ...

    Science.gov (United States)

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed

  16. Illustrating Bayesian evaluation of informative hypotheses for regression models

    Directory of Open Access Journals (Sweden)

    Anouck Kluytmans

    2012-01-01

    Full Text Available In the present paper we illustrate the Bayesian evaluation of informative hypotheses for regression models. This approach allows psychologists to more directly test their theories than they would using conventional statistical analyses. Throughout this paper, both real-world data and simulated datasets will be introduced and evaluated to investigate the pragmatical as well as the theoretical qualities of the approach. We will pave the way from forming informative hypotheses in the context of regression models to interpreting the Bayes factors that express the support for the hypotheses being evaluated. In doing so, the present approach goes beyond p-values and uninformative null hypothesis testing, moving on to informative testing and quantification of model support in a way that is accessible to everyday psychologists.

  17. Time series regression model for infectious disease and weather.

    Science.gov (United States)

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
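
    A simple way to see whether overdispersion is an issue is the Pearson dispersion statistic, the sum of (y - mu)^2 / mu divided by the residual degrees of freedom. The sketch below computes it for an intercept-only Poisson model, where the fitted mean is the sample mean; the weekly counts are hypothetical, and a real analysis would take fitted values from the full time series regression.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_params)


# Hypothetical weekly case counts with an outbreak in the middle.
weekly_cases = [0, 1, 0, 2, 15, 30, 4, 0, 1, 7]

# Intercept-only Poisson model: every fitted mean equals the sample mean.
sample_mean = sum(weekly_cases) / len(weekly_cases)
fitted = [sample_mean] * len(weekly_cases)
dispersion = pearson_dispersion(weekly_cases, fitted, n_params=1)
```

    Values well above 1, as here, indicate more variability than a Poisson model allows and point toward the quasi-Poisson or negative binomial alternatives mentioned in the abstract.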

  18. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  19. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    Science.gov (United States)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  20. Analysing count data of Butterflies communities in Jasin, Melaka: A Poisson regression analysis

    Science.gov (United States)

    Afiqah Muhamad Jamil, Siti; Asrul Affendi Abdullah, M.; Kek, Sie Long; Nor, Maria Elena; Mohamed, Maryati; Ismail, Norradihah

    2017-09-01

    Count outcomes are typically highly right-skewed because they often contain a large number of zeros. The butterfly community data were collected in Jasin, Melaka, and consist of 131 subject visits. In this paper, Poisson regression analysis is applied to the butterfly count data because it is assumed to be better suited to the counting process. The paper analyses count data with zero observations for ecological inference about butterfly communities in Jasin, Melaka, using Poisson regression analysis. Software for Poisson regression is readily available and is becoming more widely used in many fields of research; here the data were analysed using SAS software. The purpose of the analysis comprised the framework of identifying the concerns. In addition, the study uses Poisson regression analysis to assess how well the data fit and how reliable the count data are. The findings indicate that the highest and lowest numbers of subjects come from the third family (Nymphalidae) and the fifth family (Hesperiidae), respectively, and that the Poisson distribution seems to fit the zero values.
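
The study fits its model in SAS. As a rough, language-neutral illustration of what a Poisson regression fit involves, the sketch below estimates log(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood. The data, the single covariate, and the function name `fit_poisson` are assumptions made for illustration only.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1*x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta += I^{-1} U, using the closed-form 2x2 inverse.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# Hypothetical counts per visit (including zeros) against a made-up covariate.
covariate = [0, 0, 1, 1, 2, 2, 3, 3]
counts = [0, 1, 1, 2, 2, 4, 5, 7]

b0, b1 = fit_poisson(covariate, counts)
fitted = [math.exp(b0 + b1 * xi) for xi in covariate]
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

At the maximum-likelihood solution with an intercept, the fitted means sum to the observed total count, which gives a quick sanity check on convergence.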

  1. Analyzing industrial energy use through ordinary least squares regression models

    Science.gov (United States)

    Golden, Allyson Katherine

    Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and
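
The thesis text does not give its model equations here, so the following is only a sketch of one common form of ordinary least squares change-point model: energy = b0 + b1 * max(T - Tcp, 0), with the change-point temperature Tcp chosen by grid search. The functional form, the synthetic noise-free data, and the function names are assumptions for illustration.

```python
def ols_line(x, y):
    """Closed-form simple OLS: returns (intercept, slope, sum of squared errors)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


def fit_change_point(temps, energy, candidates):
    """Grid-search the change point; fit OLS on max(T - cp, 0) for each candidate."""
    best = None
    for cp in candidates:
        z = [max(t - cp, 0.0) for t in temps]
        if max(z) == min(z):
            continue  # regressor is constant, so the slope is not identifiable
        b0, b1, sse = ols_line(z, energy)
        if best is None or sse < best[3]:
            best = (cp, b0, b1, sse)
    return best


# Synthetic, noise-free example: flat baseline of 100 below 60 degrees,
# then 4 energy units per degree above it.
temps = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
energy = [100, 100, 100, 100, 100, 120, 140, 160, 180, 200]

cp, base_load, slope, sse = fit_change_point(temps, energy, range(45, 81, 5))
print(cp, base_load, slope, sse)
```

Because each candidate change point reduces the problem to a simple linear fit, the whole procedure can be reproduced in a spreadsheet, in keeping with the accessibility goal stated above.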

  2. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. First, we briefly review different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  3. SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE

    OpenAIRE

    RODRIGO PINTO MOREIRA

    2008-01-01

    The main objective of this work is to adapt the STR-Tree model, which combines a Smooth Transition Regression model with Classification and Regression Tree (CART), so that it can be used for classification. To this end, some changes were made to its structural form and to its estimation. Because we are classifying binary dependent variables, the techniques employed in logistic regression are required; thus the estimation of the pa...

  4. An extensible analysable system model

    DEFF Research Database (Denmark)

    Probst, Christian W.; Hansen, Rene Rydhof

    2008-01-01

    , this does not hold for real physical systems. Approaches such as threat modelling try to target the formalisation of the real-world domain, but still are far from the rigid techniques available in security research. Many currently available approaches to assurance of critical infrastructure security...... allows for easy development of analyses for the abstracted systems. We briefly present one application of our approach, namely the analysis of systems for potential insider threats....

  5. Model selection in kernel ridge regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    2013-01-01

    Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...

  6. A Dirty Model for Multiple Sparse Regression

    CERN Document Server

    Jalali, Ali; Sanghavi, Sujay

    2011-01-01

    Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis a vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...

  7. Logistic Regression Model on Antenna Control Unit Autotracking Mode

    Science.gov (United States)

    2015-10-20

    Report 412TW-PA-15240. Daniel T. Laird, Air Force Test Center, Edwards AFB, CA...OCT 15...alternative-hypothesis. This paper will present an Antenna Auto-tracking model using Logistic Regression modeling. This paper presents an example of

  8. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  9. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    Science.gov (United States)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent and dependent variables. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine population values based on a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the categories of the number of dengue fever patients.
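
To illustrate how an ordinal logistic model turns parameters into category probabilities, the sketch below uses the standard cumulative-logit form P(Y <= j | x) = logistic(theta_j - x.beta). It deliberately omits the geographic weighting, in which each location would have its own parameter values. The thresholds, coefficient, and function name are hypothetical, and the study itself uses R rather than Python.

```python
import math


def category_probabilities(thresholds, beta, x):
    """Category probabilities from a cumulative-logit (proportional odds) model."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities P(Y <= j) for each ordered threshold.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)  # the highest category closes the distribution
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical local parameters for one village: three ordered categories
# (for example low, medium, high case counts) and one covariate.
probs = category_probabilities(thresholds=[-1.0, 1.0], beta=[0.5], x=[0.0])
print([round(p, 4) for p in probs])
```

Thresholds must be increasing so that every category probability is non-negative.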

  10. Multiple Retrieval Models and Regression Models for Prior Art Search

    CERN Document Server

    Lopez, Patrice

    2009-01-01

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  11. Relative risk regression models with inverse polynomials.

    Science.gov (United States)

    Ning, Yang; Woodward, Mark

    2013-08-30

    The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.

  12. Model Selection in Kernel Ridge Regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels......, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated to all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based...... on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels makes them widely...

  13. Combining logistic regression and neural networks to create predictive models.

    OpenAIRE

    Spackman, K. A.

    1992-01-01

    Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...

  14. Spatializing Area-Based Measures of Neighborhood Characteristics for Multilevel Regression Analyses: An Areal Median Filtering Approach.

    Science.gov (United States)

    Oka, Masayoshi; Wong, David W S

    2016-06-01

    Area-based measures of neighborhood characteristics simply derived from enumeration units (e.g., census tracts or block groups) ignore the potential of spatial spillover effects, and thus incorporating such measures into multilevel regression models may underestimate the neighborhood effects on health. To overcome this limitation, we describe the concept and method of areal median filtering to spatialize area-based measures of neighborhood characteristics for multilevel regression analyses. The areal median filtering approach provides a means to specify or formulate "neighborhoods" as meaningful geographic entities by removing enumeration unit boundaries as the absolute barriers and by pooling information from the neighboring enumeration units. This spatializing process takes into account the potential of spatial spillover effects and also converts aspatial measures of neighborhood characteristics into spatial measures. From a conceptual and methodological standpoint, incorporating the derived spatial measures into multilevel regression analyses allows us to more accurately examine the relationships between neighborhood characteristics and health. To promote and set the stage for informative research in the future, we provide a few important conceptual and methodological remarks, and discuss possible applications, inherent limitations, and practical solutions for using the areal median filtering approach in the study of neighborhood effects on health.
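
The pooling idea described above can be sketched as replacing each unit's value with the median over the unit and its neighbors. How neighbors are defined and whether the focal unit is included are assumptions in this sketch, as are the tract labels and values.

```python
from statistics import median


def areal_median_filter(values, neighbors):
    """Replace each unit's value by the median of itself and its neighbors."""
    smoothed = {}
    for unit, own_value in values.items():
        pooled = [own_value] + [values[n] for n in neighbors.get(unit, [])]
        smoothed[unit] = median(pooled)
    return smoothed


# Hypothetical tract-level measure (for example percent poverty) and adjacency.
tract_values = {"A": 10.0, "B": 50.0, "C": 12.0, "D": 14.0}
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

smoothed = areal_median_filter(tract_values, adjacency)
print(smoothed)
```

The outlying value in tract B is pulled toward its surroundings, which shows how the filter converts an aspatial measure into one that reflects neighboring units.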

  15. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Hong-Juan Li

    2013-04-01

    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the empirical mode decomposition (EMD) method and auto regression (AR) for electric load forecasting. The electric load data of the New South Wales (Australia) market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  16. Stochastic Approximation Methods for Latent Regression Item Response Models

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  17. Using AMMI, factorial regression and partial least squares regression models for interpreting genotype x environment interaction.

    NARCIS (Netherlands)

    Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.

    1999-01-01

    Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEI). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th

  18. Corporate prediction models, ratios or regression analysis?

    NARCIS (Netherlands)

    Bijnen, E.J.; Wijn, M.F.C.M.

    1994-01-01

    The models developed in the literature with respect to the prediction of a company's failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in

  19. Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation

    CERN Document Server

    Kekatos, Vassilis

    2011-01-01

    Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they could be recovered by far fewer measurements. ...

  20. Mixed Frequency Data Sampling Regression Models: The R Package midasr

    Directory of Open Access Journals (Sweden)

    Eric Ghysels

    2016-08-01

    Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr, which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
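
The abstract does not say which lag-weighting scheme is used. One scheme commonly associated with MIDAS regressions is the normalized exponential Almon lag, and the sketch below shows how such weights collapse several high-frequency lags into one low-frequency regressor. It is written in Python for illustration and does not reproduce the midasr API; the function names and parameter values are hypothetical.

```python
import math


def exp_almon_weights(theta1, theta2, n_lags):
    """Normalized exponential Almon weights over lags 1..n_lags (sum to 1)."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def midas_aggregate(high_freq_lags, weights):
    """Weighted sum of high-frequency lags, giving one low-frequency regressor."""
    return sum(w * v for w, v in zip(weights, high_freq_lags))


# Three monthly observations feeding one quarterly regressor (made-up values).
monthly = [1.0, 2.0, 3.0]

flat = exp_almon_weights(0.0, 0.0, 3)        # equal weights
decaying = exp_almon_weights(0.0, -0.5, 3)   # weights shrink with the lag
print(midas_aggregate(monthly, flat), midas_aggregate(monthly, decaying))
```

With only two shape parameters the weight profile stays parsimonious no matter how many high-frequency lags enter the regression.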

  1. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
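
As a small illustration of variance inflation factor screening, the sketch below covers the two-predictor case, where VIF = 1 / (1 - r^2) and r is the correlation between the two explanatory variables. The data and the screening threshold of 10 are illustrative assumptions; the abstract does not state the threshold used in the study.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """With exactly two predictors, each has VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up basin characteristics: x2 nearly duplicates x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1]
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]

vif_collinear = vif_two_predictors(x1, x2)
vif_independent = vif_two_predictors(x1, x3)
print(round(vif_collinear, 1), round(vif_independent, 1))
```

A predictor whose VIF exceeds the chosen threshold would be flagged for removal before refitting with OLS.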

  2. ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    Zhu Zhongyi; Wei Bocheng

    1999-01-01

    In this paper, the estimation method based on the “generalized profile likelihood” for the conditionally parametric models in the paper given by Severini and Wong (1992) is extended to fixed design semiparametric nonlinear regression models. For these semiparametric nonlinear regression models, the resulting estimator of the parametric component of the model is shown to be asymptotically efficient and the strong convergence rate of the nonparametric component is investigated. Many results (for example Chen (1988), Gao & Zhao (1993), Rice (1986) et al.) are extended to fixed design semiparametric nonlinear regression models.

  3. Support vector regression model for complex target RCS predicting

    Institute of Scientific and Technical Information of China (English)

    Wang Gu; Chen Weishi; Miao Jungang

    2009-01-01

    Electromagnetic scattering computation has developed rapidly for many years, yet some computing problems for complex and coated targets cannot be solved using the existing theory and computing models. A data-based computing model is established to make up for the insufficiency of theoretical models. Based on the "support vector regression method", which is formulated on the principle of minimizing a structural risk, a data model to predict the unknown radar cross section of specified targets is given. Comparison between the actual data and the results of this predictive model shows that the support vector regression method is workable and reasonably precise.

  4. Rank-preserving regression: a more robust rank regression model against outliers.

    Science.gov (United States)

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  5. Nonlinear and Non Normal Regression Models in Physiological Research

    OpenAIRE

    1984-01-01

    Applications of nonlinear and non normal regression models are increasing as a means of appropriately interpreting complex phenomena in the biomedical sciences. This paper critically reviews some applications of these models in physiological research.

  6. Multivariate Chemometrics with Regression and Classification Analyses in Heroin Profiling Based on the Chromatographic Data.

    Science.gov (United States)

    B Gadžurić, Slobodan; O Podunavac Kuzmanović, Sanja; B Vraneš, Milan; Petrin, Marija; Bugarski, Tatjana; Kovačević, Strahinja Z

    2016-01-01

    The purpose of this work is to promote and facilitate forensic profiling and chemical analysis of illicit drug samples in order to determine their origin, methods of production and transfer through the country. The article is based on the gas chromatography analysis of heroin samples seized from three different locations in Serbia. A chemometric approach with appropriate statistical tools (multiple-linear regression (MLR), hierarchical cluster analysis (HCA) and Wald-Wolfowitz run (WWR) test) was applied to the chromatographic data of heroin samples in order to correlate and examine the geographic origin of seized heroin samples. The best MLR models were further validated by the leave-one-out technique as well as by the calculation of basic statistical parameters for the established models. To confirm the predictive power of the models, an external set of heroin samples was used. High agreement between experimental and predicted values of the acetyl thebaol and diacetyl morphine peak ratio, obtained in the validation procedure, indicated the good quality of the derived MLR models. The WWR test showed which examined heroin samples come from the same population, and HCA was applied in order to overview the similarities among the studied heroin samples.

  7. Identification of Influential Points in a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Jan Grosz

    2011-03-01

    Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
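
The abstract does not name its three detection methods. Two standard diagnostics for a simple linear regression are leverage and Cook's distance, and the sketch below computes both on made-up data; it is offered as an illustration and is not claimed to be the article's procedure.

```python
def influence_diagnostics(x, y):
    """Leverage h_ii and Cook's distance for a simple linear regression."""
    n, p = len(x), 2  # p counts the intercept and the slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [
        e * e * h / (p * s2 * (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks


# Made-up data: the last point sits far from the others in x and off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]

leverage, cooks = influence_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
print(most_influential, round(leverage[most_influential], 3))
```

A point far from the mean of x has high leverage, and it becomes influential when it also departs from the trend of the remaining data.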

  8. Adaptive Regression and Classification Models with Applications in Insurance

    Directory of Open Access Journals (Sweden)

    Jekabsons Gints

    2014-07-01

    Full Text Available Nowadays, in the insurance industry the use of predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on the ability to perform such tasks as credibility estimation, determination of insurance premiums, estimation of probability of claim, detecting insurance fraud, and managing insurance risk. This paper discusses regression and classification modeling for such types of prediction problems using the method of Adaptive Basis Function Construction.

  9. Geometric Properties of AR(q) Nonlinear Regression Models

    Institute of Scientific and Technical Information of China (English)

    LIU Ying-ar; WEI Bo-cheng

    2004-01-01

    This paper is devoted to a study of geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for the regression parameter space and the autoregression parameter space respectively, based on the inner product weighted by the Fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extend the work of Bates & Watts (1980, 1988) [1,2] and Seber & Wild (1989) [3].

  10. Robust Depth-Weighted Wavelet for Nonparametric Regression Models

    Institute of Scientific and Technical Information of China (English)

    Lu LIN

    2005-01-01

    In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to perturbations in the data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to perturbations in the data and attains a very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.

  11. Graphical models for genetic analyses

    DEFF Research Database (Denmark)

    Lauritzen, Steffen Lilholt; Sheehan, Nuala A.

    2003-01-01

    This paper introduces graphical models as a natural environment in which to formulate and solve problems in genetics and related areas. Particular emphasis is given to the relationships among various local computation algorithms which have been developed within the hitherto mostly separate areas...... of graphical models and genetics. The potential of graphical models is explored and illustrated through a number of example applications where the genetic element is substantial or dominating....

  12. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series at different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of the optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.

  13. Regression Model Optimization for the Analysis of Experimental Data

    Science.gov (United States)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
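
    The search metric described in this abstract can be illustrated with a short sketch. The code below is not the Ames algorithm; it only shows how the standard deviation of PRESS (leave-one-out) residuals can be computed for ordinary least squares and used to rank two candidate models. The function names, the synthetic data, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration.

```python
import numpy as np

def press_residuals(X, y):
    """PRESS (leave-one-out) residuals of an ordinary least-squares fit.

    Uses the identity e_press_i = e_i / (1 - h_ii), where h_ii is the i-th
    diagonal element of the hat matrix, so no refitting is needed.
    """
    Q, _ = np.linalg.qr(X)            # reduced QR; X must have full column rank
    leverage = np.sum(Q**2, axis=1)   # h_ii equals the squared row norm of Q
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ beta) / (1.0 - leverage)

def press_std(X, y):
    """Hypothetical search metric: sample std. dev. of the PRESS residuals."""
    return float(np.std(press_residuals(X, y), ddof=1))

# Synthetic data (illustrative only): the response is quadratic in x.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# Two candidate math models, expressed as design matrices.
candidates = {
    "linear": np.column_stack([np.ones_like(x), x]),
    "quadratic": np.column_stack([np.ones_like(x), x, x**2]),
}
scores = {name: press_std(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)    # candidate with the smallest metric
print(scores, best)
```

    A full search would also apply the rejection steps named in the abstract (singular models, insignificant terms, near-linear dependencies), which this sketch omits.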

  14. A general framework for the use of logistic regression models in meta-analysis.

    Science.gov (United States)

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.

  15. Alternative regression models to assess increase in childhood BMI

    Directory of Open Access Journals (Sweden)

    Mansmann Ulrich

    2008-09-01

    Full Text Available Abstract Background: Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods: Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results: GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion: GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  16. Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression

    Directory of Open Access Journals (Sweden)

    Han Lu

    2013-01-01

    Full Text Available Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we propose a new way to combine these two kinds of models. Logistic regression is used to filter out variables with a high degree of correlation, which reduces the complexity of the artificial intelligence models and accelerates their convergence; in turn, the models hybridized with logistic regression gain better explanations in terms of statistical significance, which improves the effectiveness of the artificial intelligence models. In experiments on the German data set, we find an interesting phenomenon, which we define as ‘dimensional interference’, with the support vector machine, and cross-validation shows that the new method is of considerable help in credit scoring.

  17. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Science.gov (United States)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  18. Group Lasso for high dimensional sparse quantile regression models

    CERN Document Server

    Kato, Kengo

    2011-01-01

    This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of primitive regularity conditions, the group Lasso estimator c...

  20. Joint regression analysis and AMMI model applied to oat improvement

    Science.gov (United States)

    Oliveira, A.; Oliveira, T. A.; Mejza, S.

    2012-09-01

    In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board, a sample of 22 different genotypes grown during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model for studying and estimating phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R for the AMMI model analysis.

  1. Buffalos milk yield analysis using random regression models

    Directory of Open Access Journals (Sweden)

    A.S. Schierholt

    2010-02-01

    Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year and month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
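
    Random regression models of this kind need covariates built from Legendre polynomials of a standardized age or time variable. The sketch below shows one common way to build them with NumPy. The function name, the age range in months, and the choice to apply the normalizing factor sqrt((2k+1)/2) are assumptions for illustration, and the mapping between "third order" in the abstract and the polynomial degree used here is also assumed rather than taken from the paper.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(age, age_min, age_max, degree):
    """Normalized Legendre covariates for a random regression model.

    Ages are first mapped linearly onto [-1, 1], the interval on which
    Legendre polynomials are orthogonal.
    """
    age = np.asarray(age, dtype=float)
    t = 2.0 * (age - age_min) / (age_max - age_min) - 1.0
    basis = legendre.legvander(t, degree)   # columns P_0(t) ... P_degree(t)
    # Normalizing constants sqrt((2k + 1) / 2), one per polynomial.
    scale = np.sqrt((2.0 * np.arange(degree + 1) + 1.0) / 2.0)
    return basis * scale

# Hypothetical ages in months; 24 and 120 are assumed range limits.
Z = legendre_covariates([24, 72, 120], age_min=24, age_max=120, degree=3)
print(Z.shape)   # one row per record, one column per polynomial
```

    Each row of `Z` would multiply an animal's random regression coefficients, so the genetic effect becomes a smooth function of age instead of a single constant.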

  2. Optimization of Regression Models of Experimental Data Using Confirmation Points

    Science.gov (United States)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance are used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression model independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
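
    The metric described in this abstract reduces to taking the larger of two standard deviations. The following sketch is an illustration under assumptions, not the author's implementation: the function name, the synthetic data, the every-fourth-point split, and the use of the sample standard deviation (`ddof=1`) are all hypothetical.

```python
import numpy as np

def confirmation_metric(X_fit, y_fit, X_conf, y_conf):
    """Return (metric, sd_press, sd_conf) for one candidate model.

    The coefficients come only from the fit points; the confirmation
    points are used exclusively to test the fitted model.
    """
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    Q, _ = np.linalg.qr(X_fit)                 # reduced QR of the design matrix
    leverage = np.sum(Q**2, axis=1)            # hat-matrix diagonal
    press = (y_fit - X_fit @ beta) / (1.0 - leverage)
    conf_resid = y_conf - X_conf @ beta
    sd_press = float(np.std(press, ddof=1))
    sd_conf = float(np.std(conf_resid, ddof=1))
    return max(sd_press, sd_conf), sd_press, sd_conf

# Synthetic data (illustrative only).
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=40)
y = 0.5 + 1.5 * x + rng.normal(scale=0.05, size=x.size)
X = np.column_stack([np.ones_like(x), x])

is_conf = np.arange(x.size) % 4 == 0           # every fourth point confirms
metric, sd_press, sd_conf = confirmation_metric(
    X[~is_conf], y[~is_conf], X[is_conf], y[is_conf]
)
print(metric, sd_press, sd_conf)
```

    Because the metric is the maximum of the two values, a model cannot score well by fitting the data points closely while predicting the confirmation points poorly.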

  3. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  4. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    DEFF Research Database (Denmark)

    Dyrholm, Mads; Hansen, Lars Kai

    2004-01-01

    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...

  5. Systematic evaluation of land use regression models for NO₂

    NARCIS (Netherlands)

    Wang, M.; Beelen, R.M.J.; Eeftens, M.R.; Meliefste, C.; Hoek, G.; Brunekreef, B.

    2012-01-01

    Land use regression (LUR) models have become popular to explain the spatial variation of air pollution concentrations. Independent evaluation is important. We developed LUR models for nitrogen dioxide (NO₂) using measurements conducted at 144 sampling sites in The Netherlands. Sites were randomly

  6. FUNCTIONAL-COEFFICIENT REGRESSION MODEL AND ITS ESTIMATION

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    In this paper, a class of functional-coefficient regression models is proposed and an estimation procedure based on locally weighted least squares is suggested. This class of models, with the proposed estimation method, is a powerful means for exploratory data analysis.

  7. Correcting for multivariate measurement error by regression calibration in meta-analyses of epidemiological studies

    DEFF Research Database (Denmark)

    Tybjærg-Hansen, Anne

    2009-01-01

    Within-person variability in measured values of multiple risk factors can bias their associations with disease. The multivariate regression calibration (RC) approach can correct for such measurement error and has been applied to studies in which true values or independent repeat measurements of t...

  8. Improved Dietary Guidelines for Vitamin D: Application of Individual Participant Data (IPD)-Level Meta-Regression Analyses.

    Science.gov (United States)

    Cashman, Kevin D; Ritz, Christian; Kiely, Mairead; Odin Collaborators

    2017-05-08

    Dietary Reference Values (DRVs) for vitamin D have a key role in the prevention of vitamin D deficiency. However, despite adopting similar risk assessment protocols, estimates from authoritative agencies over the last 6 years have been diverse. This may have arisen from diverse approaches to data analysis. Modelling strategies for pooling of individual subject data from cognate vitamin D randomized controlled trials (RCTs) are likely to provide the most appropriate DRV estimates. Thus, the objective of the present work was to undertake the first-ever individual participant data (IPD)-level meta-regression, which is increasingly recognized as best practice, from seven winter-based RCTs (with 882 participants ranging in age from 4 to 90 years) of the vitamin D intake-serum 25-hydroxyvitamin D (25(OH)D) dose-response. Our IPD-derived estimates of vitamin D intakes required to maintain 97.5% of 25(OH)D concentrations >25, 30, and 50 nmol/L across the population are 10, 13, and 26 µg/day, respectively. In contrast, standard meta-regression analyses with aggregate data (as used by several agencies in recent years) from the same RCTs estimated that a vitamin D intake requirement of 14 µg/day would maintain 97.5% of 25(OH)D >50 nmol/L. These first IPD-derived estimates offer improved dietary recommendations for vitamin D because the underpinning modeling captures the between-person variability in response of serum 25(OH)D to vitamin D intake.

  9. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    Science.gov (United States)

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and demonstrate its effectiveness using publicly available benchmark data sets.

  10. Check-all-that-apply data analysed by Partial Least Squares regression

    DEFF Research Database (Denmark)

    Rinnan, Åsmund; Giacalone, Davide; Frøst, Michael Bom

    2015-01-01

    are analysed by multivariate techniques. CATA data can be analysed both by setting the CATA as the X and the Y. The former is the PLS-Discriminant Analysis (PLS-DA) version, while the latter is the ANOVA-PLS (A-PLS) version. We investigated the difference between these two approaches, concluding...

  11. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    Science.gov (United States)

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.

    2017-03-01

    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses the binary logistic regression method with time series data of the normalized difference vegetation index (NDVI) as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using the gradient descent algorithm. The best fit of the model can then be used to classify sugarcane and non-sugarcane areas. The experiment shows high accuracy and successfully maps the sugarcane plantation area, with a best result of Cohen’s kappa of 0.7833 (strong) and 89.167% accuracy.
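
    The two steps named in this abstract, training by gradient descent and then classification, can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the NDVI profiles, learning rate, iteration count, and all function names are assumptions, and no real Landsat-8 data are involved.

```python
import numpy as np

def sigmoid(z):
    # Clipping keeps np.exp from overflowing for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic_gd(X, y, lr=0.5, n_iter=500):
    """Binary logistic regression fitted by batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)  # gradient of mean log-loss
        w -= lr * grad
    return w

def predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels coded as 0/1."""
    p_obs = np.mean(y_true == y_pred)
    p_exp = (np.mean(y_true) * np.mean(y_pred)
             + (1 - np.mean(y_true)) * (1 - np.mean(y_pred)))
    return float((p_obs - p_exp) / (1.0 - p_exp))

# Synthetic NDVI time series with six dates per pixel (illustrative only).
rng = np.random.default_rng(1)
t = np.arange(6)
other = 0.3 + rng.normal(scale=0.05, size=(100, 6))                   # class 0
cane = 0.7 + 0.1 * np.sin(t) + rng.normal(scale=0.05, size=(100, 6))  # class 1

# Deterministic stratified split: 75 training and 25 test pixels per class.
X_train = np.vstack([other[:75], cane[:75]])
y_train = np.repeat([0, 1], 75)
X_test = np.vstack([other[75:], cane[75:]])
y_test = np.repeat([0, 1], 25)

# Standardize with training statistics only.
mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
w = fit_logistic_gd((X_train - mu) / sd, y_train)
y_hat = predict(w, (X_test - mu) / sd)

accuracy = float(np.mean(y_hat == y_test))
kappa = cohen_kappa(y_test, y_hat)
print(accuracy, kappa)
```

    The synthetic classes are far better separated than real imagery would be, so the scores printed here say nothing about the accuracy reported in the paper.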

  12. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  13. Logistic regression for risk factor modelling in stuttering research.

    Science.gov (United States)

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will be able to: (a) summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) follow the steps in performing a logistic regression analysis; (c) describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  14. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
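
    The idea of using the third central moment of residuals to decide on the direction of effects can be illustrated in the simplest bivariate case. The sketch below is a simplification, not the authors' multiple regression procedure or any of their three inference tests: it assumes a skewed predictor, a normally distributed error, and synthetic data, and it only compares residual skewness between the two competing models. All names are hypothetical.

```python
import numpy as np

def skewness(v):
    """Standardized third central moment."""
    c = v - v.mean()
    return float(np.mean(c**3) / np.mean(c**2) ** 1.5)

def ols_residuals(target, predictor):
    """Residuals from regressing target on predictor with an intercept."""
    X = np.column_stack([np.ones_like(predictor), predictor])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ beta

# Synthetic data in which x truly causes y (illustrative only).
rng = np.random.default_rng(42)
n = 5000
x = rng.exponential(scale=1.0, size=n)     # skewed, non-normal predictor
y = 0.8 * x + rng.normal(size=n)           # normal error term

skew_x_to_y = skewness(ols_residuals(y, x))   # residuals of the true model
skew_y_to_x = skewness(ols_residuals(x, y))   # residuals of the reversed model
print(skew_x_to_y, skew_y_to_x)
```

    Under these assumptions the residuals of the correctly specified model are close to symmetric, while the residuals of the reversed model inherit skewness from the predictor, so the model with the smaller absolute residual skewness is the more plausible one. A formal decision would still need an inference procedure such as those listed in the abstract.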

  15. Differential item functioning (DIF) analyses of health-related quality of life instruments using logistic regression

    DEFF Research Database (Denmark)

    Scott, Neil W; Fayers, Peter M; Aaronson, Neil K

    2010-01-01

    Differential item functioning (DIF) methods can be used to determine whether different subgroups respond differently to particular items within a health-related quality of life (HRQoL) subscale, after allowing for overall subgroup differences in that scale. This article reviews issues that arise when testing for DIF in HRQoL instruments. We focus on logistic regression methods, which are often used because of their efficiency, simplicity and ease of application.

  17. Modelling multimodal photometric redshift regression with noisy observations

    CERN Document Server

    Kügler, S D

    2016-01-01

    In this work, we try to extend the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are able to treat the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach therefore allows the multimodal regression problem to be handled adequately. In addition, input uncertainty on the magnitudes can be included quite naturally and, lastly, the proposed algorithm in principle allows predictions outside the range of the training values, which makes it a fascinating opportunity for the detection of high-redshift quasars.

  18. Robust Bayesian Regularized Estimation Based on t Regression Model

    Directory of Open Access Journals (Sweden)

    Zean Li

    2015-01-01

    Full Text Available The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and assign different penalization parameters to different regression coefficients. We also consider Bayesian t regression with the adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and a real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.
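
    The mixture representation mentioned in this abstract is the standard scale-mixture form of the t distribution, which can be written out as follows. The symbols (response $y_i$, covariates $x_i$, coefficients $\beta$, scale $\sigma$, degrees of freedom $\nu$, latent weight $\lambda_i$) are notation assumed here for illustration and may differ from the paper's own.

```latex
\begin{aligned}
y_i \mid \lambda_i &\sim \mathcal{N}\!\left(x_i^{\top}\beta,\; \sigma^{2}/\lambda_i\right), \\
\lambda_i &\sim \operatorname{Gamma}\!\left(\nu/2,\; \nu/2\right) \quad \text{(shape, rate)}, \\
\Longrightarrow \quad y_i &\sim t_{\nu}\!\left(x_i^{\top}\beta,\; \sigma^{2}\right).
\end{aligned}
```

    Conditional on the latent weights $\lambda_i$ the model is an ordinary weighted normal regression, which is why a Gibbs sampler can update the coefficients and the weights in turn.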

  19. Analyses of Developmental Rate Isomorphy in Ectotherms: Introducing the Dirichlet Regression.

    Directory of Open Access Journals (Sweden)

    David S Boukal

    Full Text Available Temperature drives development in insects and other ectotherms because their metabolic rate and growth depend directly on thermal conditions. However, relative durations of successive ontogenetic stages often remain nearly constant across a substantial range of temperatures. This pattern, termed 'developmental rate isomorphy' (DRI) in insects, appears to be widespread and reported departures from DRI are generally very small. We show that these conclusions may be due to the caveats hidden in the statistical methods currently used to study DRI. Because the DRI concept is inherently based on proportional data, we propose that Dirichlet regression applied to individual-level data is an appropriate statistical method to critically assess DRI. As a case study we analyze data on five aquatic and four terrestrial insect species. We find that results obtained by Dirichlet regression are consistent with DRI violation in at least eight of the studied species, although standard analysis detects significant departure from DRI in only four of them. Moreover, the departures from DRI detected by Dirichlet regression are consistently much larger than previously reported. The proposed framework can also be used to infer whether observed departures from DRI reflect life history adaptations to size- or stage-dependent effects of varying temperature. Our results indicate that the concept of DRI in insects and other ectotherms should be critically re-evaluated and put in a wider context, including the concept of 'equiproportional development' developed for copepods.

  20. A Multi-objective Procedure for Efficient Regression Modeling

    CERN Document Server

    Sinha, Ankur; Kuosmanen, Timo

    2012-01-01

    Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge on the relative importance of the variables. To provide a robust method for model selection, this paper introduces a technique called the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) which provides the user with an efficient set of regression models for a given data-set. The algorithm treats the regression problem as a two-objective task, in which models with fewer regression coefficients and better goodness of fit are preferred. In MOGA-VS, the model selection procedure is implemented in two steps. First, we generate the frontier of all efficient or non-dominated regression m...
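
    The notion of an efficient (non-dominated) set of models can be sketched as a simple filter over candidate models scored on the two objectives. This is not the genetic algorithm itself; the candidate labels and scores below are hypothetical.

```python
def pareto_front(models):
    """Return the models not dominated on (n_coefficients, fit_error).

    Lower is better for both objectives. A model is dominated when another
    model is at least as good on both objectives and strictly better on one.
    """
    front = []
    for name, k, err in models:
        dominated = any(
            (k2 <= k and err2 <= err) and (k2 < k or err2 < err)
            for _, k2, err2 in models
        )
        if not dominated:
            front.append((name, k, err))
    return front

# Hypothetical candidates: (label, number of coefficients, residual sum of squares).
candidates = [("A", 1, 9.0), ("B", 2, 4.0), ("C", 3, 4.5), ("D", 4, 1.0), ("E", 2, 6.0)]
print(pareto_front(candidates))  # A, B and D remain; C and E are dominated by B
```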

  1. Applications of some discrete regression models for count data

    Directory of Open Access Journals (Sweden)

    B. M. Golam Kibria

    2006-01-01

    Full Text Available In this paper we consider several regression models for fitting count data encountered in the biometrical, environmental and social sciences and in transportation engineering. We fitted Poisson (PO), Negative Binomial (NB), Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models to run-off-road (ROR) crash data collected on arterial roads in the south (rural) region of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentages of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero-inflated and overdispersed count data.
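
    Two quantities behind this comparison can be sketched briefly: the variance-to-mean ratio used to diagnose overdispersion, and the zero-inflated Poisson probability mass function. The counts are made up for illustration and the parameter names are assumptions.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson: an extra zero with probability pi, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1.0 - pi) * poisson if k == 0 else (1.0 - pi) * poisson

def dispersion_index(counts):
    """Sample variance divided by sample mean; near 1 for Poisson data, larger when overdispersed."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean

# Made-up crash counts per road segment, with many zeros.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 6]
print(dispersion_index(counts))       # well above 1, so a plain Poisson fit is doubtful
print(zip_pmf(0, lam=2.0, pi=0.3))    # probability of a zero under the ZIP model
```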

  2. A regression model to estimate regional ground water recharge.

    Science.gov (United States)

    Lorenz, David L; Delin, Geoffrey N

    2007-01-01

    A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.

  3. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell, Jr., Frank E

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  4. Procedures for adjusting regional regression models of urban-runoff quality using local data

    Science.gov (United States)

    Hoos, A.B.; Sisolak, J.K.

    1993-01-01

    Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for

  5. Modeling energy expenditure in children and adolescents using quantile regression

    Science.gov (United States)

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...

  6. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  7. Trimmed Likelihood-based Estimation in Binary Regression Models

    NARCIS (Netherlands)

    Cizek, P.

    2005-01-01

    The binary-choice regression models such as probit and logit are typically estimated by the maximum likelihood method. To improve its robustness, various M-estimation based procedures were proposed, which however require bias corrections to achieve consistency and their resistance to outliers is rela

  8. PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA

    Institute of Scientific and Technical Information of China (English)

    Qian Weimin; Li Yumei

    2005-01-01

    The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.

  9. Change-point estimation for censored regression model

    Institute of Scientific and Technical Information of China (English)

    Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO

    2007-01-01

    In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.

  10. Improved Methodology for Parameter Inference in Nonlinear, Hydrologic Regression Models

    Science.gov (United States)

    Bates, Bryson C.

    1992-01-01

    A new method is developed for the construction of reliable marginal confidence intervals and joint confidence regions for the parameters of nonlinear, hydrologic regression models. A parameter power transformation is combined with measures of the asymptotic bias and asymptotic skewness of maximum likelihood estimators to determine the transformation constants which cause the bias or skewness to vanish. These optimized constants are used to construct confidence intervals and regions for the transformed model parameters using linear regression theory. The resulting confidence intervals and regions can be easily mapped into the original parameter space to give close approximations to likelihood method confidence intervals and regions for the model parameters. Unlike many other approaches to parameter transformation, the procedure does not use a grid search to find the optimal transformation constants. An example involving the fitting of the Michaelis-Menten model to velocity-discharge data from an Australian gauging station is used to illustrate the usefulness of the methodology.

  11. On modified skew logistic regression model and its applications

    Directory of Open Access Journals (Sweden)

    C. Satheesh Kumar

    2015-12-01

    Full Text Available Here we consider a modified form of the logistic regression model useful for situations where the dependent variable is dichotomous in nature and the explanatory variables exhibit asymmetric and multimodal behaviour. The proposed model has been fitted to some real life data set by using method of maximum likelihood estimation and illustrated its usefulness in certain medical applications.

  12. Improved Testing and Specifications of Smooth Transition Regression Models

    OpenAIRE

    Escribano, Álvaro; Jordá, Óscar

    1997-01-01

    This paper extends previous work in Escribano and Jordá (1997) and introduces new LM specification procedures to choose between Logistic and Exponential Smooth Transition Regression (STR) Models. These procedures are simpler, consistent and more powerful than those previously available in the literature. An analysis of the properties of Taylor approximations around the transition function of STR models permits one to understand why these procedures work better and it suggests ways to improve te...

  13. Support vector regression-based internal model control

    Institute of Scientific and Technical Information of China (English)

    HUANG Yan-wei; PENG Tie-gen

    2007-01-01

    This paper proposes a design of internal model control systems for processes with delay by using support vector regression (SVR). The proposed system makes full use of the excellent nonlinear estimation performance of SVR under the structural risk minimization principle. Closed-loop stability and steady-state error are analyzed in the presence of modeling errors. The simulations show that the proposed control systems achieve better control performance than neural-network-based systems when the training samples are few and noisy.
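
    SVR is usually formulated with the epsilon-insensitive loss, which ignores errors inside a tolerance band; structural risk minimization adds a penalty on model complexity to it. The sketch below shows only the loss, on made-up values; the abstract does not state which loss or settings the authors used, so treat this as an assumption.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Mean epsilon-insensitive loss: errors no larger than epsilon cost nothing."""
    losses = [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.05, 2.5, 2.0, 4.0]
# Only the second and third predictions fall outside the 0.1 band.
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1))
```

    Because small errors are not penalised, the fit is less sensitive to low-level noise in the training samples, which is consistent with the small-sample, noisy setting the abstract describes.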

  14. CONSERVATIVE ESTIMATING FUNCTION IN THE NONLINEAR REGRESSION MODEL WITH AGGREGATED DATA

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    The purpose of this paper is to study the theory of conservative estimating functions in the nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is the projection of the true score function onto a class of estimating functions. By construction, the potential function for the projected score with aggregated data is obtained, which has some properties of a log-likelihood function.

  15. Using regression models to determine the poroelastic properties of cartilage.

    Science.gov (United States)

    Chung, Chen-Yuan; Mansour, Joseph M

    2013-07-26

    The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
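
    The inversion step described above, solving as many linear regression equations as there are unknown material properties, can be sketched with plain Gaussian elimination. The coefficients, intercepts and property values below are hypothetical and use three unknowns instead of five to keep the example short.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Each row is one time point's regression equation:
# load_t = intercept_t + sum_j coeffs[t][j] * property_j  (all values hypothetical).
coeffs = [[2.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.4, 4.0]]
intercepts = [0.1, 0.2, 0.3]
true_props = [1.0, 2.0, 3.0]
loads = [ic + sum(c * p for c, p in zip(row, true_props)) for row, ic in zip(coeffs, intercepts)]
estimated = solve_linear_system(coeffs, [l - ic for l, ic in zip(loads, intercepts)])
print(estimated)  # approximately [1.0, 2.0, 3.0]
```

    If one property barely influences the load at the chosen time points, the system becomes ill-conditioned, which may be one reason a parameter such as the out-of-plane Poisson's ratio is hard to recover.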

  16. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity inflates the variance of the estimated regression coefficients in the linear regression model, concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and nonparametric regression models.
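
    The linear counterpart of concurvity can be made concrete with the variance inflation factor, VIF = 1 / (1 - R^2). The sketch below covers only the two-predictor case, where R^2 is the squared correlation between the predictors; it is not one of the concurvity measures compared in the paper, and the data are made up.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor 1 / (1 - R^2) for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]    # nearly a linear copy of x1
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]  # uncorrelated with x1 by construction
print(vif_two_predictors(x1, x2))  # very large: coefficient variances are inflated
print(vif_two_predictors(x1, x3))  # 1.0: no inflation
```

    A linear measure such as this would miss a predictor that is a smooth nonlinear function of another, which is the situation the term concurvity is meant to cover.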

  17. Efficient robust nonparametric estimation in a semimartingale regression model

    CERN Document Server

    Konev, Victor

    2010-01-01

    The paper considers the problem of robustly estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process subordinator, which is used to model financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.

  18. REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL

    Directory of Open Access Journals (Sweden)

    Siana Halim

    2007-01-01

    Full Text Available Production plants of a company are located in several areas spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals, i.e. that they are independent and identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R version 2.6.1.

  19. Batch Mode Active Learning for Regression With Expected Model Change.

    Science.gov (United States)

    Cai, Wenbin; Zhang, Muhan; Zhang, Ya

    2016-04-20

    While active learning (AL) has been widely studied for classification problems, limited efforts have been done on AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for the linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch mode AL to simultaneously choose a set of k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.
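
    For a linear model with squared loss, the gradient with respect to the weights for one candidate (x, y) is (f(x) - y) x, so its norm can serve as an estimate of model change. The sketch below assumes, beyond what the abstract states, that the unknown label y is stood in for by the predictions of a small ensemble of alternative models; all weights and pool points are made up.

```python
import math

def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def emcm_score(x, model, ensemble):
    """Average gradient norm ||(f(x) - y_k) x||, with y_k predicted by each ensemble member."""
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    fx = predict(model, x)
    total = sum(abs(fx - predict(member, x)) for member in ensemble)
    return total * x_norm / len(ensemble)

def select_query(pool, model, ensemble):
    """Pick the unlabeled instance with the largest estimated model change."""
    return max(pool, key=lambda x: emcm_score(x, model, ensemble))

model = [1.0, 0.5]                               # current linear model (hypothetical)
ensemble = [[1.1, 0.4], [0.9, 0.7], [1.0, 0.5]]  # stand-ins for the unknown label
pool = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]      # unlabeled candidates
print(select_query(pool, model, ensemble))
```

    The score grows both with disagreement about the label and with the size of x, so far-out points on which the ensemble disagrees are queried first.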

  20. Hierarchical Neural Regression Models for Customer Churn Prediction

    Directory of Open Access Journals (Sweden)

    Golshan Mohammadi

    2013-01-01

    Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies that want to remain competitive. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data into churner and nonchurner groups and also to filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create the Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the two other hierarchical models.

  1. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination (R2) with other models available from literature studies. Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM7 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
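
    The three comparison metrics can be sketched as follows, using their common textbook definitions. The abstract reports RMSE and MBE as percentages, which implies a normalisation that it does not specify, so the values here are in the units of the data; the sign convention for MBE (prediction minus observation) is also an assumption, and the numbers are made up.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mbe(obs, pred):
    """Mean bias error: positive when the model overestimates on average."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

observed = [4.0, 5.0, 6.0, 5.0]
predicted = [4.5, 5.0, 5.5, 5.0]
print(rmse(observed, predicted), mbe(observed, predicted), r_squared(observed, predicted))
```

    An MBE of zero alongside a nonzero RMSE, as in this example, shows why the two are reported together: errors can cancel in the bias while still being sizeable individually.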

  2. Phone Duration Modeling of Affective Speech Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Alexandros Lazaridis

    2012-07-01

    Full Text Available In speech synthesis, accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) throughout all emotional categories.

  3. Data correction for seven activity trackers based on regression models.

    Science.gov (United States)

    Andalibi, Vafa; Honko, Harri; Christophe, Francois; Viik, Jari

    2015-08-01

    Using an activity tracker for measuring activity-related parameters, e.g. steps and energy expenditure (EE), can be very helpful in assisting a person's fitness improvement. Unlike the measuring of number of steps, an accurate EE estimation requires additional personal information as well as accurate velocity of movement, which is hard to achieve due to inaccuracy of sensors. In this paper, we have evaluated regression-based models to improve the precision for both steps and EE estimation. For this purpose, data from seven activity trackers and two reference devices were collected from 20 young adult volunteers wearing all devices at once in three different tests, namely 60-minute office work, 6-hour overall activity and 60-minute walking. Reference data are used to create regression models for each device, and relative percentage errors of adjusted values are then statistically compared to those of the original values. The effectiveness of the regression models is determined based on the result of a statistical test. During a walking period, EE measurement was improved in all devices. The step measurement was also improved in five of them. The results show that improvement of EE estimation is possible with only a low-cost implementation of a fitting model over the collected data, e.g. in the app or in the corresponding service back-end.
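
    The correction idea can be sketched as a simple linear regression of reference values on tracker readings, followed by a comparison of relative percentage errors before and after adjustment. The step counts are made up, a single straight-line model is an assumption (the abstract does not give the model form), and the errors are computed on the fitting data for brevity.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def mean_abs_pct_error(reference, values):
    """Mean absolute relative error, in percent of the reference value."""
    return 100.0 * sum(abs(v - r) / r for r, v in zip(reference, values)) / len(reference)

# Made-up step counts: tracker readings and the reference device.
tracker = [900.0, 1800.0, 2700.0, 3600.0]
reference = [1000.0, 2050.0, 2950.0, 4000.0]

intercept, slope = fit_line(tracker, reference)
corrected = [intercept + slope * t for t in tracker]
print(mean_abs_pct_error(reference, tracker))    # error of the raw readings
print(mean_abs_pct_error(reference, corrected))  # error after regression adjustment
```

    A fitted slope and intercept per device are cheap to store and apply, which fits the abstract's remark that the correction could run in an app or a service back-end.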

  4. Drop-Weight Impact Test on U-Shape Concrete Specimens with Statistical and Regression Analyses

    Directory of Open Access Journals (Sweden)

    Xue-Chao Zhu

    2015-09-01

    Full Text Available According to the principle and method of the drop-weight impact test, the impact resistance of concrete was measured using self-designed U-shape specimens and a newly designed drop-weight impact test apparatus. A series of drop-weight impact tests were carried out with four different masses of drop hammers (0.875, 0.8, 0.675 and 0.5 kg). The test results show that the impact resistance results fail to follow a normal distribution. As expected, U-shaped specimens can predetermine the location of the cracks very well. It is also easy to record the crack propagation during the test. The maximum coefficient of variation in this study is 31.2%; it is lower than the values obtained from the American Concrete Institute (ACI) impact tests in the literature. By regression analysis, the linear relationship between the first-crack and ultimate failure impact resistance is good. It can be suggested that a minimum number of specimens is required to reliably measure the properties of the material based on the observed levels of variation.

  5. Challenges and Opportunities in Analysing Students Modelling

    Science.gov (United States)

    Blanco-Anaya, Paloma; Justi, Rosária; Díaz de Bustamante, Joaquín

    2017-01-01

    Modelling-based teaching activities have been designed and analysed from distinct theoretical perspectives. In this paper, we use one of them--the model of modelling diagram (MMD)--as an analytical tool in a regular classroom context. This paper examines the challenges that arise when the MMD is used as an analytical tool to characterise the…

  6. Forecasting relativistic electron flux using dynamic multiple regression models

    Directory of Open Access Journals (Sweden)

    H.-L. Wei

    2011-02-01

    Full Text Available The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.

  7. Resampling procedures to validate dendro-auxometric regression models

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available Regression analysis is widely used in several sectors of forest research. The validation of a dendro-auxometric model is a basic step in the building of the model itself. The more a model resists attempts to demonstrate its groundlessness, the more its reliability increases. In recent decades many new theories that exploit the calculation speed of computers have been formulated. Here we show the results obtained by applying a bootstrap resampling procedure as a validation tool.
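
    A bootstrap check of a regression coefficient can be sketched as follows: resample the (x, y) pairs with replacement, refit, and look at the spread of the refitted slopes. The diameter and height values are made up, and the pairs bootstrap with a percentile interval is one common variant, not necessarily the procedure used in the paper.

```python
import random

def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def bootstrap_slopes(x, y, n_boot, rng):
    """Refit the slope on n_boot samples of (x, y) pairs drawn with replacement."""
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope undefined, draw again
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    return slopes

def percentile_interval(values, lower=0.025, upper=0.975):
    """Approximate percentile interval from the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Made-up data: tree diameter (cm) against height (m).
diameter = [10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0, 38.0]
height = [8.1, 10.4, 11.9, 14.2, 15.8, 18.1, 19.7, 22.3]
rng = random.Random(1)
slopes = bootstrap_slopes(diameter, height, 1000, rng)
print(percentile_interval(slopes))
```

    A narrow interval that stays away from zero suggests the fitted relationship is stable under resampling; a wide or sign-changing one is a warning that the model rests on a few influential observations.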

  8. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, pn, is much larger than the sample size n. In the first step, we apply the l1 penalty, and we demonstrate that the first-step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional one to a model whose size has the same order as that of the true model, and that the selected model can cover the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
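
    The objective behind both steps can be sketched as the quantile check (pinball) loss plus a weighted l1 penalty: unit weights give the plain LASSO of the first step, and coefficient-specific weights give an adaptive LASSO. The function names and toy data are assumptions, and no optimiser is shown.

```python
def pinball_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def penalised_objective(beta, rows, y, tau, lam, weights=None):
    """Total pinball loss plus a weighted l1 penalty (unit weights give the plain LASSO)."""
    weights = weights or [1.0] * len(beta)
    fit = sum(
        pinball_loss(yi - sum(b * x for b, x in zip(beta, row)), tau)
        for row, yi in zip(rows, y)
    )
    return fit + lam * sum(w * abs(b) for w, b in zip(weights, beta))

def best_constant(y, tau):
    """Among the observed values, return the constant minimising total pinball loss."""
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))

y = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(y, 0.5))  # 3.0, the median, despite the outlier
```

    Minimising the check loss at tau = 0.5 returns the median, which is why quantile regression is less affected by the outlying value than a least-squares fit would be.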

  9. Fuzzy and Regression Modelling of Hard Milling Process

    Directory of Open Access Journals (Sweden)

    A. Tamilarasan

    2014-04-01

    Full Text Available The present study highlights the application of a Box-Behnken design coupled with fuzzy and regression modeling approaches for building an expert system for the hard milling process, to improve process performance with a systematic reduction of production cost. The important input factors of work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth of cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key indices of machining outputs. The results indicate that the fuzzy logic and regression modeling techniques can be effectively used for the prediction of desired responses with low average error variation. Predicted results were verified by experiments and showed the good potential of the developed system for automated machining environments.

  10. Regression Cloud Models and Their Applications in Energy Consumption of Data Center

    Directory of Open Access Journals (Sweden)

    Yanshuang Zhou

    2015-01-01

    Full Text Available As cloud data center consumes more and more energy, both researchers and engineers aim to minimize energy consumption while keeping its services available. A good energy model can reflect the relationships between running tasks and the energy consumed by hardware and can be further used to schedule tasks for saving energy. In this paper, we analyzed linear and nonlinear regression energy model based on performance counters and system utilization and proposed a support vector regression energy model. For performance counters, we gave a general linear regression framework and compared three linear regression models. For system utilization, we compared our support vector regression model with linear regression and three nonlinear regression models. The experiments show that linear regression model is good enough to model performance counters, nonlinear regression is better than linear regression model for modeling system utilization, and support vector regression model is better than polynomial and exponential regression models.

  11. Central limit theorem of linear regression model under right censorship

    Institute of Scientific and Technical Information of China (English)

    HE; Shuyuan(何书元); HUANG; Xiang(Heung; Wong)(黄香)

    2003-01-01

    In this paper, the estimation of the joint distribution F(y,z) of (Y, Z) and the estimation in the linear regression model Y = b′Z + ε for complete data are extended to right censored data. The regression parameter estimates of b and the variance of ε are weighted least squares estimates with random weights. The central limit theorems of the estimators are obtained under very weak conditions, and the derived asymptotic variance has a very simple form.

  12. APPLYING LOGISTIC REGRESSION MODEL TO THE EXAMINATION RESULTS DATA

    Directory of Open Access Journals (Sweden)

    Goutam Saha

    2011-01-01

    Full Text Available The binary logistic regression model is used to analyze the school examination results (scores) of 1002 students. The analysis is performed on the basis of the independent variables, viz. gender, medium of instruction, type of school, category of school, board of examination and location of school, where scores or marks are assumed to be the dependent variable. The odds ratio analysis compares the scores obtained in two examinations, viz. matriculation and higher secondary.

  13. Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects

  14. GAUSSIAN COPULA MARGINAL REGRESSION FOR MODELING EXTREME DATA WITH APPLICATION

    Directory of Open Access Journals (Sweden)

    Sutikno

    2014-01-01

    Full Text Available Regression is commonly used to determine the relationship between a response variable and predictor variables, with the parameters estimated by Ordinary Least Squares (OLS). This method assumes that the residuals are normally distributed, N(0, σ²). However, the normality assumption is often violated because of extreme observations, which are common in climate data. Modeling rice harvested area with rainfall as the predictor variable involves such extreme observations, so another approach is needed to accommodate them. The method used here is Gaussian Copula Marginal Regression (GCMR), a copula-based regression. As a case study, the method is applied to model the rice harvested area of rice production centers in East Java, Indonesia, covering the districts of Banyuwangi, Lamongan, Bojonegoro, Ngawi and Jember. The copula is chosen because it does not impose strict distributional assumptions, in particular normality. Moreover, it can describe dependency at extreme points clearly. The performance of GCMR is compared with OLS and Generalized Linear Models (GLM). The identification of the dependency structure between the Rice Harvest per period (RH) and monthly rainfall showed a dependency in all study areas. The copula type identified by the tests mostly follows the Gumbel distribution. The comparison of model goodness of fit for rice harvested area showed that GCMR was the appropriate method for modeling RH1 in the five districts and RH2 in Jember district, since it had the lowest AICc. Looking at the distribution pattern of the response variables, it can be concluded that GCMR is good for modeling response variables that are not normally distributed and tend to have a large skew.

  15. Online Statistical Modeling (Regression Analysis) for Independent Responses

    Science.gov (United States)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. A rich variety of advanced and recent statistical modelling methods is available in open source software (one example is R). However, these advanced methods are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling methods are readily available, accessible and applicable on the web. We have previously made interfaces in the form of e-tutorials for several modern and advanced statistical modelling methods in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes it easier to compare models in order to find the most appropriate one for the data.

  16. Regression Modeling of Competing Risks Data Based on Pseudovalues of the Cumulative Incidence Function

    DEFF Research Database (Denmark)

    Klein, John P.; Andersen, Per Kragh

    2005-01-01

    Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models

  17. K factor estimation in distribution transformers using linear regression models

    Directory of Open Access Journals (Sweden)

    Juan Miguel Astorga Gómez

    2016-06-01

    Full Text Available Background: Due to the massive incorporation of electronic equipment into distribution systems, distribution transformers are subject to operating conditions other than those they were designed for, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor from the total current harmonic content obtained with low-cost equipment. Method: Two distribution transformers feeding different loads are studied; the variables current total harmonic distortion and K-factor are recorded, and the regression model that best fits the field data is determined. To select the regression model, the coefficient of determination R² and the Akaike Information Criterion (AIC) are used. With the selected model, the K-factor is estimated for actual operating conditions. Results: Once the model was determined, it was found that for both the agricultural and the industrial mining loads, the harmonic content present (THDi) exceeds the values that these transformers can handle (average of 12.54% and minimum of 8.90% for the agricultural case, and average of 18.53% and minimum of 6.80% for the industrial mining case). Conclusions: When estimating the K-factor using polynomial models, it was determined that the studied transformers cannot withstand the current total harmonic distortion of their present loads. The appropriate K-factor for the studied transformers should be 4; this allows the transformers to support the current total harmonic distortion of their respective loads.
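
    As a hedged illustration of the two quantities named in this abstract, the sketch below computes THDi and the K-factor from a harmonic current spectrum, using the commonly cited definition K = Σ (I_h / I_rms)² h². The spectrum values and the function names are hypothetical and are not taken from the article, and the article's own regression of K on THDi is not reproduced here.

```python
import math

def thd_i(harmonics):
    """Current THD in percent: rms of harmonics (h >= 2) relative to the fundamental."""
    fundamental = harmonics[1]
    distortion = math.sqrt(sum(i * i for h, i in harmonics.items() if h > 1))
    return 100.0 * distortion / fundamental

def k_factor(harmonics):
    """K-factor: sum over harmonic orders h of (I_h / I_rms)^2 * h^2."""
    total_sq = sum(i * i for i in harmonics.values())  # I_rms squared
    return sum((i * i / total_sq) * h * h for h, i in harmonics.items())

# Hypothetical spectrum: harmonic order -> rms current in amperes.
spectrum = {1: 100.0, 5: 20.0, 7: 10.0}
print(f"THDi = {thd_i(spectrum):.2f} %, K = {k_factor(spectrum):.3f}")
```

    A purely sinusoidal current gives K = 1 under this definition, so values above 1 indicate additional heating attributable to harmonics.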

  18. Extended cox regression model: The choice of timefunction

    Science.gov (United States)

    Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu

    2017-07-01

    The Cox regression model (CRM), which takes into account the effect of censored observations, is one of the most applicable and widely used models in survival analysis for evaluating the effects of covariates. Proportional hazards (PH), which requires a constant hazard ratio over time, is the assumption of the CRM. The extended CRM provides a test of the PH assumption by including a time-dependent covariate, or an alternative model in the case of nonproportional hazards. In this study, different types of real data sets are used to choose the time function, and the differences between time functions are analyzed and discussed.

  19. Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner

    Directory of Open Access Journals (Sweden)

    Luciano Fanton

    2012-01-01

    Full Text Available Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.

  20. A New Approach in Regression Analysis for Modeling Adsorption Isotherms

    Directory of Open Access Journals (Sweden)

    Dana D. Marković

    2014-01-01

    Full Text Available Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. When experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
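
    To make two of the error functions named above concrete, here is a minimal sketch of the ordinary least squares objective and of Marquardt's percent standard deviation (MPSD) in the form usually quoted in the adsorption literature. The data and the function names are hypothetical; the article's simulation setup is not reproduced.

```python
import math

def sse(measured, calculated):
    """Ordinary least squares objective: sum of squared absolute errors."""
    return sum((m - c) ** 2 for m, c in zip(measured, calculated))

def mpsd(measured, calculated, n_params):
    """Marquardt's percent standard deviation: relative errors, n - p degrees of freedom."""
    n = len(measured)
    rel_sq = sum(((m - c) / m) ** 2 for m, c in zip(measured, calculated))
    return 100.0 * math.sqrt(rel_sq / (n - n_params))

# Hypothetical equilibrium uptakes (measured) and values from a two-parameter isotherm fit.
q_measured = [10.0, 20.0, 40.0]
q_calculated = [11.0, 18.0, 40.0]
print(sse(q_measured, q_calculated), mpsd(q_measured, q_calculated, n_params=2))
```

    Because MPSD weights each residual by the measured value, it penalizes errors at low concentrations more heavily than the OLS objective does, which is why the two can rank candidate fits differently under heteroscedastic noise.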

  1. Model and Variable Selection Procedures for Semiparametric Time Series Regression

    Directory of Open Access Journals (Sweden)

    Risa Kato

    2009-01-01

    Full Text Available Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.

  2. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  3. Modeling the number of car theft using Poisson regression

    Science.gov (United States)

    Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura

    2016-10-01

    Regression analysis is among the most popular statistical methods used to express the relationship between a response variable and covariates. The aim of this paper is to evaluate the factors that influence the number of car thefts using a Poisson regression model. This paper focuses on the number of car thefts that occurred in districts of Peninsular Malaysia. Two groups of factors were considered, namely district descriptive factors and socio-demographic factors. The results showed that Bumiputera composition, Chinese composition, other ethnic composition, foreign migration, number of residents aged 25 to 64, number of employed persons and number of unemployed persons are the most influential factors affecting car theft cases. This information is very useful for law enforcement departments, insurance companies and car owners in reducing and limiting car theft cases in Peninsular Malaysia.
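
    For readers unfamiliar with Poisson regression, the following is a minimal sketch of fitting log E[y] = b0 + b1·x by Newton-Raphson on the Poisson log-likelihood, with a single covariate. The data, the variable names and the single-covariate restriction are assumptions made for illustration; the paper's district-level covariates and fitting software are not known from the abstract.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (Fisher scoring)."""
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical covariate values and theft counts for five districts.
covariate = [0.0, 1.0, 2.0, 3.0, 4.0]
thefts = [2, 3, 6, 7, 12]
b0, b1 = fit_poisson(covariate, thefts)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, rate ratio per unit = {math.exp(b1):.3f}")
```

    The exponentiated slope is read as a rate ratio: the multiplicative change in the expected count for a one-unit increase in the covariate.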

  4. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects

  5. Dynamic Regression Intervention Modeling for the Malaysian Daily Load

    Directory of Open Access Journals (Sweden)

    Fadhilah Abdrazak

    2014-05-01

    Full Text Available Malaysia is a unique country in having both fixed and moving holidays. These moving holidays may overlap with fixed holidays and therefore increase the complexity of load forecasting. The errors due to holiday effects in load forecasting are known to be higher than those due to other factors. If these effects can be estimated and removed, the behavior of the series can be viewed more clearly. Thus, the aim of this paper is to reduce forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model for either peak or average load is developed. The developed model outperformed the seasonal ARIMA model in estimating the effects of fixed and moving holidays and achieved a smaller Mean Absolute Percentage Error (MAPE) in load forecasting.
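
    The comparison metric mentioned above, MAPE, can be sketched as follows. The load values and the two forecasts are hypothetical placeholders, not results from the paper.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute forecast error relative to the observed value.
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical daily peak loads (MW) and two competing forecasts.
observed = [100.0, 200.0, 400.0]
forecast_a = [110.0, 190.0, 400.0]
forecast_b = [120.0, 180.0, 380.0]
print(mape(observed, forecast_a), mape(observed, forecast_b))
```

    A lower MAPE indicates the better forecast; note that the metric is undefined when an observed value is zero.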

  6. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Directory of Open Access Journals (Sweden)

    Campos-Aranda Daniel Francisco

    2014-10-01

    Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.

  7. Mixed-model Regression for Variable-star Photometry

    Science.gov (United States)

    Dose, Eric

    2016-05-01

    Mixed-model regression, a recent advance from social-science statistics, applies directly to reducing one night's photometric raw data, especially for variable stars in fields with multiple comparison stars. One regression model per filter/passband yields any or all of: transform values, extinction values, nightly zero-points, rapid zero-point fluctuations ("cirrus effect"), ensemble comparisons, vignette and gradient removal arising from incomplete flat-correction, check-star and target-star magnitudes, and specific indications of unusually large catalog magnitude errors. When images from several different fields of view are included, the models improve without complicating the calculations. The mixed-model approach is generally robust to outliers and missing data points, and it directly yields 14 diagnostic plots, used to monitor data set quality and/or residual systematic errors - these diagnostic plots may in fact turn out to be the prime advantage of this approach. Also presented is initial work on a split-annulus approach to sky background estimation, intended to address the sensitivity of photometric observations to noise within the sky-background annulus.

  8. Genetic evaluation of European quails by random regression models

    Directory of Open Access Journals (Sweden)

    Flaviana Miranda Gonçalves

    2012-09-01

    Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders, for the estimation of (co)variance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6) to determine which model had the best fit to describe the (co)variance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT), the model with six classes of residual variance and a sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day) to 0.16 (42 days). Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth-order Legendre polynomial, along with the residual variance divided into six classes, was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
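
    As a rough sketch of how Legendre polynomial covariates are typically built in random regression models, the code below standardizes age to [-1, 1] and evaluates normalized Legendre polynomials by the Bonnet recurrence. The age range of 0 to 42 days, the normalization constant and the convention that "order k" means k coefficients are assumptions, not details confirmed by the abstract.

```python
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via the Bonnet recurrence."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return p_curr

def standardize_age(age, min_age, max_age):
    """Map an age in [min_age, max_age] to the interval [-1, 1]."""
    return 2.0 * (age - min_age) / (max_age - min_age) - 1.0

def covariates(age, order, min_age=0.0, max_age=42.0):
    """Normalized Legendre covariates phi_0..phi_{order-1} for one record."""
    x = standardize_age(age, min_age, max_age)
    return [math.sqrt((2 * j + 1) / 2.0) * legendre(j, x) for j in range(order)]

# Covariate row for a weight recorded at 21 days under a hypothetical order-3 model.
print(covariates(21.0, order=3))
```

    Each animal's random regression coefficients multiply such a row, so the (co)variance between any two ages follows from the coefficient (co)variance matrix and the two covariate rows.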

  9. Fuzzy regression modeling for tool performance prediction and degradation detection.

    Science.gov (United States)

    Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L

    2010-10-01

    In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.

  10. A hybrid neural network model for noisy data regression.

    Science.gov (United States)

    Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M

    2004-04-01

    A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
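
    For orientation, a plain general regression neural network prediction reduces to a Gaussian-kernel weighted average of the training targets. The sketch below shows only that baseline for one-dimensional inputs; it does not implement the fuzzy ART compression or the kernel width optimization of the proposed GRNNFA, and the data and names are hypothetical.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """Plain GRNN output: kernel-weighted average of the training targets."""
    # One Gaussian kernel per training sample, centred on that sample.
    weights = [math.exp(-((query - x) ** 2) / (2.0 * sigma ** 2)) for x in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Hypothetical noisy samples of an unknown one-dimensional function.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.9]
print(grnn_predict(xs, ys, query=1.5, sigma=0.5))
```

    The kernel width sigma controls smoothing: small values reproduce the nearest training target, while large values approach the global mean of the targets.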

  11. Multivariate parametric random effect regression models for fecundability studies.

    Science.gov (United States)

    Ecochard, R; Clayton, D G

    2000-12-01

    Delay until conception is generally described by a mixture of geometric distributions. Weinberg and Gladen (1986, Biometrics 42, 547-560) proposed a regression generalization of the beta-geometric mixture model where covariates effects were expressed in terms of contrasts of marginal hazards. Scheike and Jensen (1997, Biometrics 53, 318-329) developed a frailty model for discrete event times data based on discrete-time analogues of Hougaard's results (1984, Biometrika 71, 75-83). This paper is on a generalization to a three-parameter family distribution and an extension to multivariate cases. The model allows the introduction of explanatory variables, including time-dependent variables at the subject-specific level, together with a choice from a flexible family of random effect distributions. This makes it possible, in the context of medically assisted conception, to include data sources with multiple pregnancies (or attempts at pregnancy) per couple.

  12. Random regression models using different functions to model milk flow in dairy cows.

    Science.gov (United States)

    Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G

    2014-09-12

    We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait random regression model that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability of milk flow estimated by the most parsimonious model was of moderate to high magnitude.

  13. Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City

    Directory of Open Access Journals (Sweden)

    Amiruddin Ismail

    2011-01-01

    Full Text Available Transport modes are very important to residents of Tripoli, Libya, for their daily trips. However, the total number of private cars and other private transport, namely taxis and microbuses, on the road is increasing and causes many problems such as traffic congestion, accidents, and air and noise pollution. These problems in turn cause other travel-related phenomena such as trip delays, stress and frustration for motorists, which may affect the productivity and efficiency of both workers and students. Delay may also increase travel cost and make trips less efficient compared with public transport users in some other Arab cities. Switching to public transport (PT) alternatives such as buses, light rail transit and underground trains could improve travel time and travel costs. A transport study was carried out in the Tripoli City Authority area among private car users who live in areas with inadequate private transport and poor public transportation services. The relations between factors such as travel time, travel cost, trip purpose and parking cost were analyzed to answer the research questions. Logistic regression was used to analyze the factors that influence users to switch their trip mode to public transport alternatives.

  14. The application of Dynamic Linear Bayesian Models in hydrological forecasting: Varying Coefficient Regression and Discount Weighted Regression

    Science.gov (United States)

    Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa

    2015-11-01

    A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.

  15. Estimation of reference evapotranspiration using multivariate fractional polynomial, Bayesian regression, and robust regression models in three arid environments

    Science.gov (United States)

    Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad

    2017-07-01

    The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided close agreement with the calculated values for FAO-PM (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
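
    The two agreement statistics quoted above can be sketched as below. R² is computed here as the coefficient of determination, 1 − SS_res/SS_tot, which is an assumption: the paper may instead report the squared correlation. The monthly values are hypothetical.

```python
import math

def rmse(reference, estimate):
    """Root mean square error, in the units of the inputs."""
    n = len(reference)
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)

def r_squared(reference, estimate):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Hypothetical monthly reference evapotranspiration (mm/month):
# FAO-PM values as the reference, regression-model output as the estimate.
fao_pm = [100.0, 150.0, 200.0, 250.0]
model = [110.0, 140.0, 210.0, 240.0]
print(f"R2 = {r_squared(fao_pm, model):.3f}, RMSE = {rmse(fao_pm, model):.2f} mm/month")
```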

  16. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Full Text Available Objective: To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods: A CART model and a logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results: Logistic regression analyses showed that hematoma volume (OR-value 0.953), initial Glasgow Coma Scale (GCS) score (OR-value 1.210), pulmonary infection (OR-value 0.295), and basal ganglia hemorrhage (OR-value 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of the two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688). Conclusions: The CART model has a similar value to that of the logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
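
    The odds ratios reported above relate to logistic regression coefficients through OR = exp(β). The short sketch below converts the reported values back to coefficients and shows how an odds ratio compounds over several units of a predictor. The measurement units of each predictor and the coding of the outcome are not stated in the abstract, so the ten-unit example is purely illustrative.

```python
import math

# Odds ratios as reported in the abstract.
reported_or = {
    "hematoma volume": 0.953,
    "initial GCS score": 1.210,
    "pulmonary infection": 0.295,
    "basal ganglia hemorrhage": 0.336,
}

def coefficient(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio: beta = ln(OR)."""
    return math.log(odds_ratio)

def odds_multiplier(odds_ratio, delta):
    """Factor by which the odds change when the predictor increases by delta units."""
    return math.exp(coefficient(odds_ratio) * delta)

for name, value in reported_or.items():
    print(f"{name}: beta = {coefficient(value):+.3f}")

# Illustrative only: effect of a ten-unit increase in hematoma volume on the odds.
print(odds_multiplier(reported_or["hematoma volume"], 10))
```

    An odds ratio below 1 corresponds to a negative coefficient and an odds ratio above 1 to a positive one, so the sign of β gives the direction of the association with the modeled outcome.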

  17. Regression Models for Predicting Force Coefficients of Aerofoils

    Directory of Open Access Journals (Sweden)

    Mohammed ABDUL AKBAR

    2015-09-01

    Full Text Available Renewable sources of energy are attractive and advantageous in many different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, vertical axis wind turbines (VAWTs) have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of the lift and drag coefficients, which are a function of the shape of the aerofoil, the 'angle of attack' of the wind and the Reynolds number of the flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for Reynolds numbers in the range experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three regression analysis models developed so that lift and drag coefficients can be found using simple formulas without having to deal with the bulk of the data. Drag and lift coefficients were successfully estimated by the regression models, with R² values as high as 0.98.

  18. Empirical likelihood ratio tests for multivariate regression models

    Institute of Scientific and Technical Information of China (English)

    WU Jianhong; ZHU Lixing

    2007-01-01

    This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method is well known to be a powerful tool for constructing tests and confidence regions. For model checking, however, the naive empirical likelihood (EL) based tests do not exhibit Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, combining the advantages of both the EL and the score test method, the EL-based score tests share many desirable features: they are self-scale invariant and can detect alternatives that converge to the null at rate n^(-1/2), the fastest possible rate for lack-of-fit testing; they involve weight functions, which provide the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application to a real dataset is analyzed.

  19. Approximation by randomly weighting method in censored regression model

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Censored regression ("Tobit") models have been in common use, and their linear hypothesis tests have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to the null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, the conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by the randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in the censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of the central chi-square distribution under the null hypothesis.

  20. Approximation by randomly weighting method in censored regression model

    Institute of Scientific and Technical Information of China (English)

    WANG ZhanFeng; WU YaoHua; ZHAO LinCheng

    2009-01-01

    Censored regression ("Tobit") models have been in common use, and their linear hypothesis tests have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to the null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, the conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by the randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in the censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of the central chi-square distribution under the null hypothesis.

  1. Antibiotic Resistances in Livestock: A Comparative Approach to Identify an Appropriate Regression Model for Count Data

    Directory of Open Access Journals (Sweden)

    Anke Hüls

    2017-05-01

    Full Text Available Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
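    The abstract's remarks on overdispersion and excess zeros can be illustrated with two simple descriptive checks against a Poisson reference. This is only a sketch on invented counts; the function names are hypothetical and it is not the model comparison the authors performed:

```python
import math


def dispersion_index(counts):
    """Sample variance divided by the mean; close to 1 for Poisson-like data."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def zero_excess(counts):
    """Observed share of zeros minus the share expected under a Poisson
    distribution with the same mean, which is exp(-mean)."""
    n = len(counts)
    mean = sum(counts) / n
    observed = sum(1 for c in counts if c == 0) / n
    expected = math.exp(-mean)
    return observed - expected


# Hypothetical per-farm counts of samples with resistant bacteria.
farm_counts = [0, 0, 1, 2, 0, 3, 1, 0, 4, 1, 0, 2]
print(dispersion_index(farm_counts), zero_excess(farm_counts))
```

    A dispersion index well above 1 points towards negative binomial or quasi-Poisson models, while a clearly positive zero excess points towards zero-inflated or hurdle models. Formal comparison would still rely on criteria such as those named in the abstract.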

  2. Remodeling and Estimation for Sparse Partially Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Yunhui Zeng

    2013-01-01

    Full Text Available When the dimension of covariates in the regression model is high, one usually uses a submodel as a working model that contains significant variables. But it may be highly biased and the resulting estimator of the parameter of interest may be very poor when the coefficients of removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to get the consistent estimator for the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and get a new estimator of the parameter of interest simultaneously. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained by the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.

  3. Modelling QTL effect on BTA06 using random regression test day models.

    Science.gov (United States)

    Suchocki, T; Szyda, J; Zhang, Q

    2013-02-01

    In statistical models, a quantitative trait locus (QTL) effect has been incorporated either as a fixed or as a random term, but, up to now, it has been mainly considered as a time-independent variable. However, for traits recorded repeatedly, it is very interesting to investigate the variation of QTL over time. The major goal of this study was to estimate the position and effect of QTL for milk, fat, protein yields and for somatic cell score based on test day records, while testing whether the effects are constant or variable throughout lactation. The analysed data consisted of 23 paternal half-sib families (716 daughters of 23 sires) of Chinese Holstein-Friesian cattle genotyped at 14 microsatellites located in the area of the casein loci on BTA6. A sequence of three models was used: (i) a lactation model, (ii) a random regression model with a QTL constant in time and (iii) a random regression model with a QTL variable in time. The results showed that, for each production trait, at least one significant QTL exists. For milk and protein yields, the QTL effect was variable in time, while for fat yield, each of the three models resulted in a significant QTL effect. When a QTL is incorporated into a model as a constant over time, its effect is averaged over lactation stages and may, thereby, be difficult or even impossible to be detected. Our results showed that, in such a situation, only a longitudinal model is able to identify loci significantly influencing trait variation.

  4. Information Criteria for Deciding between Normal Regression Models

    CERN Document Server

    Maier, Robert S

    2013-01-01

    Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through being used by astrophysicists to compare cosmological models; e.g., predictions of the distance-redshift relation. The AICc is shown to have been misapplied, being applicable only if error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of `Akaike weights' for deciding between such models. Additionally, the...
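    The distinction drawn above between the AIC and the AICc can be sketched as follows. The formulas are the standard Gaussian ones, stated up to additive constants that cancel when models fitted to the same data are compared; the function names and the numbers in the example are illustrative assumptions, not taken from the paper:

```python
import math


def aic_known_errors(chi_squared, k):
    """AIC (up to a constant) when error bars are known: chi^2 + 2k,
    where k is the number of fitted model parameters."""
    return chi_squared + 2 * k


def aic_unknown_variance(rss, n, k):
    """AIC (up to a constant) for Gaussian errors sharing one unknown variance.
    rss is the residual sum of squares, n the number of data points, and
    k counts the regression parameters plus the variance itself."""
    return n * math.log(rss / n) + 2 * k


def aicc_unknown_variance(rss, n, k):
    """AIC plus the small-sample correction 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic_unknown_variance(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison of two models fitted to 20 points with error bars.
simple_model = aic_known_errors(chi_squared=24.1, k=2)
richer_model = aic_known_errors(chi_squared=21.9, k=4)
print(simple_model, richer_model)  # the lower value is preferred
```

    In this made-up case the richer model lowers chi-squared by 2.2 but pays a penalty of 4 for its two extra parameters, so the simpler model is preferred.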

  5. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.

  6. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.

    Science.gov (United States)

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-10-01

    The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran's universities. This is an applied study. The statistical population consisted of 96 employees of the central libraries of Iran's public universities, selected from among the 117 universities affiliated to the Ministry of Health by stratified sampling (510 people). The localized ClimateQUAL questionnaire was used as the research tool. Multivariate linear regression and a track diagram were used to predict the organizational climate pattern of the libraries. Of the 9 variables affecting organizational climate, 5 (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran's libraries. The results also indicate that each of these variables, with a different coefficient, has the power to predict organizational climate, but the climate score of psychological safety (0.94) plays a crucial role in predicting it. The track diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, and that the contribution of teamwork to this influence is greater than that of any other variable. Of the ClimateQUAL indicators of organizational climate, teamwork contributes more than any other variable, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library.

  7. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE

    Science.gov (United States)

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-01-01

    Background: The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran’s universities. Methods: This is an applied study. The statistical population consisted of 96 employees of the central libraries of Iran’s public universities, selected from among the 117 universities affiliated to the Ministry of Health by stratified sampling (510 people). The localized ClimateQUAL questionnaire was used as the research tool. Multivariate linear regression and a track diagram were used to predict the organizational climate pattern of the libraries. Results: Of the 9 variables affecting organizational climate, 5 (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran’s libraries. The results also indicate that each of these variables, with a different coefficient, has the power to predict organizational climate, but the climate score of psychological safety (0.94) plays a crucial role in predicting it. The track diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, and that the contribution of teamwork to this influence is greater than that of any other variable. Conclusions: Of the ClimateQUAL indicators of organizational climate, teamwork contributes more than any other variable, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library. PMID:26622203

  8. Externalizing Behaviour for Analysing System Models

    DEFF Research Database (Denmark)

    Ivanova, Marieta Georgieva; Probst, Christian W.; Hansen, René Rydhof

    2013-01-01

    System models have recently been introduced to model organisations and evaluate their vulnerability to threats and especially insider threats. Especially for the latter these models are very suitable, since insiders can be assumed to have more knowledge about the attacked organisation than outside attackers. Therefore, many attacks are considerably easier to be performed for insiders than for outsiders. However, current models do not support explicit specification of different behaviours. Instead, behaviour is deeply embedded in the analyses supported by the models, meaning that it is a complex, if not impossible task to change behaviours. Especially when considering social engineering or the human factor in general, the ability to use different kinds of behaviours is essential. In this work we present an approach to make the behaviour a separate component in system models, and explore how to integrate...

  9. A Gompertz regression model for fern spores germination

    Directory of Open Access Journals (Sweden)

    Gabriel y Galán, Jose María

    2015-06-01

    Full Text Available Germination is one of the most important biological processes for both seed and spore plants, and also for fungi. To date, mathematical models of germination have been developed for fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for the species Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei the Gompertz growth model satisfactorily describes cumulative germination. An important result is that the regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns. The paper also discusses the physiological and ecological meaning of the model.
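    The abstract does not give the parameterisation used, so the sketch below assumes the common three-parameter Gompertz form G(t) = a * exp(-b * exp(-c * t)), where a is the final germination percentage. With a taken as known, b and c follow from a straight-line fit to ln(-ln(G/a)) = ln(b) - c*t. The data are synthetic and the function names are hypothetical:

```python
import math


def gompertz(t, a, b, c):
    """Cumulative germination at time t under the assumed Gompertz form."""
    return a * math.exp(-b * math.exp(-c * t))


def fit_gompertz_given_asymptote(times, values, a):
    """Estimate b and c by ordinary least squares on the linearised form
    ln(-ln(y / a)) = ln(b) - c * t. Requires 0 < y < a for every point."""
    z = [math.log(-math.log(y / a)) for y in values]
    n = len(times)
    t_mean = sum(times) / n
    z_mean = sum(z) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxz = sum((t - t_mean) * (zi - z_mean) for t, zi in zip(times, z))
    slope = sxz / sxx
    intercept = z_mean - slope * t_mean
    return math.exp(intercept), -slope


# Synthetic cumulative germination percentages generated from known parameters.
days = [2, 4, 6, 8, 10, 12]
germinated = [gompertz(d, 90.0, 5.0, 0.4) for d in days]
b_hat, c_hat = fit_gompertz_given_asymptote(days, germinated, 90.0)
print(b_hat, c_hat)
```

    With real counts the asymptote a is usually unknown too, in which case all three parameters would be estimated together by nonlinear least squares rather than by this shortcut.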

  10. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    NARCIS (Netherlands)

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

    2006-01-01

    The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval arithm

  11. Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

    Science.gov (United States)

    Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

    2016-09-01

    Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h^2 = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at any age can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that
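    Random regression models of this kind use Legendre polynomials of age as covariates. A minimal sketch of how those covariates can be built is given below. It assumes age is first rescaled to the interval [-1, 1] and uses unnormalised polynomials; the function names are hypothetical, and any further rescaling applied by particular software is not covered:

```python
def standardise_age(age, age_min, age_max):
    """Map an age in [age_min, age_max] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (age - age_min) / (age_max - age_min)


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


# Covariates for a bird weighed at 42 days, on a hatching-to-84-days scale.
x = standardise_age(42, 0, 84)
print(legendre_basis(x, 2))
```

    Each animal's additive genetic and permanent environmental curves are then expressed as linear combinations of these basis values, with the combination coefficients treated as random effects.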

  12. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    Science.gov (United States)

    Ferrari, Alberto

    2017-02-16

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading researchers to rely mainly on linear models or simulation-based analytical approaches to assess differences in information content when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
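    For orientation, the quantity being modelled can be computed from a symbol sequence with the plain plug-in estimator, H = -sum p_i log p_i, where p_i are the observed relative frequencies. The sketch below shows only that estimator, not the Dirichlet-multinomial regression proposed in the paper; the function name is hypothetical:

```python
import math
from collections import Counter


def shannon_entropy(sequence, base=2.0):
    """Plug-in Shannon entropy of a sequence of hashable symbols.
    With base=2 the result is in bits."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


# Illustrative sequences: one repetitive, one using four symbols equally often.
print(shannon_entropy("AAAAAAAA"))
print(shannon_entropy("ACGTACGT"))
```

    The plug-in estimate tends to underestimate entropy when sequences are short relative to the alphabet size, which is one reason dedicated estimation and inference methods are of interest.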

  13. A nonlinear regression model-based predictive control algorithm.

    Science.gov (United States)

    Dubay, R; Abu-Ayyad, M; Hernandez, J M

    2009-04-01

    This paper presents a unique approach for designing a nonlinear regression model-based predictive controller (NRPC) for single-input-single-output (SISO) and multi-input-multi-output (MIMO) processes that are common in industrial applications. The innovation of this strategy is that the controller structure allows nonlinear open-loop modeling to be conducted while closed-loop control is executed every sampling instant. Consequently, the system matrix is regenerated every sampling instant using a continuous function, providing a more accurate prediction of the plant. Computer simulations are carried out on nonlinear plants, demonstrating that the new approach is easily implemented and provides tight control. Also, the proposed algorithm is implemented on two real-time SISO applications (a DC motor and a plastic injection molding machine) and on a nonlinear MIMO thermal system comprising three temperature zones to be controlled with interacting effects. The experimental closed-loop responses of the proposed algorithm were compared to those of a multi-model dynamic matrix controller (MPC), with improved results for various set point trajectories. Good disturbance rejection was attained, resulting in improved tracking of multi-set point profiles in comparison to multi-model MPC.

  14. Statistical Inference for Partially Linear Regression Models with Measurement Errors

    Institute of Scientific and Technical Information of China (English)

    Jinhong YOU; Qinfeng XU; Bin ZHOU

    2008-01-01

    In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and the GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. As in "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.

  15. Projection-type estimation for varying coefficient regression models

    CERN Document Server

    Lee, Young K; Park, Byeong U; 10.3150/10-BEJ331

    2012-01-01

    In this paper we introduce new estimators of the coefficient functions in the varying coefficient regression model. The proposed estimators are obtained by projecting the vector of the full-dimensional kernel-weighted local polynomial estimators of the coefficient functions onto a Hilbert space with a suitable norm. We provide a backfitting algorithm to compute the estimators. We show that the algorithm converges at a geometric rate under weak conditions. We derive the asymptotic distributions of the estimators and show that the estimators have the oracle properties. This is done for the general order of local polynomial fitting and for the estimation of the derivatives of the coefficient functions, as well as the coefficient functions themselves. The estimators turn out to have several theoretical and numerical advantages over the marginal integration estimators studied by Yang, Park, Xue and Härdle [J. Amer. Statist. Assoc. 101 (2006) 1212-1227].

  16. The R Package threg to Implement Threshold Regression Models

    Directory of Open Access Journals (Sweden)

    Tao Xiao

    2015-08-01

    This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates). The predict method for threg objects is used for prediction. The plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.

  17. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    Directory of Open Access Journals (Sweden)

    Betsey Dexter Dyer

    2008-01-01

    Full Text Available Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results.

  18. Epistasis analysis for quantitative traits by functional regression model.

    Science.gov (United States)

    Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao

    2014-06-01

    The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10^(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
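    The Bonferroni correction mentioned above divides the family-wise significance level by the number of tests. The abstract reports only the resulting threshold, so the sketch below assumes a family-wise level of 0.05 to show roughly how many tests that threshold would correspond to; the function names are hypothetical:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests consistent with a reported Bonferroni threshold."""
    return alpha / threshold


# Assumption: family-wise alpha = 0.05 (not stated in the abstract).
print(bonferroni_threshold(0.05, 1_000_000))
print(round(implied_number_of_tests(0.05, 4.58e-10)))
```

    Under that assumption the reported threshold corresponds to on the order of 10^8 gene-pair tests, which illustrates the severity of the multiple-testing burden the authors describe.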

  19. Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models

    Directory of Open Access Journals (Sweden)

    Adam J. Branscum

    2013-01-01

    Full Text Available The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions) rather than scalar parameters (e.g., means or proportions). For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straightforward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
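    As background to the ROC analysis described above, an empirical ROC curve and its area can be computed directly from biomarker values in the two groups. The sketch below is a simple nonparametric illustration and not the Polya tree approach or the SAS code of the paper. It assumes that larger biomarker values indicate disease (the direction would be reversed otherwise) and uses invented data and hypothetical function names:

```python
def roc_points(healthy, diseased):
    """(false positive rate, true positive rate) pairs, calling a subject
    positive when the biomarker is at or above each observed threshold."""
    thresholds = sorted(set(healthy) | set(diseased), reverse=True)
    points = [(0.0, 0.0)]
    for cut in thresholds:
        fpr = sum(1 for h in healthy if h >= cut) / len(healthy)
        tpr = sum(1 for d in diseased if d >= cut) / len(diseased)
        points.append((fpr, tpr))
    return points


def empirical_auc(healthy, diseased):
    """Probability that a diseased value exceeds a healthy one, ties counting half."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(healthy) * len(diseased))


# Hypothetical biomarker measurements.
controls = [1.2, 1.9, 2.4, 2.8, 3.1]
cases = [2.6, 3.0, 3.7, 4.1, 4.4]
print(roc_points(controls, cases))
print(empirical_auc(controls, cases))
```

    An area of 0.5 corresponds to a biomarker with no discriminating ability and 1.0 to perfect separation of the two groups.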

  20. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Jaber Almedeij

    2012-01-01

    Full Text Available Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in reasonable agreement with observed values.
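
Linearizing a curvilinear relation before fitting can be sketched as follows. The example assumes a power law E = a * T^b between evaporation E and temperature T, which becomes a straight line after taking logarithms of both sides. The data are synthetic and the coefficients 0.05 and 1.5 are hypothetical, not values from the paper.

```python
import math


def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, noise-free data generated from E = 0.05 * T**1.5 (hypothetical values).
temps = [15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
evap = [0.05 * t ** 1.5 for t in temps]

# log E = log a + b * log T is linear in log T.
intercept, b = ols_line([math.log(t) for t in temps], [math.log(e) for e in evap])
a = math.exp(intercept)
print(a, b)
```

An exponential relation E = a * exp(b * H) would be handled the same way, regressing log E on H itself instead of on log H.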

  1. Principal Component and Multiple Regression Analyses for the Estimation of Suspended Sediment Yield in Ungauged Basins of Northern Thailand

    Directory of Open Access Journals (Sweden)

    Piyawat Wuttichaikitcharoen

    2014-08-01

    Full Text Available Predicting sediment yield is necessary for good land and water management in any river basin. However, sediment data are sometimes unavailable or sparse, which renders estimating sediment yield a daunting task. The present study investigates the factors influencing suspended sediment yield using principal component analysis (PCA). Additionally, the regression relationships for estimating suspended sediment yield, based on the selected key factors from the PCA, are developed. The PCA shows six components of key factors that can explain up to 86.7% of the variation of all variables. The regression models show that basin size, channel network characteristics, land use, basin steepness and rainfall distribution are the key factors affecting sediment yield. The validation of regression relationships for estimating suspended sediment yield shows the error of estimation ranging from −55% to +315% and −59% to +259% for suspended sediment yield and for area-specific suspended sediment yield, respectively. The proposed relationships may be considered useful for predicting suspended sediment yield in ungauged basins of Northern Thailand that have geologic, climatic and hydrologic conditions similar to the study area.

  2. A poisson regression approach for modelling spatial autocorrelation between geographically referenced observations

    Directory of Open Access Journals (Sweden)

    Jolley Damien

    2011-10-01

    Full Text Available Abstract Background Analytic methods commonly used in epidemiology do not account for spatial correlation between observations. In regression analyses, omission of that autocorrelation can bias parameter estimates and yield incorrect standard error estimates. Methods We used age standardised incidence ratios (SIRs) of esophageal cancer (EC) from the Babol cancer registry from 2001 to 2005, and extracted socioeconomic indices from the Statistical Centre of Iran. The following models for SIR were used: (1) Poisson regression with agglomeration-specific nonspatial random effects; (2) Poisson regression with agglomeration-specific spatial random effects. Distance-based and neighbourhood-based autocorrelation structures were used for defining the spatial random effects and a pseudolikelihood approach was applied to estimate model parameters. The Bayesian information criterion (BIC), Akaike's information criterion (AIC) and adjusted pseudo R2 were used for model comparison. Results A Gaussian semivariogram with an effective range of 225 km best fit spatial autocorrelation in agglomeration-level EC incidence. The Moran's I index was greater than its expected value, indicating systematic geographical clustering of EC. The distance-based and neighbourhood-based Poisson regression estimates were generally similar. When residual spatial dependence was modelled, point and interval estimates of covariate effects were different to those obtained from the nonspatial Poisson model. Conclusions The spatial pattern evident in the EC SIR and the observation that point estimates and standard errors differed depending on the modelling approach indicate the importance of accounting for residual spatial correlation in analyses of EC incidence in the Caspian region of Iran. Our results also illustrate that spatial smoothing must be applied with care.
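
The comparison of Moran's I with its expected value can be made concrete with a short sketch. It computes the global Moran's I for four hypothetical regions arranged in a row with binary neighbourhood weights; the values, the weight matrix and the function name `morans_i` are illustrative assumptions, not the registry data or the weighting used in the study.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Four hypothetical regions in a row; 1 marks neighbouring regions.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
sir = [1.0, 2.0, 3.0, 4.0]  # made-up standardised incidence ratios

observed = morans_i(sir, weights)
expected = -1.0 / (len(sir) - 1)  # expectation under no spatial autocorrelation
print(observed, expected)
```

An observed value above the expectation, as in this toy example, points toward clustering of similar values among neighbours; a formal conclusion would also need a significance test, which is omitted here.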

  3. The microcomputer scientific software series 2: general linear model--regression.

    Science.gov (United States)

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  4. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan

    2016-11-01

    In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat ( L.) and maize ( L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects.
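
The difference between a linear and a Gaussian kernel can be sketched on a tiny marker matrix. The genotype codes, the bandwidth value and the scaling of distances by the number of markers are assumptions for illustration; the bandwidth estimation by kernel averaging or empirical Bayes described in the abstract is not reproduced, and the linear kernel below omits the centring and standardisation that GBLUP usually applies.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two marker vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def gaussian_kernel(markers, bandwidth):
    """K[i][j] = exp(-bandwidth * d_ij^2 / p), with p the number of markers (assumed scaling)."""
    p = len(markers[0])
    return [[math.exp(-bandwidth * squared_distance(a, b) / p) for b in markers]
            for a in markers]


def linear_kernel(markers):
    """Simplified linear kernel G[i][j] = x_i . x_j / p, without centring."""
    p = len(markers[0])
    return [[sum(ai * bi for ai, bi in zip(a, b)) / p for b in markers]
            for a in markers]


# Hypothetical genotypes (0, 1, 2 copies of an allele) for three lines and four markers.
markers = [
    [0, 1, 2, 1],
    [0, 1, 2, 0],
    [2, 1, 0, 1],
]

K = gaussian_kernel(markers, bandwidth=1.0)
G = linear_kernel(markers)
print(K)
print(G)
```

The Gaussian kernel depends only on the distance between lines and decays smoothly, so changing the bandwidth changes how quickly similarity falls off; the linear kernel has no such tuning parameter.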

  5. Air Pollution Analysis using Ontologies and Regression Models

    Directory of Open Access Journals (Sweden)

    Parul Choudhary

    2016-07-01

    Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co-operation and multi-disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression models for ontologies and air pollution.

  6. Modelling and Analysing Socio-Technical Systems

    DEFF Research Database (Denmark)

    Aslanyan, Zaruhi; Ivanova, Marieta Georgieva; Nielson, Flemming

    2015-01-01

    with social engineering. Due to this combination of attack steps on technical and social levels, risk assessment in socio-technical systems is complex. Therefore, established risk assessment methods often abstract away the internal structure of an organisation and ignore human factors when modelling...... and assessing attacks. In our work we model all relevant levels of socio-technical systems, and propose evaluation techniques for analysing the security properties of the model. Our approach simplifies the identification of possible attacks and provides qualified assessment and ranking of attacks based...... on the expected impact. We demonstrate our approach on a home-payment system. The system is specifically designed to help elderly or disabled people, who may have difficulties leaving their home, to pay for some services, e.g., care-taking or rent. The payment is performed using the remote control of a television...

  7. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    Science.gov (United States)

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
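
The contrast between a local and a global regression can be sketched in one spatial dimension with a single predictor. The sketch fits a weighted least squares line at each target location using Gaussian distance weights, which is the core idea of GWR; the synthetic data, the bandwidth and the name `local_fit` are assumptions, and the temporal weighting of GTWR is not included.

```python
import math


def local_fit(u, sites, x, y, bandwidth):
    """Weighted least squares line at location u with Gaussian distance weights."""
    w = [math.exp(-0.5 * ((s - u) / bandwidth) ** 2) for s in sites]
    sw = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic data: the effect of x on y grows from west (site 0) to east (site 10).
sites, x, y = [], [], []
for s in range(11):
    for xv in (1.0, 2.0, 3.0):
        sites.append(float(s))
        x.append(xv)
        y.append((1.0 + 0.5 * s) * xv)

slope_west = local_fit(0.0, sites, x, y, bandwidth=1.0)[1]
slope_mid = local_fit(5.0, sites, x, y, bandwidth=1.0)[1]
slope_east = local_fit(10.0, sites, x, y, bandwidth=1.0)[1]
# A very large bandwidth gives nearly equal weights, i.e. a global fit.
global_slope = local_fit(5.0, sites, x, y, bandwidth=1e6)[1]
print(slope_west, slope_mid, slope_east, global_slope)
```

The global fit returns one average slope, while the local fits recover the west-to-east gradient in the effect, which is the kind of spatial heterogeneity a global model cannot show.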

  8. Genetic analysis of somatic cell score in Norwegian cattle using random regression test-day models.

    Science.gov (United States)

    Odegård, J; Jensen, J; Klemetsdal, G; Madsen, P; Heringstad, B

    2003-12-01

    The dataset used in this analysis contained a total of 341,736 test-day observations of somatic cell scores from 77,110 primiparous daughters of 1965 Norwegian Cattle sires. Initial analyses, using simple random regression models without genetic effects, indicated that use of homogeneous residual variance was appropriate. Further analyses were carried out by use of a repeatability model and 12 random regression sire models. Legendre polynomials of varying order were used to model both permanent environmental and sire effects, as were the Wilmink function, the Lidauer-Mäntysaari function, and the Ali-Schaeffer function. For all these models, heritability estimates were lowest at the beginning (0.05 to 0.07) and higher at the end (0.09 to 0.12) of lactation. Genetic correlations between somatic cell scores early and late in lactation were moderate to high (0.38 to 0.71), whereas genetic correlations for adjacent DIM were near unity. Models were compared based on likelihood ratio tests, Bayesian information criterion, Akaike information criterion, residual variance, and predictive ability. Based on prediction of randomly excluded observations, models with 4 coefficients for permanent environmental effect were preferred over simpler models. More highly parameterized models did not substantially increase predictive ability. Evaluation of the different model selection criteria indicated that a reduced order of fit for sire effects was desirable. Models with zeroth- or first-order of fit for sire effects and higher order of fit for permanent environmental effects probably underestimated sire variance. The chosen model had Legendre polynomials with 3 coefficients for sire, and 4 coefficients for permanent environmental effects. For this model, trajectories of sire variance and heritability were similar assuming either homogeneous or heterogeneous residual variance structure.
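
The Legendre covariates used in a random regression test-day model can be sketched as follows. Days in milk (DIM) are rescaled to the interval [-1, 1] and the polynomials are built with the standard three-term recurrence. The DIM range of 5 to 305 and the function name are assumptions; many implementations additionally multiply each polynomial by a normalising constant, which is left out here.

```python
def legendre_covariates(dim, dim_min, dim_max, n_coef):
    """Unnormalised Legendre polynomials P_0..P_{n_coef-1} at a rescaled day in milk."""
    t = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0  # map DIM to [-1, 1]
    p = [1.0, t]
    for k in range(1, n_coef - 1):
        # Recurrence: (k + 1) P_{k+1}(t) = (2k + 1) t P_k(t) - k P_{k-1}(t)
        p.append(((2 * k + 1) * t * p[k] - k * p[k - 1]) / (k + 1))
    return p[:n_coef]


# Hypothetical lactation range of 5 to 305 days in milk.
sire_row = legendre_covariates(155, 5, 305, n_coef=3)       # 3 coefficients, as for sire effects
pe_row_start = legendre_covariates(5, 5, 305, n_coef=4)     # 4 coefficients, as for permanent environment
pe_row_end = legendre_covariates(305, 5, 305, n_coef=4)
print(sire_row, pe_row_start, pe_row_end)
```

Each test-day record contributes one such row to the design matrices of the random effects, so the number of coefficients chosen fixes how flexible the fitted trajectory over lactation can be.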

  9. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...

  10. Correlation-regression model for physico-chemical quality of ...

    African Journals Online (AJOL)

    abusaad

    Key words: Groundwater, water quality, bore well, water supply, correlation, regression. INTRODUCTION ..... interpreting groundwater quality data and relating them to specific hydro ..... Regional trends in nitrate content of Texas groundwater.

  11. Evidence from regression-discontinuity analyses for beneficial effects of a criterion-based increase in alcohol treatment.

    Science.gov (United States)

    Flam-Zalcman, Rosely; Mann, Robert E; Stoduto, Gina; Nochajski, Thomas H; Rush, Brian R; Koski-Jännes, Anja; Wickens, Christine M; Thomas, Rita K; Rehm, Jürgen

    2013-03-01

    Brief interventions effectively reduce alcohol problems; however, it is controversial whether longer interventions result in greater improvement. This study aims to determine whether an increase in treatment for people with more severe problems resulted in better outcome. We employed regression-discontinuity analyses to determine if drinking driver clients (n = 22,277) in Ontario benefited when they were assigned to a longer treatment program (8-hour versus 16-hour) based on assessed addiction severity criteria. Assignment to the longer 16-hour program was based on two addiction severity measures derived from the Research Institute on Addictions Self-inventory (RIASI) (meeting criteria for assignment based on either the total RIASI score or the score on the recidivism subscale). The main outcome measure was self-reported number of days of alcohol use during the 90 days preceding the six-month follow-up interview. We found significant reductions of one or two self-reported drinking days at the point of assignment, depending on the severity criterion used. These data suggest that more intensive treatment for alcohol problems may improve results for individuals with more severe problems.
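
A regression-discontinuity estimate "at the point of assignment" can be sketched as the gap between two regression lines at the cutoff. The severity scores, the cutoff of 10, the bandwidth and the built-in effect of minus two drinking days are all made up for illustration; the sketch fits separate straight lines on each side of the cutoff and is not the exact specification used in the study.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(score, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff: right-hand intercept minus left-hand intercept."""
    left = [(s - cutoff, o) for s, o in zip(score, outcome)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, o) for s, o in zip(score, outcome)
             if cutoff <= s <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Hypothetical data: clients scoring at or above 10 get the longer program,
# which lowers drinking days by 2 relative to the underlying trend.
score = [float(s) for s in range(20)]
outcome = [10.0 + 0.5 * (s - 10.0) - (2.0 if s >= 10.0 else 0.0) for s in score]

effect = rd_estimate(score, outcome, cutoff=10.0, bandwidth=5.0)
print(effect)
```

Centring the score at the cutoff makes each intercept the predicted outcome exactly at the assignment threshold, so their difference is the estimated treatment effect for clients near that threshold.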

  12. Regression of retinopathy by squalamine in a mouse model.

    Science.gov (United States)

    Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I

    2004-07-01

    The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina.

  13. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail how the theoretical model and its...

  14. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

    Science.gov (United States)

    Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

    2013-12-11

    The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.

  15. Household Food Waste: Multivariate Regression and Principal Components Analyses of Awareness and Attitudes among U.S. Consumers.

    Science.gov (United States)

    Qi, Danyi; Roe, Brian E

    2016-01-01

    We estimate models of consumer food waste awareness and attitudes using responses from a national survey of U.S. residents. Our models are interpreted through the lens of several theories that describe how pro-social behaviors relate to awareness, attitudes and opinions. Our analysis of patterns among respondents' food waste attitudes yields a model with three principal components: one that represents perceived practical benefits households may lose if food waste were reduced, one that represents the guilt associated with food waste, and one that represents whether households feel they could be doing more to reduce food waste. We find our respondents express significant agreement that some perceived practical benefits are ascribed to throwing away uneaten food, e.g., nearly 70% of respondents agree that throwing away food after the package date has passed reduces the odds of foodborne illness, while nearly 60% agree that some food waste is necessary to ensure meals taste fresh. We identify that these attitudinal responses significantly load onto a single principal component that may represent a key attitudinal construct useful for policy guidance. Further, multivariate regression analysis reveals a significant positive association between the strength of this component and household income, suggesting that higher income households most strongly agree with statements that link throwing away uneaten food to perceived private benefits.

  16. Household Food Waste: Multivariate Regression and Principal Components Analyses of Awareness and Attitudes among U.S. Consumers

    Science.gov (United States)

    2016-01-01

    We estimate models of consumer food waste awareness and attitudes using responses from a national survey of U.S. residents. Our models are interpreted through the lens of several theories that describe how pro-social behaviors relate to awareness, attitudes and opinions. Our analysis of patterns among respondents’ food waste attitudes yields a model with three principal components: one that represents perceived practical benefits households may lose if food waste were reduced, one that represents the guilt associated with food waste, and one that represents whether households feel they could be doing more to reduce food waste. We find our respondents express significant agreement that some perceived practical benefits are ascribed to throwing away uneaten food, e.g., nearly 70% of respondents agree that throwing away food after the package date has passed reduces the odds of foodborne illness, while nearly 60% agree that some food waste is necessary to ensure meals taste fresh. We identify that these attitudinal responses significantly load onto a single principal component that may represent a key attitudinal construct useful for policy guidance. Further, multivariate regression analysis reveals a significant positive association between the strength of this component and household income, suggesting that higher income households most strongly agree with statements that link throwing away uneaten food to perceived private benefits. PMID:27441687

  18. Regression model for tuning the PID controller with fractional order time delay system

    OpenAIRE

    S.P. Agnihotri; Laxman Madhavrao Waghmare

    2014-01-01

    In this paper, a regression-model-based method for tuning a proportional integral derivative (PID) controller with a fractional order time delay system is proposed. The novelty of this paper is that the tuning parameters of the fractional order time delay system are optimally predicted using the regression model. In the proposed method, the output parameters of the fractional order system are used to derive the regression function. Here, the regression model depends on the weights of the exponential function...

  19. A generalized additive regression model for survival times

    DEFF Research Database (Denmark)

    Scheike, Thomas H.

    2001-01-01

    Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models

  1. A Computationally Efficient State Space Approach to Estimating Multilevel Regression Models and Multilevel Confirmatory Factor Models.

    Science.gov (United States)

    Gu, Fei; Preacher, Kristopher J; Wu, Wei; Yung, Yiu-Fai

    2014-01-01

    Although the state space approach for estimating multilevel regression models has been well established for decades in the time series literature, it does not receive much attention from educational and psychological researchers. In this article, we (a) introduce the state space approach for estimating multilevel regression models and (b) extend the state space approach for estimating multilevel factor models. A brief outline of the state space formulation is provided and then state space forms for univariate and multivariate multilevel regression models, and a multilevel confirmatory factor model, are illustrated. The utility of the state space approach is demonstrated with either a simulated or real example for each multilevel model. It is concluded that the results from the state space approach are essentially identical to those from specialized multilevel regression modeling and structural equation modeling software. More importantly, the state space approach offers researchers a computationally more efficient alternative to fit multilevel regression models with a large number of Level 1 units within each Level 2 unit or a large number of observations on each subject in a longitudinal study.

  2. Growth regression models at two generations of selected populations Alabio ducks

    Directory of Open Access Journals (Sweden)

    L Hardi Prasetyo

    2007-12-01

    Full Text Available A selection process to increase egg production of Alabio ducks was conducted in Balai Penelitian Ternak, Ciawi-Bogor. The selection aimed at increasing production; however, observation of the growth of the selected ducks was necessary since the early growth stage (0-8 wks) determines performance during the laying period. This paper presents the growth models and the coefficients of determination of two generations of selected Alabio ducks. Body weights were observed weekly on 363 ducks from F1 and 356 ducks from F2 between 0 and 8 weeks, and then fortnightly until 16 weeks. Growth curves were analysed using regression models between age and body weight of each population. The selection of the model with the best fit was based on a large value of the determination coefficient (R2), a small value of MSE, and the significance level of the regression coefficients. Results showed that cubic polynomial regression was the best fit for the two populations: Y = 56.31 - 1.44X + 0.64X^2 - 0.005X^3 for F1 and Y = 43.05 + 0.96X + 0.69X^2 - 0.0056X^3 for F2. The values of R2 were 0.9466 for F1 and 0.9243 for F2, and the values of MSE were 11.586 for F1 and 19.978 for F2. The growth of F1 is better during the starter period, but F2 is better during the grower period.
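
The two cubic growth equations reported in the abstract can be evaluated directly. The sketch below only encodes the published coefficients; the units of age X and body weight Y are not stated in the abstract, so the evaluation ages used here are arbitrary illustrative values, and the helper names are hypothetical.

```python
def cubic(coefs, x):
    """Evaluate b0 + b1*x + b2*x^2 + b3*x^3 using Horner's scheme."""
    b0, b1, b2, b3 = coefs
    return b0 + x * (b1 + x * (b2 + x * b3))


def cubic_rate(coefs, x):
    """First derivative b1 + 2*b2*x + 3*b3*x^2, i.e. the modelled growth rate."""
    _, b1, b2, b3 = coefs
    return b1 + x * (2.0 * b2 + x * 3.0 * b3)


# Coefficients as printed in the abstract (intercept first).
F1 = (56.31, -1.44, 0.64, -0.005)
F2 = (43.05, 0.96, 0.69, -0.0056)

for age in (0.0, 10.0, 20.0):  # ages chosen only for illustration
    print(age, cubic(F1, age), cubic(F2, age), cubic_rate(F1, age), cubic_rate(F2, age))
```

Because the printed coefficients are rounded, values computed this way are approximations of the fitted curves rather than exact reproductions of the authors' predictions.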

  3. High dimensional linear regression models under long memory dependence and measurement error

    Science.gov (United States)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, n^(1/2-d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non-sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the
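
How Lasso performs estimation and model selection at once can be sketched with a small coordinate descent solver. The objective assumed here is (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with independent errors, so the sketch does not cover the long memory or measurement error settings studied in the dissertation; the data, the penalty value and the function names are illustrative assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Orthogonal toy design: y = 3 * x1 + 0.1 * x2 (hypothetical, noise-free).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.1, 2.9, -2.9, -3.1]

beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With this orthogonal design the solution is the soft-thresholded least squares estimate: the large coefficient is shrunk by the penalty and the small one is set exactly to zero, which is the selection behaviour that sign consistency results describe.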

  4. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  5. On the Variability and Correlation of Surface Ozone and Carbon Monoxide Observed in Hong Kong Using Trajectory and Regression Analyses

    Institute of Scientific and Technical Information of China (English)

    WANG Tijian(王体健); K. S. LAM; C. W. TSANG; S. C. KOT

    2004-01-01

    This paper investigates the variability and correlation of surface ozone (O3) and carbon monoxide (CO) observed at Cape D'Aguilar in Hong Kong from 1 January 1994 to 31 December 1995. Statistical analysis shows that the average O3 and CO mixing ratios during the two years are 32±17 ppbv and 305±191 ppbv, respectively. The O3/CO ratio ranges from 0.05 to 0.6 ppbv/ppbv with its frequency peaking at 0.15. The raw dataset is divided into six groups using backward trajectory and cluster analyses. For data assigned to the same trajectory type, three groups are further sorted out based on CO and NOx mixing ratios. The correlation coefficients and slopes of O3/CO for the 18 groups are calculated using linear regression analysis. Finally, five kinds of air masses with different chemical features are identified: continental background (CB), marine background (MB), regional polluted continental (RPC), perturbed marine (PM), and local polluted (LP) air masses. Further studies indicate that O3 and CO in the continental and marine background air masses (CB and MB) are positively correlated for the reason that they are well mixed over the long range transport before arriving at the site. The negative correlation between O3 and CO in air mass LP is believed to be associated with heavy anthropogenic influence, which results from the enhancement by local sources as indicated by high CO and NOx and depletion of O3 when mixed with fresh emissions. The positive correlation in the perturbed marine air mass PM favors the low photochemical production of O3. The negative correlation found in the regional polluted continental air mass RPC is different from the observations at Oki Island in Japan due to the more complex O3 chemistry at Cape D'Aguilar.

  6. Linear regression model selection using p-values when the model dimension grows

    CERN Document Server

    Pokarowski, Piotr; Teisseyre, Paweł

    2012-01-01

    We consider a new criterion-based approach to model selection in linear regression. Properties of selection criteria based on p-values of a likelihood ratio statistic are studied for families of linear regression models. We prove that such procedures are consistent, i.e., the minimal true model is chosen with probability tending to 1 even when the number of models under consideration slowly increases with the sample size. The simulation study indicates that the introduced methods perform promisingly when compared with the Akaike and Bayesian Information Criteria.
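
    For orientation, the two benchmark criteria named in this abstract can be sketched for Gaussian linear models as follows. This is not the p-value criterion the authors propose; the formulas are the usual ones up to an additive constant shared by all candidate models, and the candidate names and residual sums of squares are made up for the example.

```python
from math import log


def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model with k coefficients, up to a shared constant."""
    return n * log(rss / n) + 2 * k


def gaussian_bic(rss, n, k):
    """BIC for a Gaussian linear model with k coefficients, up to a shared constant."""
    return n * log(rss / n) + k * log(n)


def select(candidates, n, criterion):
    """Return the name of the candidate with the smallest criterion value.

    candidates maps a model name to (residual sum of squares, number of coefficients).
    """
    return min(
        candidates,
        key=lambda name: criterion(candidates[name][0], n, candidates[name][1]),
    )


# Hypothetical fits on n = 50 observations (values invented for illustration).
candidates = {"x1": (120.0, 2), "x1+x2": (80.0, 3), "x1+x2+x3": (79.5, 4)}
best_by_aic = select(candidates, 50, gaussian_aic)
best_by_bic = select(candidates, 50, gaussian_bic)
```

    Both criteria trade goodness of fit against the number of coefficients; BIC penalises each extra coefficient by log(n) rather than 2, so it tends to pick smaller models as n grows.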

  7. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Science.gov (United States)

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather

  8. A nonparametric dynamic additive regression model for longitudinal data

    DEFF Research Database (Denmark)

    Martinussen, Torben; Scheike, Thomas H.

    2000-01-01

    dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models

  9. Linear Regression Model of the Ash Mass Fraction and Electrical Conductivity for Slovenian Honey

    Directory of Open Access Journals (Sweden)

    Mojca Jamnik

    2008-01-01

    Full Text Available Mass fraction of ash is a quality criterion for determining the botanical origin of honey. At present, this parameter is generally being replaced by the measurement of electrical conductivity (κ). The value of κ depends on the ash and acid content of honey; the higher their content, the higher the resulting conductivity. A linear regression model for the relationship between ash and electrical conductivity has been established for Slovenian honey by analysing 290 samples of Slovenian honey (including acacia, lime, chestnut, spruce, fir, multifloral and mixed forest honeydew honey). The obtained model differs from the one proposed by the International Honey Commission (IHC) in the slope, but not in the intercept of the relation formula. Therefore, the Slovenian model is recommended when calculating the ash mass fraction from the results of electrical conductivity in samples of Slovenian honey.
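
    The kind of straight-line calibration described here can be sketched with ordinary least squares. The numbers below are invented for illustration and are neither the study's data nor the IHC coefficients; the function and variable names are likewise assumptions.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical measurements, NOT taken from the paper.
conductivity = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]  # electrical conductivity, mS/cm
ash = [0.05, 0.15, 0.27, 0.38, 0.49, 0.61]     # ash mass fraction, g/100 g

intercept, slope = fit_line(conductivity, ash)


def predict_ash(kappa):
    """Estimate ash mass fraction from a conductivity reading using the fitted line."""
    return intercept + slope * kappa
```

    Comparing two such calibrations, as the abstract does, amounts to comparing their fitted slopes and intercepts.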

  10. VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those corresponding to the nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

  11. Regression mixture models : Does modeling the covariance between independent variables and latent classes improve the results?

    NARCIS (Netherlands)

    Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.

    2016-01-01

    Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the

  12. GIS-Based Analytical Tools for Transport Planning: Spatial Regression Models for Transportation Demand Forecast

    Directory of Open Access Journals (Sweden)

    Simone Becker Lopes

    2014-04-01

    Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches to spatial regression modeling. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model of the kind typically used in trip generation estimation. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included). This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.

  13. First Look at Photometric Reduction via Mixed-Model Regression (Poster abstract)

    Science.gov (United States)

    Dose, E.

    2016-12-01

    (Abstract only) Mixed-model regression is proposed as a new approach to photometric reduction, especially for variable-star photometry in several filters. Mixed-model regression adds to normal multivariate regression certain "random effects": categorical-variable terms that model and extract specific systematic errors such as image-to-image zero-point fluctuations (cirrus effect) or even errors in comp-star catalog magnitudes.

  14. School Attendance Problems and Youth Psychopathology: Structural Cross-Lagged Regression Models in Three Longitudinal Datasets

    Science.gov (United States)

    Wood, Jeffrey J.; Lynne, Sarah D.; Langer, David A.; Wood, Patricia A.; Clark, Shaunna L.; Eddy, J. Mark; Ialongo, Nicholas

    2011-01-01

    This study tests a model of reciprocal influences between absenteeism and youth psychopathology using three longitudinal datasets (Ns= 20745, 2311, and 671). Participants in 1st through 12th grades were interviewed annually or bi-annually. Measures of psychopathology include self-, parent-, and teacher-report questionnaires. Structural cross-lagged regression models were tested. In a nationally representative dataset (Add Health), middle school students with relatively greater absenteeism at study year 1 tended towards increased depression and conduct problems in study year 2, over and above the effects of autoregressive associations and demographic covariates. The opposite direction of effects was found for both middle and high school students. Analyses with two regionally representative datasets were also partially supportive. Longitudinal links were more evident in adolescence than in childhood. PMID:22188462

  15. Introduction to mixed modelling beyond regression and analysis of variance

    CERN Document Server

    Galwey, N W

    2007-01-01

    Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.

  16. Investigating the Performance of Alternate Regression Weights by Studying All Possible Criteria in Regression Models with a Fixed Set of Predictors

    Science.gov (United States)

    Waller, Niels; Jones, Jeff

    2011-01-01

    We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…

  17. Externalizing Behaviour for Analysing System Models

    NARCIS (Netherlands)

    Ivanova, Marieta Georgieva; Probst, Christian W.; Hansen, René Rydhof; Kammüller, Florian

    Systems models have recently been introduced to model organisations and evaluate their vulnerability to threats and especially insider threats. Especially for the latter these models are very suitable, since insiders can be assumed to have more knowledge about the attacked organisation than outside

  18. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  19. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employs MARS models which have the advantage of being a powerful method...

  20. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    Science.gov (United States)

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…

  1. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  2. Invariant Bayesian Inference in Regression Models that is robust against the Jeffreys-Lindley's paradox

    NARCIS (Netherlands)

    Kleibergen, F.

    2003-01-01

    We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower dimensional set that represents the nested model. The invariant expression of the

  3. Invariant Bayesian Inference in Regression Models that is robust against the Jeffreys-Lindleys Paradox

    NARCIS (Netherlands)

    Kleibergen, F.R.

    2004-01-01

    We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower-dimensional set that represents the nested model. The Hausdorff-integral is invariant and

  4. A note on the maximum likelihood estimator in the gamma regression model

    Directory of Open Access Journals (Sweden)

    Jerzy P. Rydlewski

    2009-01-01

    Full Text Available This paper considers a nonlinear regression model, in which the dependent variable has the gamma distribution. A model is considered in which the shape parameter of the random variable is the sum of continuous and algebraically independent functions. The paper proves that there is exactly one maximum likelihood estimator for the gamma regression model.

  5. Genetic parameters for various random regression models to describe the weight data of pigs

    NARCIS (Netherlands)

    Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.

    2002-01-01

    Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random

  6. Genetic parameters for different random regression models to describe weight data of pigs

    NARCIS (Netherlands)

    Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.

    2001-01-01

    Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random

  7. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    Science.gov (United States)

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…

  8. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
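
    The evaluation design described above (a random split into two halves, fit assessed on the estimation half and on the hold-out half) can be sketched as follows for the linear regression side only. The data are synthetic, the single predictor is a simplification, and all names are hypothetical; the fuzzy logic model is not reproduced.

```python
import random
from math import sqrt


def split_halves(rows, seed=0):
    """Shuffle a copy of rows and cut it into two halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_line(rows):
    """Least squares fit of y = intercept + slope * x on (x, y) pairs."""
    n = len(rows)
    x_bar = sum(x for x, _ in rows) / n
    y_bar = sum(y for _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(rows, intercept, slope):
    """Root mean squared prediction error of the fitted line on rows."""
    return sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in rows) / len(rows))


# Synthetic stand-in for the N = 664 survey responses (not real data).
rng = random.Random(1)
data = []
for _ in range(664):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 0.5 * x + rng.gauss(0.0, 1.0)))

estimation, holdout = split_halves(data)
intercept, slope = fit_line(estimation)
fit_error = rmse(estimation, intercept, slope)      # fit on the estimation half
holdout_error = rmse(holdout, intercept, slope)     # accuracy on unseen cases
```

    Any competing model would be fitted on the same estimation half and scored on the same hold-out half, so that the two error figures are directly comparable.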

  9. A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

    Science.gov (United States)

    Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

    2014-01-01

    A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.

  10. Modeling by regression for laser cutting of quartz crystal

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Presents the theoretical models built by analysis of the mechanism of laser cutting of quartz crystal and regression of test results for the laser cutting of quartz crystal, and comparative analysis of calculation errors for these models, and concludes with test results that these models comprehensively reflect the physical features of laser cutting of quartz crystal and satisfy the industrial production requirements, and they can be used to select right parameters for improvement of productivity and quality and saving of energy.

  11. Teacher training through the Regression Model in foreign language education

    Directory of Open Access Journals (Sweden)

    Jesús García Laborda

    2011-01-01

    Full Text Available In the last few years, Spain has seen dramatic changes in its educational system. Many of them were rejected by most teachers after their implementation (LOGSE), while others showed potential drawbacks even before coming into operation (LOCE, LOE). To face these changes, schools need well-qualified instructors. All schools want the best teachers but, because teachers' salaries are regulated by the state, few schools can actually offer incentives to their teachers, and consequently schools never have the instructors they wish for. In addition, state schools pay their teachers a fixed salary and private institutions offer no additional bonuses for further training or diplomas (for example, master's or postgraduate courses); teachers are therefore rarely interested in pursuing further studies in methodology or in related fields such as education or applied linguistics. Although many teachers acknowledge their love of teaching, the current situation in schools (school violence, low salaries, depression, loss of social prestige, legal changes and so on) has made teaching one of the most complicated and least appreciated jobs in Spain. It is not unusual for a school to have a couple of instructors ill due to depression and other psychological illnesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one's experience both as a teacher and as a learner.

  12. Misspecified poisson regression models for large-scale registry data

    DEFF Research Database (Denmark)

    Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.

    2016-01-01

    working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...

  13. CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    Liu Jixue; Chen Xiru

    2005-01-01

    Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.

  14. A Noncentral "t" Regression Model for Meta-Analysis

    Science.gov (United States)

    Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi

    2010-01-01

    In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…

  15. A Negative Binomial Regression Model for Accuracy Tests

    Science.gov (United States)

    Hung, Lai-Fa

    2012-01-01

    Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
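
    The overdispersion the abstract mentions can be checked with a simple variance-to-mean ratio, which is close to 1 for Poisson counts. The sketch below also gives a method-of-moments estimate of the negative binomial size parameter k under the common parameterisation variance = mean + mean^2 / k. It is an illustration only, not the model proposed in the study, and the function names are assumptions.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 under a Poisson model."""
    return variance(counts) / mean(counts)


def nb_size_moments(counts):
    """Method-of-moments size parameter k, assuming variance = mean + mean**2 / k.

    Returns None when the data are not overdispersed (variance <= mean).
    """
    m, v = mean(counts), variance(counts)
    if v <= m:
        return None
    return m * m / (v - m)


# Hypothetical error counts per examinee (invented values).
errors = [0, 1, 2, 3, 10]
index = dispersion_index(errors)
size = nb_size_moments(errors)
```

    A dispersion index well above 1 suggests that a Poisson model understates the variability, which is the situation in which a negative binomial model is usually considered.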

  16. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  17. A generalized exponential time series regression model for electricity prices

    DEFF Research Database (Denmark)

    Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso

    We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort...... the estimation of the dynamic structure of the series we consider an iterative estimation strategy which, conditional on a set of parameter estimates, clears the spikes using a data cleaning algorithm, and reestimates the parameters using the cleaned data so as to robustify the estimates. Conditional...... on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...

  18. Thermodynamic Analysis of Simple Gas Turbine Cycle with Multiple Regression Modelling and Optimization

    Directory of Open Access Journals (Sweden)

    Abdul Ghafoor Memon

    2014-03-01

    Full Text Available In this study, thermodynamic and statistical analyses were performed on a gas turbine system, to assess the impact of some important operating parameters like CIT (Compressor Inlet Temperature), PR (Pressure Ratio) and TIT (Turbine Inlet Temperature) on its performance characteristics such as net power output, energy efficiency, exergy efficiency and fuel consumption. Each performance characteristic was enunciated as a function of operating parameters, followed by a parametric study and optimization. The results showed that the performance characteristics increase with an increase in the TIT and a decrease in the CIT, except fuel consumption which behaves oppositely. The net power output and efficiencies increase with the PR up to certain initial values and then start to decrease, whereas the fuel consumption always decreases with an increase in the PR. The results of exergy analysis showed the combustion chamber as a major contributor to the exergy destruction, followed by stack gas. Subsequently, multiple regression models were developed to correlate each of the response variables (performance characteristic) with the predictor variables (operating parameters). The regression model equations showed a significant statistical relationship between the predictor and response variables.

  19. Using the classical linear regression model in analysis of the dependences of conveyor belt life

    Directory of Open Access Journals (Sweden)

    Miriam Andrejiová

    2013-12-01

    Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.

  20. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    Science.gov (United States)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
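
    The lagged association reported here (temperature two weeks before the cases) relies on aligning each outcome with an earlier predictor value. A minimal sketch of that alignment, followed by a plain Pearson correlation, is shown below; the weekly series are invented, and none of the four regression models compared in the study is reproduced.

```python
from math import sqrt


def lagged_pairs(predictor, outcome, lag):
    """Pair predictor[t - lag] with outcome[t] for every t where both exist."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return list(zip(predictor, outcome[lag:]))


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical weekly series (not surveillance data).
weekly_temperature = [18.0, 21.0, 25.0, 29.0, 27.0, 22.0, 19.0, 16.0]
weekly_cases = [5, 6, 6, 8, 11, 14, 12, 9]

pairs = lagged_pairs(weekly_temperature, weekly_cases, lag=2)
lag2_correlation = pearson(pairs)
```

    The same aligned pairs could serve as input to a count regression; the correlation is only a quick screening step.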

  1. An assessment of coefficient accuracy in linear regression models with spatially varying coefficients

    Science.gov (United States)

    Wheeler, David C.; Calder, Catherine A.

    2007-06-01

    The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.

  2. Moment-based estimation of smooth transition regression models with endogenous variables

    NARCIS (Netherlands)

    W.D. Areosa (Waldyr Dutra); M.J. McAleer (Michael); M.C. Medeiros (Marcelo)

    2008-01-01

    Nonlinear regression models have been widely used in practice for a variety of time series and cross-section datasets. For purposes of analyzing univariate and multivariate time series data, in particular, Smooth Transition Regression (STR) models have been shown to be very useful for re

  3. Photovoltaic System Modeling. Uncertainty and Sensitivity Analyses

    Energy Technology Data Exchange (ETDEWEB)

    Hansen, Clifford W. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Martin, Curtis E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-08-01

    We report an uncertainty and sensitivity analysis for modeling AC energy from photovoltaic systems. Output from a PV system is predicted by a sequence of models. We quantify uncertainty in the output of each model using empirical distributions of each model's residuals. We propagate uncertainty through the sequence of models by sampling these distributions to obtain an empirical distribution of a PV system's output. We consider models that: (1) translate measured global horizontal, direct and global diffuse irradiance to plane-of-array irradiance; (2) estimate effective irradiance; (3) predict cell temperature; (4) estimate DC voltage, current and power; (5) reduce DC power for losses due to inefficient maximum power point tracking or mismatch among modules; and (6) convert DC to AC power. Our analysis considers a notional PV system comprising an array of FirstSolar FS-387 modules and a 250 kW AC inverter; we use measured irradiance and weather at Albuquerque, NM. We found the uncertainty in PV system output to be relatively small, on the order of 1% for daily energy. We found uncertainty in the models for POA irradiance and effective irradiance to be the dominant contributors to uncertainty in predicted daily energy. Our analysis indicates that efforts to reduce the uncertainty in PV system output predictions may yield the greatest improvements by focusing on the POA and effective irradiance models.

  4. Covariance Functions and Random Regression Models in the ...

    African Journals Online (AJOL)

    ARC-IRENE

    modelled to account for heterogeneity of variance by AY. ... Results suggest that selection for CW could be effective and that RRM could be .... permanent environmental effects; and εij is the temporary environmental effect or measurement error. .... (1999), however, obtained correlations that were variable as low as 0.23 ...

  5. APPLICATION OF RANDOM REGRESSION MODELS FOR GROWTH TRAITS OF NELLORE CATTLE IN BRAZIL

    Directory of Open Access Journals (Sweden)

    Wéverton José Lima Fonseca

    2016-11-01

    Full Text Available The purpose of this review is to show the growth in research on covariance components and genetic evaluation using random regression models (RRM) for growth traits of Nellore cattle. Random regression models, also known as infinite-dimensional models, have been used to estimate variance components and genetic parameters for the weight of beef cattle. These models have become a standard alternative for genetic analyses of longitudinal data; however, the memory and computing time needed to run genetic evaluations on a large scale remain an obstacle. Traits related to animal growth are adopted as selection criteria in beef cattle breeding programs because breeders are paid on the basis of carcass weight. In recent years, RRM have been adopted as the standard procedure for the analysis of longitudinal data in animal breeding.

  6. Bayesian Uncertainty Analyses Via Deterministic Model

    Science.gov (United States)

    Krzysztofowicz, R.

    2001-05-01

    Rational decision-making requires that the total uncertainty about a variate of interest (a predictand) be quantified in terms of a probability distribution, conditional on all available information and knowledge. Suppose the state-of-knowledge is embodied in a deterministic model, which is imperfect and outputs only an estimate of the predictand. Fundamentals are presented of three Bayesian approaches to producing a probability distribution of the predictand via any deterministic model. The Bayesian Processor of Output (BPO) quantifies the total uncertainty in terms of a posterior distribution, conditional on model output. The Bayesian Processor of Ensemble (BPE) quantifies the total uncertainty in terms of a posterior distribution, conditional on an ensemble of model output. The Bayesian Forecasting System (BFS) decomposes the total uncertainty into input uncertainty and model uncertainty, which are characterized independently and then integrated into a predictive distribution.

  7. Quantile regression

    CERN Document Server

    Hao, Lingxin

    2007-01-01

    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao

  8. Analysing Social Epidemics by Delayed Stochastic Models

    Directory of Open Access Journals (Sweden)

    Francisco-José Santonja

    2012-01-01

    Full Text Available We investigate the dynamics of a delayed stochastic mathematical model to understand the evolution of the alcohol consumption in Spain. Sufficient condition for stability in probability of the equilibrium point of the dynamic model with aftereffect and stochastic perturbations is obtained via Kolmanovskii and Shaikhet general method of Lyapunov functionals construction. We conclude that alcohol consumption in Spain will be constant (with stability in time with around 36.47% of nonconsumers, 62.94% of nonrisk consumers, and 0.59% of risk consumers. This approach allows us to emphasize the possibilities of the dynamical models in order to study human behaviour.

  9. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru... ..., it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers.

  10. Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model

    Institute of Scientific and Technical Information of China (English)

    Jixue LIU

    2006-01-01

    Though the EV (errors-in-variables) model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of the LS estimate in the EV model. In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.
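
    As background on why the LS estimate behaves differently in the EV model, the following sketch simulates a simple linear EV model and computes the LS slope. It is an illustration under stated assumptions, not the article's derivation: the sample size, variances and the helper names `ls_slope` and `simulate_ev` are invented, and the comparison value is the textbook attenuation limit for classical measurement error rather than a result quoted from the abstract.

```python
import random

def ls_slope(ws, ys):
    """Ordinary least-squares slope of ys regressed on ws."""
    n = len(ws)
    mw = sum(ws) / n
    my = sum(ys) / n
    sww = sum((w - mw) ** 2 for w in ws)
    swy = sum((w - mw) * (y - my) for w, y in zip(ws, ys))
    return swy / sww

def simulate_ev(n, beta, sd_x, sd_u, sd_e, seed):
    """Draw (observed regressor, response) pairs from a linear EV model."""
    rng = random.Random(seed)
    ws, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, sd_x)              # true regressor, never observed
        ws.append(x + rng.gauss(0.0, sd_u))   # regressor observed with error
        ys.append(beta * x + rng.gauss(0.0, sd_e))
    return ws, ys

beta = 2.0
# With measurement error in the regressor (sd_u = 1).
slope_ev = ls_slope(*simulate_ev(20000, beta, 1.0, 1.0, 0.5, seed=0))
# Without measurement error (sd_u = 0): the ordinary regression model.
slope_clean = ls_slope(*simulate_ev(20000, beta, 1.0, 0.0, 0.5, seed=0))
# Textbook large-sample limit of the LS slope: beta * var(x) / (var(x) + var(u)).
attenuated_limit = beta * 1.0 / (1.0 + 1.0)
```

    In this setup `slope_ev` settles near `attenuated_limit` rather than near `beta`, while `slope_clean` stays close to `beta`.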

  11. Local asymptotic behavior of regression splines for marginal semiparametric models with longitudinal data

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.

  12. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

    Directory of Open Access Journals (Sweden)

    Ivanka Jerić

    2011-11-01

    Full Text Available Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, to some extent, the overfitting problems caused by a small training data set, we propose to use a consensus of six regression models for prediction of biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, the linear and nonlinear (with polynomial and Gaussian kernel) support vector machine. Regression models were applied on a virtual library of 429 compounds that resulted in six lists with candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample.

  13. Modelling, analyses and design of switching converters

    Science.gov (United States)

    Cuk, S. M.; Middlebrook, R. D.

    1978-01-01

    A state-space averaging method for modelling switching dc-to-dc converters for both continuous and discontinuous conduction mode is developed. In each case the starting point is the unified state-space representation, and the end result is a complete linear circuit model, for each conduction mode, which correctly represents all essential features, namely, the input, output, and transfer properties (static dc as well as dynamic ac small-signal). While the method is generally applicable to any switching converter, it is extensively illustrated for the three common power stages (buck, boost, and buck-boost). The results for these converters are then easily tabulated owing to the fixed equivalent circuit topology of their canonical circuit model. The insights that emerge from the general state-space modelling approach lead to the design of new converter topologies through the study of generic properties of the cascade connection of basic buck and boost converters.

  14. A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis

    National Research Council Canada - National Science Library

    Liu, Fengyun; Matsuno, Shuji; Malekian, Reza; Yu, Jin; Li, Zhixiong

    2016-01-01

    .... The above theoretical model is empirically evidenced with VAR (Vector Auto Regression) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue...

  15. Reduction of the curvature of a class of nonlinear regression models

    Institute of Scientific and Technical Information of China (English)

    吴翊; 易东云

    2000-01-01

    It is proved that, for a class of nonlinear regression models, the curvature of the model can be reduced to zero by increasing the amount of measured data. The result is important for practical problems and has given satisfactory results in data fusion.

  16. Multivariable Linear Regression Model for Promotional Forecasting: The Coca Cola - Morrisons Case

    OpenAIRE

    Zheng, Yiwei/Y

    2009-01-01

    This paper describes a promotional forecasting model, built by linear regression module in Microsoft Excel. It intends to provide quick and reliable forecasts with a moderate credit and to assist the CPFR between the Coca Cola Enterprises (CCE) and the Morrisons. The model is derived from previous researches and literature review on CPFR, promotion, forecasting and modelling. It is designed as a multivariable linear regression model, which involves several promotional mix as variables includi...

  17. Comparative analysis of regression and artificial neural network models for wind speed prediction

    Science.gov (United States)

    Bilgili, Mehmet; Sahin, Besir

    2010-11-01

    In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
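
    The first step described above, computing correlation coefficients for the meteorological variables two at a time, can be sketched as follows. The observations are invented for illustration and `pearson` is a hypothetical helper name; the stepwise regression and ANN stages of the study are not reproduced here.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical monthly observations for the five variables named in the abstract.
data = {
    "wind_speed": [3.1, 2.8, 3.6, 4.0, 3.3, 2.5],
    "ambient_temperature": [8.0, 11.5, 15.2, 21.0, 26.4, 29.1],
    "atmospheric_pressure": [1018.0, 1016.5, 1013.2, 1010.8, 1008.9, 1007.5],
    "relative_humidity": [72.0, 68.0, 61.0, 55.0, 49.0, 47.0],
    "rainfall": [95.0, 70.0, 52.0, 41.0, 12.0, 5.0],
}

# One coefficient per unordered pair of variables.
pairwise = {
    (a, b): pearson(data[a], data[b])
    for a, b in combinations(data, 2)
}
```

    With five variables this produces ten coefficients, one for each pair.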

  18. Based on Partial Least-squares Regression to Build up and Analyze the Model of Rice Evapotranspiration

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    When calculating rice evapotranspiration from weather factors, the independent variables are often found to be multicollinear. This can distort the traditional multivariate regression model based on the least squares method and make it unstable. In this paper the model is built by partial least-squares regression: applying the ideas of principal component analysis and canonical correlation analysis, the author extracts components from the original data and builds a model of rice evapotranspiration that overcomes the multicollinearity among the independent variables (weather factors). Finally, the author analyses the model in several respects and obtains satisfactory results.

  19. Prediction of the result in race walking using regularized regression models

    Directory of Open Access Journals (Sweden)

    Krzysztof Przednowek

    2013-04-01

    Full Text Available The following paper presents the use of regularized linear models as tools to optimize the training process. The models were calculated using data collected from race-walkers' training events. The models predict the result over a 3 km race following a prescribed training plan. The material included a total of 122 training patterns recorded for 21 athletes. The methods of analysis include: the classical OLS regression model, ridge regression, LASSO regression and elastic net regression. In order to compare the methods and choose the best one, leave-one-out cross-validation was used. All models were calculated using the R language with additional packages. The best model was determined by the LASSO method, which generates an error of about 26 seconds. The method has simplified the structure of the model by eliminating 5 out of 18 predictors.
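
    The model-selection step in this abstract, leave-one-out cross-validation over regularized linear models, can be sketched as follows. This is a simplified illustration and not the study's R code: it uses ridge regression with a single predictor (the study compared OLS, ridge, LASSO and elastic net on 18 predictors), and the data, the penalty grid and the names `fit_ridge` and `loo_rmse` are assumptions.

```python
def fit_ridge(xs, ys, lam):
    """Return (intercept, slope) minimizing sum((y - a - b*x)^2) + lam * b^2."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)      # lam = 0 gives ordinary least squares
    intercept = my - slope * mx    # the intercept is not penalized
    return intercept, slope

def loo_rmse(xs, ys, lam):
    """Root mean squared leave-one-out prediction error."""
    sq_errors = []
    for i in range(len(xs)):
        # Fit on all observations except the i-th, then predict the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_ridge(train_x, train_y, lam)
        sq_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Hypothetical data: weekly training volume (km) and 3 km race time (s).
volume = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
time_s = [790.0, 781.0, 776.0, 765.0, 762.0, 750.0, 747.0, 738.0]

# Score each candidate penalty and keep the one with the smallest error.
scores = {lam: loo_rmse(volume, time_s, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
best_lam = min(scores, key=scores.get)
```

    The error returned by `loo_rmse` is in the units of the response (seconds here), which is how an error such as "about 26 seconds" would be read.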

  20. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...

  1. STATISTICAL INFERENCES FOR VARYING-COEFFICIENT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE

    Institute of Scientific and Technical Information of China (English)

    梅长林; 张文修; 梁怡

    2001-01-01

    Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fitted by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed, and a formal testing approach, which is essential to exploring spatial non-stationarity in geographical science, is suggested.

  2. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  3. Aboveground biomass and carbon stocks modelling using non-linear regression model

    Science.gov (United States)

    Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd

    2016-06-01

    Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests because of the diversity of species and the complex structure of the tropical rain forest. Nevertheless, the tropical rainforest is the most extensive forest in the world, with a vast diversity of trees and layered canopies. Combining optical sensor data with empirical models is a common way to assess AGB. Regression can link remote sensing data to a biophysical parameter of the forest. Therefore, this paper examines the accuracy of a non-linear regression equation (a quadratic function) for estimating the AGB and carbon stocks of the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and the remotely sensed data using a nonlinear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient (r) of 0.671 (p < 0.01). The study concluded that integrating Worldview-3 imagery with the LiDAR-based canopy height model (CHM) raster was useful for quantifying the AGB and carbon stocks over a larger sample area of the lowland Dipterocarp forest.

  4. Mapping soil organic carbon stocks by robust geostatistical and boosted regression models

    Science.gov (United States)

    Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz

    2013-04-01

    Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change in time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1,033 plots we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. For the statistical modeling a broad range of covariates were available: climate data (e.g. precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes, and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse scale geological map were used to coarsely represent the parent material of the soils. The selection of important covariates for SOC stocks modeling out of a large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit a linear regression model with spatially correlated errors. The large number of covariates was first reduced by LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates, interaction terms of the latter were included in the model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the

  5. An analysis of first-time blood donors return behaviour using regression models.

    Science.gov (United States)

    Kheiri, S; Alibeigi, Z

    2015-08-01

    Blood products have a vital role in saving many patients' lives. The aim of this study was to analyse blood donor return behaviour. Using a cross-sectional follow-up design of 5-year duration, 864 first-time donors who had donated blood were selected using a systematic sampling. The behaviours of donors via three response variables, return to donation, frequency of return to donation and the time interval between donations, were analysed based on logistic regression, negative binomial regression and Cox's shared frailty model for recurrent events respectively. Successful return to donation rated at 49·1% and the deferral rate was 13·3%. There was a significant reverse relationship between the frequency of return to donation and the time interval between donations. Sex, body weight and job had an effect on return to donation; weight and frequency of donation during the first year had a direct effect on the total frequency of donations. Age, weight and job had a significant effect on the time intervals between donations. Aging decreases the chances of return to donation and increases the time interval between donations. Body weight affects the three response variables, i.e. the higher the weight, the more the chances of return to donation and the shorter the time interval between donations. There is a positive correlation between the frequency of donations in the first year and the total number of return to donations. Also, the shorter the time interval between donations is, the higher the frequency of donations. © 2015 British Blood Transfusion Society.

  6. Modelling and Analyses of Embedded Systems Design

    DEFF Research Database (Denmark)

    Brekling, Aske Wiid

    We present the MoVES languages: a language with which embedded systems can be specified at a stage in the development process where an application is identified and should be mapped to an execution platform (potentially multi- core). We give a formal model for MoVES that captures and gives......-based verification is a promising approach for assisting developers of embedded systems. We provide examples of system verifications that, in size and complexity, point in the direction of industrially-interesting systems....

  7. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
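
    The model forms described in this abstract can be written out as a sketch. The symbols are assumptions introduced here for illustration, since the abstract does not name them: x_j denotes the j-th regression variable, p_j and q_j the fixed powers chosen for the day and night models, and the beta and gamma terms the fitted coefficients.

```latex
\[
\mathrm{EDR}_{\mathrm{day}} = \beta_0 + \sum_{j=1}^{k} \beta_j \, x_j^{p_j},
\qquad
\ln \mathrm{EDR}_{\mathrm{night}} = \gamma_0 + \sum_{j=1}^{k} \gamma_j \, x_j^{q_j}
\]
```

    Both forms are linear in the coefficients once the powers are fixed, which is consistent with the models being described as linear multivariable regressions.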

  8. Quantifying spatial disparities in neonatal mortality using a structured additive regression model.

    Directory of Open Access Journals (Sweden)

    Lawrence N Kazembe

    Full Text Available BACKGROUND: Neonatal mortality contributes a large proportion towards early childhood mortality in developing countries, with considerable geographical variation at small areas within countries. METHODS: A geo-additive logistic regression model is proposed for quantifying small-scale geographical variation in neonatal mortality, and to estimate risk factors of neonatal mortality. Random effects are introduced to capture spatial correlation and heterogeneity. The spatial correlation can be modelled using Markov random fields (MRF) when data is aggregated, while two-dimensional P-splines apply when exact locations are available, whereas the unstructured spatial effects are assigned an independent Gaussian prior. Socio-economic and bio-demographic factors which may affect the risk of neonatal mortality are simultaneously estimated as fixed effects and as nonlinear effects for continuous covariates. The smooth effects of continuous covariates are modelled by second-order random walk priors. Modelling and inference use the empirical Bayesian approach via a penalized likelihood technique. The methodology is applied to analyse the likelihood of neonatal deaths, using data from the 2000 Malawi demographic and health survey. The spatial effects are quantified through MRF and two-dimensional P-spline priors. RESULTS: Findings indicate that both fixed and spatial effects are associated with neonatal mortality. CONCLUSIONS: Our study, therefore, suggests that the challenge of reducing neonatal mortality goes beyond addressing individual factors and also requires understanding unmeasured covariates for potentially effective interventions.

  9. Combining an additive and tree-based regression model simultaneously: STIMA

    NARCIS (Netherlands)

    Dusseldorp, E.; Conversano, C.; Os, B.J. van

    2010-01-01

    Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as

  10. Analyzing Multilevel Data: Comparing Findings from Hierarchical Linear Modeling and Ordinary Least Squares Regression

    Science.gov (United States)

    Rocconi, Louis M.

    2013-01-01

    This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…

  11. Mechanisms of Developmental Regression in Autism and the Broader Phenotype: A Neural Network Modeling Approach

    Science.gov (United States)

    Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette

    2011-01-01

    Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…

  13. CONFIDENCE REGIONS IN TERMS OF STATISTICAL CURVATURE FOR AR(q) NONLINEAR REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    刘应安; 韦博成

    2004-01-01

    This paper constructs a set of confidence regions of parameters in terms of statistical curvatures for AR(q) nonlinear regression models. The geometric frameworks are proposed for the model. Then several confidence regions for parameters and parameter subsets in terms of statistical curvatures are given based on the likelihood ratio statistics and score statistics. Several previous results, such as [1] and [2], are extended to AR(q) nonlinear regression models.

  14. Predictive market segmentation model: An application of logistic regression model and CHAID procedure

    Directory of Open Access Journals (Sweden)

    Soldić-Aleksić Jasna

    2009-01-01

    Full Text Available Market segmentation is one of the key concepts of modern marketing. The main goal of market segmentation is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product/service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage in the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used to select the variables (from an initial pool of eleven) that are statistically significant for explaining the dependent variable. The selected variables were then included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented for a concrete empirical example in the following form: summary model results, CHAID tree, gain chart, index chart, risk and classification tables.

  15. Comparing uncertainty resulting from two-step and global regression procedures applied to microbial growth models.

    Science.gov (United States)

    Martino, K G; Marks, B P

    2007-12-01

    Two different microbial modeling procedures were compared and validated against independent data for Listeria monocytogenes growth. The most generally used method is two consecutive regressions: growth parameters are estimated from a primary regression of microbial counts, and a secondary regression relates the growth parameters to experimental conditions. A global regression is an alternative method in which the primary and secondary models are combined, giving a direct relationship between experimental factors and microbial counts. The Gompertz equation was the primary model, and a response surface model was the secondary model. Independent data from meat and poultry products were used to validate the modeling procedures. The global regression yielded the lower standard errors of calibration, 0.95 log CFU/ml for aerobic and 1.21 log CFU/ml for anaerobic conditions. The two-step procedure yielded errors of 1.35 log CFU/ml for aerobic and 1.62 log CFU/ml for anaerobic conditions. For food products, the global regression was more robust than the two-step procedure for 65% of the cases studied. The robustness index for the global regression ranged from 0.27 (performed better than expected) to 2.60. For the two-step method, the robustness index ranged from 0.42 to 3.88. The predictions were overestimated (fail safe) in more than 50% of the cases using the global regression and in more than 70% of the cases using the two-step regression. Overall, the global regression performed better than the two-step procedure for this specific application.

  16. Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: insights into spatial variability using high-resolution satellite data.

    Science.gov (United States)

    Alexeeff, Stacey E; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A

    2015-01-01

    Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km × 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R² yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with >0.9 out-of-sample R² yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the SEs. Land use regression models performed better in chronic effect simulations. These results can help researchers when interpreting health effect estimates in these types of studies.

  17. A modified Lee-Carter model for analysing short-base-period data.

    Science.gov (United States)

    Zhao, Bojuan Barbara

    2012-03-01

    This paper introduces a new modified Lee-Carter model for analysing short-base-period mortality data, for which the original Lee-Carter model produces severely fluctuating predicted age-specific mortality. Approximating the unknown parameters in the modified model by linearized cubic splines and other additive functions, the model can be simplified into a logistic regression when fitted to binomial data. The expected death rate estimated from the modified model is smooth, not only over ages but also over years. The analysis of mortality data in China (2000-08) demonstrates the advantages of the new model over existing models.

  18. Multilevel modeling was a convenient alternative to common regression designs in longitudinal suicide research.

    Science.gov (United States)

    Antretter, Elfi; Dunkel, Dirk; Osvath, Peter; Voros, Viktor; Fekete, Sandor; Haring, Christian

    2006-06-01

    The prospective investigation of repetitive nonfatal suicidal behavior is associated with two methodological problems. Due to the commonly used definitions of nonfatal suicidal behavior, clinical samples usually consist of patients with a considerable between-person variability. Second, repeated nonfatal suicidal episodes of the same subjects are likely to be correlated. We examined three regression techniques to comparatively evaluate their efficiency in addressing the given methodological problems. Repeated episodes of nonfatal suicidal behavior were assessed in two independent patient samples during a 2-year follow-up period. The first regression design modeled repetitive nonfatal suicidal behavior as a summary measure. The second regression model treated repeated episodes of the same subject as independent events. The third regression model represented a hierarchical linear model. The estimated mean effects of the first model were likely to be nonrepresentative for a considerable part of the study subjects. The second regression design overemphasized the impact of the predictor variables. The hierarchical linear model most appropriately accounted for the heterogeneity of the samples and the correlated data structure. The nonhierarchical regression designs did not provide appropriate statistical models for the prospective investigation of repetitive nonfatal suicidal behavior. Multilevel modeling provides a convenient alternative.

  19. QSAR modeling of antimalarial activity of urea derivatives using genetic algorithm–multiple linear regressions

    Directory of Open Access Journals (Sweden)

    Abolghasem Beheshti

    2016-05-01

    Full Text Available A quantitative structure–activity relationship (QSAR) study was performed to analyze the antimalarial activities of 68 urea derivatives using multiple linear regression (MLR). QSAR analyses were performed on the available 68 IC50 oral data based on theoretical molecular descriptors. A suitable set of molecular descriptors, such as constitutional, topological, geometrical, electrostatic and quantum-chemical descriptors, was calculated to represent the molecular structures of the compounds. The important descriptors were selected with the aid of the genetic algorithm (GA) method. The obtained model was validated using leave-one-out (LOO) cross-validation, an external test set and a Y-randomization test. The root mean square errors (RMSE) of the training set and the test set for the GA–MLR model were calculated to be 0.314 and 0.486, and the squared correlation coefficients (R²) were 0.801 and 0.803, respectively. The results showed that the predictive ability of the model was satisfactory and that it can be used for designing similar groups of antimalarial compounds.

  20. Estimating carbon and showing impacts of drought using satellite data in regression-tree models

    Science.gov (United States)

    Boyte, Stephen; Wylie, Bruce K.; Howard, Danny; Dahal, Devendra; Gilmanov, Tagir G.

    2018-01-01

    Integrating spatially explicit biogeophysical and remotely sensed data into regression-tree models enables the spatial extrapolation of training data over large geographic spaces, allowing a better understanding of broad-scale ecosystem processes. The current study presents annual gross primary production (GPP) and annual ecosystem respiration (RE) for 2000–2013 in several short-statured vegetation types using carbon flux data from towers that are located strategically across the conterminous United States (CONUS). We calculate carbon fluxes (annual net ecosystem production [NEP]) for each year in our study period, which includes 2012 when drought and higher-than-normal temperatures influence vegetation productivity in large parts of the study area. We present and analyse carbon flux dynamics in the CONUS to better understand how drought affects GPP, RE, and NEP. Model accuracy metrics show strong correlation coefficients (r) (r ≥ 94%) between training and estimated data for both GPP and RE. Overall, average annual GPP, RE, and NEP are relatively constant throughout the study period except during 2012 when almost 60% less carbon is sequestered than normal. These results allow us to conclude that this modelling method effectively estimates carbon dynamics through time and allows the exploration of impacts of meteorological anomalies and vegetation types on carbon dynamics.

  1. Effective factors contraceptive use by logistic regression model in Tehran, 1996

    Directory of Open Access Journals (Sweden)

    Ramezani F

    1999-07-01

    Full Text Available Despite not wanting a pregnancy, about 30% of couples do not use any kind of contraception, which leads to unwanted pregnancy. In this clinical trial study, 4177 subjects who had at least one living child and delivered in one of the 12 university hospitals in Tehran were recruited. This study was conducted in 1996. The questionnaire included questions about contraceptive use and the subjects' attitudes about whether their current pregnancies were wanted or unwanted. Data were analysed using a logistic regression model. Results showed that 20.3% of those who had no fertility intention did not use any contraception method, and 41.1% of the subjects who were using a contraception method before pregnancy had become pregnant unintentionally. Based on the logistic regression model, age, education, the woman's previous familiarity with contraception methods and the husband's education were the most significant factors in contraceptive use. Subjects aged 20 years or less or 35 years or more, and illiterate subjects, were at higher risk of not using contraception methods. This risk was not related to the gender of their children, which suggests a positive change in their perspectives towards sex and the number of children. It is suggested that health policymakers choose an appropriate model to enhance literacy, education and counseling for the correct usage of contraceptives and prevention of unwanted pregnancy.

  2. Regression modeling of streamflow, baseflow, and runoff using geographic information systems.

    Science.gov (United States)

    Zhu, Yuanhong; Day, Rick L

    2009-02-01

    Regression models for predicting total streamflow (TSF), baseflow (TBF), and storm runoff (TRO) are needed for water resource planning and management. This study used 54 streams with >20 years of streamflow gaging station records during the period October 1971 to September 2001 in Pennsylvania and partitioned TSF into TBF and TRO. TBF was considered a surrogate of groundwater recharge for basins. Regression models for predicting basin-wide TSF, TBF, and TRO were developed under three scenarios that varied in regression variables used for model development. Regression variables representing basin geomorphological, geological, soil, and climatic characteristics were estimated using geographic information systems. All regression models for TSF, TBF, and TRO had R² values >0.94 and reasonable prediction errors. The two best TSF models developed under scenarios 1 and 2 had similar absolute prediction errors. The same was true for the two best TBF models. Therefore, any one of the two best TSF and TBF models could be used for respective flow prediction depending on variable availability. The TRO model developed under scenario 1 had smaller absolute prediction errors than that developed under scenario 2. Simplified Area-alone models developed under scenario 3 might be used when variables for using best models are not available, but had lower R² values and higher or more variable prediction errors than the best models.

  3. Comparison of land-use regression models between Great Britain and the Netherlands.

    NARCIS (Netherlands)

    Vienneau, D.; de Hoogh, K.; Beelen, R.M.J.; Fischer, P.; Hoek, G.; Briggs, D.

    2010-01-01

    Land-use regression models have increasingly been applied for air pollution mapping at typically the city level. Though models generally predict spatial variability well, the structure of models differs widely between studies. The observed differences in the models may be due to artefacts of data an

  4. Parameter-elevation Regressions on Independent Slopes Model Monthly Climate Data for the Continental United States.

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...

  5. Rank Set Sampling in Improving the Estimates of Simple Regression Model

    Directory of Open Access Journals (Sweden)

    M Iqbal Jeelani

    2015-04-01

    Full Text Available In this paper, rank set sampling (RSS) is introduced with a view to increasing the efficiency of the estimates of the simple regression model. The regression model is considered with respect to samples drawn by simple random sampling (SRS), systematic sampling (SYS) and rank set sampling (RSS). It is found that the R² and adjusted R² obtained from the regression model based on the rank set sample are higher than those of the other two sampling schemes. Similarly, the root mean square error, p-values and coefficient of variation are much lower in the rank-set-based regression model; under a validation technique (jackknifing), the measures of R², adjusted R² and RMSE are also more consistent for RSS than for SRS and SYS. The results are supported by an empirical study involving a real data set of Pinus wallichiana taken from block Langate of district Kupwara.

  6. Efficient Estimation for Semiparametric Varying Coefficient Partially Linear Regression Models with Current Status Data

    Institute of Scientific and Technical Information of China (English)

    Tao Hu; Heng-jian Cui; Xing-wei Tong

    2009-01-01

    This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The semiparametric varying-coefficient partially linear regression model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.

  7. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples

    Science.gov (United States)

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...

  8. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    Science.gov (United States)

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  10. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.

  11. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data.
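
    The kind of fit this abstract describes, a single predictor such as ADD regressed against a response such as TSS load and judged by its coefficient of determination, can be sketched as follows. This is a minimal illustration only: the function name and the sample numbers are assumptions made for the example and are not data or code from the study.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical values: antecedent dry days vs. an invented TSS load.
add_days = [1, 2, 3, 4, 5]
tss_load = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_simple_regression(add_days, tss_load)
```

    A high R² in such a fit says only that the line tracks the sample; it does not by itself settle whether ADD is a meaningful predictor, which is the question the abstract raises.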

  12. The empirical likelihood goodness-of-fit test for regression model

    Institute of Scientific and Technical Information of China (English)

    Li-xing ZHU; Yong-song QIN; Wang-li XU

    2007-01-01

    Goodness-of-fit testing for regression models has received much attention in the literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models including classical parametric and autoregressive (AR) time series models are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.

  13. On asymptotics of t-type regression estimation in multiple linear model

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.

  14. Developing and testing a global-scale regression model to quantify mean annual streamflow

    Science.gov (United States)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 10⁶ km². In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
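
    The two performance measures quoted in this abstract can be sketched as below. The sketch assumes that the modified index of agreement is Willmott's absolute-value form, d = 1 − Σ|P − O| / Σ(|P − Ō| + |O − Ō|); the abstract does not spell out the formula, nor whether flows were transformed before scoring, so both are assumptions, as are the function names.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def modified_index_of_agreement(observed, predicted):
    """Willmott-style modified index of agreement (absolute-value form).

    Assumed definition: 1 - sum|P - O| / sum(|P - mean(O)| + |O - mean(O)|).
    A value of 1 means perfect agreement.
    """
    mean_obs = sum(observed) / len(observed)
    numerator = sum(abs(p - o) for o, p in zip(observed, predicted))
    denominator = sum(
        abs(p - mean_obs) + abs(o - mean_obs) for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator
```

    Lower RMSE and higher d both indicate closer agreement with the independent measurements, which is the direction of the comparison reported above.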

  15. Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

    Science.gov (United States)

    Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

    2017-07-26

    A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.
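
    The two comparison techniques named in this abstract can be sketched in a few lines. This is a textbook-style illustration, not the authors' procedure: the function names, the 1.96 multiplier for the limits of agreement, and the default error-variance ratio of 1.0 in the Deming fit are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def bland_altman(reference, test):
    """Return (bias, lower limit, upper limit) of agreement for paired values.

    The bias (mean difference) estimates the constant error between assays;
    the limits are bias +/- 1.96 standard deviations of the differences.
    """
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def deming(x, y, delta=1.0):
    """Deming regression slope and intercept, allowing error in both x and y.

    delta is the assumed ratio of the y-error variance to the x-error
    variance; delta = 1.0 corresponds to orthogonal regression.
    Requires a non-zero covariance between x and y.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    term = syy - delta * sxx
    slope = (term + math.sqrt(term ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx
```

    In this framing, a Deming intercept away from 0 points to constant error and a slope away from 1 points to proportional error, which is the information that R² alone does not convey.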

  16. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    Science.gov (United States)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.

  17. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents

    Directory of Open Access Journals (Sweden)

    Shaurya Agarwal

    2016-07-01

    Full Text Available This research paper investigates a hybrid model using logistic regression with wavelet-based feature extraction for detecting traffic incidents. A logistic regression model is suitable when the outcome can take only a limited number of values. For traffic incident detection, the outcome is limited to only two values, the presence or absence of an incident. The logistic regression model used in this study is a generalized linear model (GLM) with a binomial response and a logit link function. This paper presents a framework to use logistic regression and wavelet-based feature extraction for traffic incident detection. It investigates the effect of preprocessing data on the performance of incident detection models. Results of this study indicate that logistic regression along with wavelet-based feature extraction can be used effectively for incident detection by balancing the incident detection rate and the false alarm rate according to need. Logistic regression on raw data resulted in a maximum detection rate of 95.4% at the cost of a 14.5% false alarm rate, whereas the hybrid model achieved a maximum detection rate of 98.78% at the expense of a 6.5% false alarm rate. Results indicate that the proposed approach is practical and efficient; with future improvements in the proposed technique, it will make an effective tool for traffic incident detection.
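
    The trade-off between detection rate and false alarm rate described here can be illustrated with a small sketch of a logit-link model and a decision threshold. Everything in it is hypothetical: the coefficients are not fitted values from the paper, the wavelet feature extraction step is omitted, and the false alarm rate is taken simply as the share of non-incident samples that are flagged, which may differ from the definition the authors used.

```python
import math


def incident_probability(features, coefs, intercept):
    """Inverse logit link: map a linear predictor to a probability in (0, 1)."""
    eta = intercept + sum(c * f for c, f in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-eta))


def detection_and_false_alarm(probs, labels, threshold):
    """Return (detection rate, false alarm rate) at a probability threshold.

    labels are 1 for an incident and 0 otherwise; both classes must occur.
    """
    flagged = [p >= threshold for p in probs]
    true_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    positives = sum(labels)
    negatives = len(labels) - positives
    return true_pos / positives, false_pos / negatives


# Hypothetical scores: lowering the threshold raises both rates.
probs = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 1, 1, 0, 0]
strict = detection_and_false_alarm(probs, labels, 0.7)
lenient = detection_and_false_alarm(probs, labels, 0.2)
```

    Moving the threshold is one way to balance the two rates "according to need", as the abstract puts it.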

  18. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  19. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of this covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work with this approach because the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold under the proposed method, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study.
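
    For readers unfamiliar with the PAVA mentioned above, the following is a minimal sketch of the standard algorithm for a non-decreasing least-squares fit. It illustrates only the basic isotonic step, not the empirical likelihood method of the paper; the function name `pava` and the sample data are hypothetical.

```python
def pava(y, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit to y."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Illustrative data with two adjacent violations (3 > 2 and 4 > 3).
print(pava([1, 3, 2, 4, 3, 5]))  # → [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

    Each violating pair is replaced by its weighted mean, so the monotonicity constraint is enforced by construction.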

  20. Using the Logistic Regression model in supporting decisions of establishing marketing strategies

    Directory of Open Access Journals (Sweden)

    Cristinel CONSTANTIN

    2015-12-01

    Full Text Available This paper presents instrumental research on the use of the logistic regression model for data analysis in marketing research. Decision makers inside different organisations need relevant information to support their decisions regarding marketing strategies. The data provided by marketing research can be processed in various ways, but multivariate data analysis models can enhance the utility of the information. Among these models is the logistic regression model, which is used for dichotomous variables. Our research explains the utility of this model and the interpretation of the resulting information in order to help practitioners and researchers use it in their future investigations.

  1. Regression-based air temperature spatial prediction models: an example from Poland

    Directory of Open Access Journals (Sweden)

    Mariusz Szymanowski

    2013-10-01

    Full Text Available A Geographically Weighted Regression-Kriging (GWRK) algorithm, based on the local Geographically Weighted Regression (GWR), is applied for spatial prediction of air temperature in Poland. Hengl's decision tree for selecting a suitable prediction model is extended for varying spatial relationships between the air temperature and environmental predictors with an assumption of existing environmental dependence of analyzed temperature variables. The procedure includes the potential choice of a local GWR instead of the global Multiple Linear Regression (MLR) method for modeling the deterministic part of spatial variation, which is usual in the standard regression (residual) kriging model (MLRK). The analysis encompassed: testing for environmental correlation, selecting an appropriate regression model, testing for spatial autocorrelation of the residual component, and validating the prediction accuracy. The proposed approach was performed for 69 air temperature cases, with time aggregation ranging from daily to annual average air temperatures. The results show that, irrespective of the level of data aggregation, the spatial distribution of temperature is better fitted by local models, which is the reason for choosing a GWR instead of the MLR for all variables analyzed. Additionally, in most cases (78%) there is spatial autocorrelation in the residuals of the deterministic part, which suggests that the GWR model should be extended by ordinary kriging of residuals to the GWRK form. The decision tree used in this paper can be considered universal, as it encompasses either spatially varying relationships of modeled and explanatory variables or a random process that can be modeled by a stochastic extension of the regression model (residual kriging). Moreover, for all cases analyzed, the selection of a method based on the local regression model (GWRK or GWR) does not depend on the data aggregation level, showing the potential versatility of the technique.

  2. The spatial prediction of landslide susceptibility applying artificial neural network and logistic regression models: A case study of Inje, Korea

    Science.gov (United States)

    Saro, Lee; Woo, Jeon Seong; Kwan-Young, Oh; Moung-Jin, Lee

    2016-02-01

    The aim of this study is to predict landslide susceptibility through spatial analysis, applying a statistical methodology based on GIS. Logistic regression models along with an artificial neural network were applied and validated to analyze landslide susceptibility in Inje, Korea. Landslide occurrence areas in the study were identified based on interpretations of optical remote sensing data (aerial photographs) followed by field surveys. A spatial database considering forest, geophysical, soil and topographic data was built for the study area using the Geographical Information System (GIS). These factors were analysed using artificial neural network (ANN) and logistic regression models to generate a landslide susceptibility map. The study validates the landslide susceptibility map by comparing it with landslide occurrence areas. The locations of landslide occurrence were divided randomly into a training set (50%) and a test set (50%). The training set was used to produce the landslide susceptibility map with the artificial neural network and logistic regression models, and the test set was retained to validate the prediction map. The validation results revealed that the artificial neural network model (with an accuracy of 80.10%) was better at predicting landslides than the logistic regression model (with an accuracy of 77.05%). Of the weights used in the artificial neural network model, 'slope' yielded the highest weight value (1.330), and 'aspect' yielded the lowest value (1.000). This research applied two statistical analysis methods in a GIS and compared their results. Based on the findings, we were able to derive a more effective method for analyzing landslide susceptibility.

  3. Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.

    Science.gov (United States)

    Kumar, Gaurav; Bajaj, Rakesh Kumar

    2014-01-01

    In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept to triangular intuitionistic fuzzy numbers, establishing a one-to-one correspondence with position and entropies. Using the concept of fuzzy entropy, the estimators of the intuitionistic fuzzy regression coefficients have been obtained in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.

  4. Regression Test-Selection Technique Using Component Model Based Modification: Code to Test Traceability

    Directory of Open Access Journals (Sweden)

    Ahmad A. Saifan

    2016-04-01

    Full Text Available Regression testing is a safeguarding procedure to validate and verify adapted software, and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in the regression test selection domain. The approach is based on meta-models (test models and structured models) to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results with the retest-all approach. The results have shown that our approach reduces the size of the test suite without a negative impact on the effectiveness of fault detection.

  5. Exploring nonlinear relations: models of clinical decision making by regression with optimal scaling.

    Science.gov (United States)

    Hartmann, Armin; Van Der Kooij, Anita J; Zeeck, Almut

    2009-07-01

    In explorative regression studies, linear models are often applied without questioning the linearity of the relations between the predictor variables and the dependent variable, or linear relations are taken as an approximation. In this study, the method of regression with optimal scaling transformations is demonstrated. This method does not require predefined nonlinear functions and results in easy-to-interpret transformations that will show the form of the relations. The method is illustrated using data from a German multicenter project on the indication criteria for inpatient or day clinic psychotherapy treatment. The indication criteria to include in the regression model were selected with the Lasso, which is a tool for predictor selection that overcomes the disadvantages of stepwise regression methods. The resulting prediction model indicates that treatment status is (approximately) linearly related to some criteria and nonlinearly related to others.

  6. Modeling personalized head-related impulse response using support vector regression

    Institute of Scientific and Technical Information of China (English)

    HUANG Qing-hua; FANG Yong

    2009-01-01

    A new customization approach based on support vector regression (SVR) is proposed to obtain individual head-related impulse response (HRIR) without complex measurement and special equipment. Principal component analysis (PCA) is first applied to obtain a few principal components and corresponding weight vectors correlated with individual anthropometric parameters. Then the weight vectors act as output of the nonlinear regression model. Some measured anthropometric parameters are selected as input of the model according to the correlation coefficients between the parameters and the weight vectors. After the regression model is learned from the training data, the individual HRIR can be predicted based on the measured anthropometric parameters. Compared with a back-propagation neural network (BPNN) for nonlinear regression, better generalization and prediction performance for small training samples can be obtained using the proposed PCA-SVR algorithm.

  7. RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS

    Directory of Open Access Journals (Sweden)

    J. Behmanesh

    2015-01-01

    Full Text Available Modeling rainfall-runoff relationships in a watershed has an important role in water resources engineering. Researchers have used numerical models for modeling the rainfall-runoff process in the watershed because of the non-linear nature of the rainfall-runoff relationship, the vast data requirements and the difficulty of physical models. The main objective of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models, ANN and ANFIS, were used to model the rainfall-runoff process and the best model was chosen. Also, using SPSS software, regression equations were developed and the best equation was selected from the regression analysis. The results obtained from the numerical and regression modeling were compared with each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also indicated that the Turkey River flow rate had a logical relationship with the flow rates of one and two days earlier and with the rainfall values of one, two and three days earlier.

  8. RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS

    Directory of Open Access Journals (Sweden)

    J. Behmanesh

    2015-03-01

    Full Text Available Modeling rainfall-runoff relationships in a watershed has an important role in water resources engineering. Researchers have used numerical models for modeling the rainfall-runoff process in the watershed because of the non-linear nature of the rainfall-runoff relationship, the vast data requirements and the difficulty of physical models. The main objective of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models, ANN and ANFIS, were used to model the rainfall-runoff process and the best model was chosen. Also, using SPSS software, regression equations were developed and the best equation was selected from the regression analysis. The results obtained from the numerical and regression modeling were compared with each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also indicated that the Turkey River flow rate had a logical relationship with the flow rates of one and two days earlier and with the rainfall values of one, two and three days earlier.

  9. Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models

    Energy Technology Data Exchange (ETDEWEB)

    Pappas, S.S. [Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83 200 Samos (Greece); Ekonomou, L.; Chatzarakis, G.E. [Department of Electrical Engineering Educators, ASPETE - School of Pedagogical and Technological Education, N. Heraklion, 141 21 Athens (Greece); Karamousantas, D.C. [Technological Educational Institute of Kalamata, Antikalamos, 24100 Kalamata (Greece); Katsikas, S.K. [Department of Technology Education and Digital Systems, University of Piraeus, 150 Androutsou Srt., 18 532 Piraeus (Greece); Liatsis, P. [Division of Electrical Electronic and Information Engineering, School of Engineering and Mathematical Sciences, Information and Biomedical Engineering Centre, City University, Northampton Square, London EC1V 0HB (United Kingdom)

    2008-09-15

    This study addresses the problem of modeling the electricity demand loads in Greece. The provided actual load data is deseasonalized and an AutoRegressive Moving Average (ARMA) model is fitted on the data off-line, using the Akaike Corrected Information Criterion (AICC). The developed model fits the data successfully. Difficulties occur when the provided data includes noise or errors and also when on-line/adaptive modeling is required. In both cases, and under the assumption that the provided data can be represented by an ARMA model, simultaneous order and parameter estimation of ARMA models in the presence of noise is performed. The produced results indicate that the proposed method, which is based on the multi-model partitioning theory, successfully tackles the studied problem. For validation purposes the produced results are compared with three other established order selection criteria, namely AICC, Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC). The developed model could be useful in studies that concern electricity consumption and electricity price forecasts.
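
    The three order selection criteria named above have standard textbook definitions, sketched below. This is not the multi-model partitioning method of the paper. The candidate ARMA(p, q) orders, their log-likelihoods, the sample size, and the convention of counting p + q + 1 parameters (coefficients plus noise variance) are all hypothetical assumptions for illustration.

```python
import math

def aic(log_lik, k):
    """Akaike's Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for n observations."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(log_lik, k, n):
    """Schwarz's Bayesian Information Criterion: -2 ln L + k ln n."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical maximized log-likelihoods for candidate ARMA(p, q) orders.
n_obs = 200
candidates = {(1, 0): -520.4, (1, 1): -512.9, (2, 1): -512.1, (2, 2): -511.8}

def n_params(order):
    # Assumed parameter count: p AR terms, q MA terms, plus the noise variance.
    p, q = order
    return p + q + 1

def best_order(criterion):
    """Return the candidate order that minimizes the given criterion."""
    return min(candidates, key=lambda o: criterion(candidates[o], n_params(o)))

best_aic = best_order(aic)
best_aicc = best_order(lambda ll, k: aicc(ll, k, n_obs))
best_bic = best_order(lambda ll, k: bic(ll, k, n_obs))
print(best_aic, best_aicc, best_bic)
```

    All three criteria penalize the log-likelihood by model size; BIC penalizes extra parameters more heavily than AIC once n exceeds about 8.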

  10. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data.

    Science.gov (United States)

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2014-05-01

    Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using a nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed nonlinear structure in the data. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function, and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high-dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here was comparable in terms of both predictive and generalization abilities. Values of the performance criteria parameters suggested the adequacy of the constructed models in fitting the nonlinear data and their good predictive capabilities.
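
    As an illustration of one of the methods listed above, the following is a minimal sketch of kernel ridge regression with a Gaussian kernel, in its standard form: solve (K + λI)α = y, then predict with a kernel-weighted sum. The data, the kernel width `gamma`, and the ridge penalty `lam` are hypothetical; the paper's features and tuned parameters are not reproduced here.

```python
import math

def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_kernel_ridge(X, y, gamma, lam):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict(X_train, alpha, x_new, gamma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xi, x_new, gamma)
               for a, xi in zip(alpha, X_train))

# Hypothetical one-feature data following a nonlinear (sine) relation.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [math.sin(x[0]) for x in X_train]
alpha = fit_kernel_ridge(X_train, y_train, gamma=1.0, lam=1e-8)
print(predict(X_train, alpha, [1.5], gamma=1.0))
```

    In practice `gamma` and `lam` would be chosen by cross-validation, as the abstract describes for its model parameters.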

  11. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    Science.gov (United States)

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.

  12. An Alumni Oriented Approach to Sport Management Curriculum Design Using Performance Ratings and a Regression Model.

    Science.gov (United States)

    Ulrich, David; Parkhouse, Bonnie L.

    1982-01-01

    An alumni-based model is proposed as an alternative to sports management curriculum design procedures. The model relies on the assessment of curriculum by sport management alumni and uses performance ratings of employers and measures of satisfaction by alumni in a regression model to identify curriculum leading to increased work performance and…

  13. Penalized regression techniques for modeling relationships between metabolites and tomato taste attributes

    NARCIS (Netherlands)

    Menendez, P.; Eilers, P.; Tikunov, Y.M.; Bovy, A.G.; Eeuwijk, van F.

    2012-01-01

    The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtai

  14. MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES

    Directory of Open Access Journals (Sweden)

    Parameshwar V. Pandit

    2012-06-01

    Full Text Available Purpose: To analyse the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of a logistic regression model. Method: The cross-sectional study involves a systematic random sample of 1760 individuals with permanent dentition, aged between 18 and 40 years, in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by a multiple logistic regression model using SPSS statistical software. Results: Factors such as frequency of brushing, timing of cleaning teeth and type of toothpaste are significant persistent predictors of dental caries and periodontal disease. For dental caries, the log likelihood value of the full model is -1013.1364 and Akaike's Information Criterion (AIC) is 1.1752, compared with -1019.8106 and 1.1748 respectively for the reduced regression model. For periodontal disease, the log likelihood value of the full model is -1085.7876 and AIC is 1.2577, compared with -1019.8106 and 1.1748 respectively for the reduced regression model. The area under the Receiver Operating Characteristic (ROC) curve for dental caries is 0.7509 (full model) and 0.7447 (reduced model); for periodontal disease it is 0.6128 (full model) and 0.5821 (reduced model). Conclusions: The frequency of brushing, timing of cleaning teeth and type of toothpaste are the main significant risk factors of dental caries and periodontal disease. The reduced logistic regression model fits slightly better than the full logistic regression model in identifying these risk factors for both dichotomous outcomes, dental caries and periodontal disease.

  15. Structured Additive Regression Models: An R Interface to BayesX

    Directory of Open Access Journals (Sweden)

    Nikolaus Umlauf

    2015-02-01

    Full Text Available Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

  16. LINEAR LAYER AND GENERALIZED REGRESSION COMPUTATIONAL INTELLIGENCE MODELS FOR PREDICTING SHELF LIFE OF PROCESSED CHEESE

    Directory of Open Access Journals (Sweden)

    S. Goyal

    2012-03-01

    Full Text Available This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 °C. Linear Layer and Generalized Regression models were developed with input parameters: soluble nitrogen, pH, standard plate count, yeast and mould count, and spores, with sensory score as the output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash-Sutcliffe Coefficient were used to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 °C.
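
    The four comparison measures named above can be computed as in the following sketch, which uses their common textbook definitions. The coefficient of determination is taken here as the squared Pearson correlation between observed and predicted values, which is an assumption; the paper may define it differently. The observed and predicted values are hypothetical.

```python
import math

def mse(obs, pred):
    """Mean square error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(mse(obs, pred))

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def nash_sutcliffe(obs, pred):
    """1 minus the ratio of residual variation to variation about the observed mean."""
    mo = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    total = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / total

# Hypothetical observed and predicted sensory scores (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(mse(observed, predicted), rmse(observed, predicted))
print(r_squared(observed, predicted), nash_sutcliffe(observed, predicted))
```

    A Nash-Sutcliffe coefficient of 1 indicates a perfect match, while a value of 0 means the model predicts no better than the mean of the observations.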

  17. The Relationship between Economic Growth and Money Laundering – a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Daniel Rece

    2009-09-01

    Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, yielding a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.

  18. Regression models for interval censored survival data: Application to HIV infection in Danish homosexual men

    DEFF Research Database (Denmark)

    Carstensen, Bendix

    1996-01-01

    This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.

  19. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
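
    The point estimates for OLP regression discussed above can be sketched in a few lines, assuming the usual formulation in which the OLP slope equals the ratio of the standard deviations of y and x, carrying the sign of their covariance (this is also known as geometric mean or reduced major axis regression). The function name and data are hypothetical, and, as the abstract notes, confidence intervals require more care and are not attempted here.

```python
import math

def olp_regression(x, y):
    """Ordinary least products (geometric mean) slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # Slope magnitude is sd(y) / sd(x); its sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired measurements, both subject to error (illustrative only).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = olp_regression(x_values, y_values)
print(slope, intercept)
```

    Unlike ordinary least squares, this estimate treats x and y symmetrically: swapping the two variables gives the reciprocal slope.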

  20. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  1. Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

    Science.gov (United States)

    Ulbrich, Norbert; Volden, Thomas R.

    2012-01-01

    An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.

  2. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    Science.gov (United States)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables causes the parameter estimates to be biased. In addition, multicollinearity leads to errors in classification. In general, stepwise regression is used to overcome multicollinearity in regression. Another method to overcome multicollinearity, which involves all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA is only suitable for numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, classification of contraceptive method use with the stepwise method (58.66%) is better than with the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and the stepwise method (92.05%). Because this study focuses on classification of the major class (using a contraceptive method), the selected model is CATPCA, as it raises the accuracy for the major class.

  3. A Stochastic Restricted Principal Components Regression Estimator in the Linear Model

    Directory of Open Access Journals (Sweden)

    Daojiang He

    2014-01-01

    We propose a new estimator to combat the multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator is constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator, and is called the stochastic restricted principal components (SRPC) regression estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.

  4. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  5. Restricted spatial regression in practice: Geostatistical models, confounding, and robustness under model misspecification

    Science.gov (United States)

    Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.

    2015-01-01

    In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.

  6. Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Squares (GLS) Method

    Directory of Open Access Journals (Sweden)

    Ade Widyaningsih

    2014-06-01

    Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the others. A method that can be used to obtain good estimates in regression analysis is ordinary least squares (OLS). The least squares method is used to estimate the parameters of one or more regressions, but it does not allow for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model using the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.

  7. Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis

    Science.gov (United States)

    Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia

    2015-03-01

    Citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method, and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are carried out by combining the Taguchi method and Excel. From the nine tests for four parameters (pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature), the order of influence of the parameters and the optimal parameter combination are found. Multiple regression analysis and an F-test on the significance of the regression equation are performed. The model F value is found to be much larger than Fcritical, with significance level P < 0.0001, so it can be concluded that the regression model has statistically significant predictive ability. Substituting the optimal combination into the regression equation gives a retardance of 32.703°, which is 11.4% higher than the highest value in the tests.
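    The overall F-test mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' Excel workflow: only the run count (9) and the number of predictors (4) are taken from the abstract, and all data values and coefficients are hypothetical. The resulting statistic would be compared with a tabulated critical value for the same degrees of freedom.

```python
import numpy as np

# Hypothetical data: 9 runs and 4 predictors mirror the counts in the
# abstract; the numbers themselves are made up for illustration.
rng = np.random.default_rng(1)
n, p = 9, 4
X = rng.uniform(-1.0, 1.0, size=(n, p))
y = 30.0 + X @ np.array([2.0, -1.5, 0.8, 1.1]) + rng.normal(0.0, 0.1, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta

# Partition the total sum of squares into model and residual parts.
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - fitted) ** 2)
ss_model = ss_total - ss_resid

# F = (model mean square) / (residual mean square).
df_model, df_resid = p, n - p - 1
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
print(f"F({df_model}, {df_resid}) = {f_stat:.1f}")
```

    With nine runs and four predictors only four residual degrees of freedom remain, so the test has little room to detect lack of fit.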

  8. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.

  9. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    OpenAIRE

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...

  10. Deriving Genomic Breeding Values for Residual Feed Intake from Covariance Functions of Random Regression Models

    OpenAIRE

    Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne; Do, Duy Ngoc; KADARMIDEEN, Haja N.; Jensen, Just

    2014-01-01

    Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Based on covariance functions, residual feed intake (RFI) was defined and derived as the conditional genetic variance in feed intake given mid-test breeding value for BW and rate of gain. The heritabili...
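    A covariance function built from second-order Legendre polynomials of age, as described in this abstract, can be sketched as follows. The ages and the coefficient covariance matrix `K` are hypothetical, and the sketch uses NumPy's unnormalized Legendre polynomials, whereas animal-breeding software often uses a normalized form.

```python
import numpy as np
from numpy.polynomial import legendre

# Hypothetical ages on test (days); not taken from the study.
age_days = np.array([70.0, 90.0, 110.0, 130.0, 150.0])

# Legendre polynomials are defined on [-1, 1], so rescale age first.
x = 2.0 * (age_days - age_days.min()) / (age_days.max() - age_days.min()) - 1.0

# Columns are P0(x), P1(x), P2(x): a second-order basis.
Phi = legendre.legvander(x, 2)

# Hypothetical (co)variance matrix of the three random regression coefficients.
K = np.array([[4.0, 1.0, 0.2],
              [1.0, 1.5, 0.1],
              [0.2, 0.1, 0.3]])

# Covariance function evaluated on the age grid: cov(t_i, t_j) = phi_i' K phi_j.
cov_function = Phi @ K @ Phi.T
print(np.round(np.diag(cov_function), 3))  # variance at each age
```

    The point of the construction is that a 3 x 3 matrix of coefficient (co)variances yields a variance or covariance for any pair of ages within the tested range.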

  11. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    Science.gov (United States)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
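    The logit relationship described in this abstract can be sketched in a few lines. The intercept, coefficients and predictor names below are hypothetical and are not estimates from the study; the sketch only shows how log odds, probabilities and odds ratios relate.

```python
import math

def log_odds(intercept, coefs, x):
    """Linear predictor: the log odds as a linear combination of predictors."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def probability(logit):
    """Inverse logit: converts log odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coefficients for two binary predictors:
# x[0] = family history of obesity, x[1] = frequent routine meals.
intercept = -2.0
coefs = [1.2, 0.5]

with_history = log_odds(intercept, coefs, [1, 0])
without_history = log_odds(intercept, coefs, [0, 0])

# The odds ratio for a predictor is exp(coefficient), holding the rest fixed.
odds_ratio = math.exp(with_history - without_history)

print(probability(with_history), probability(without_history), odds_ratio)
```

    An odds ratio above 1 means the predictor raises the odds of the outcome, which is how statements such as "the odds are higher for a student with a family history of obesity" are read off a fitted model.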

  12. Proximate analysis based multiple regression models for higher heating value estimation of low rank coals

    Energy Technology Data Exchange (ETDEWEB)

    Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)

    2009-02-15

    In this paper, multiple nonlinear regression models for estimating the higher heating value (HHV) of coals are developed using proximate analysis data, obtained mostly from low rank coal samples on an as-received basis. In this modeling study, three main model structures are first categorized according to the number of proximate analysis parameters, which serve as the independent variables: moisture, ash, volatile matter and fixed carbon. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using the multiple nonlinear regression method. Based on the results of the nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving the highest correlation for the three main structures are selected. Although all three selected models predict HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, the model developed in this study gives a more accurate prediction of the HHV of coals. It can be concluded that the developed model is an effective tool for HHV estimation of low rank coals. (author)

  13. Forecasting Helicoverpa populations in Australia: A comparison of regression based models and a bioclimatic based modelling approach

    Institute of Scientific and Technical Information of China (English)

    Myron P. Zalucki; Michael J. Furlong

    2005-01-01

    Long-term forecasts of pest pressure are central to the effective management of many agricultural insect pests. In the eastern cropping regions of Australia, serious infestations of Helicoverpa punctigera (Wallengren) and H. armigera (Hübner) (Lepidoptera: Noctuidae) are experienced annually. Regression analyses of a long series of light-trap catches of adult moths were used to describe the seasonal dynamics of both species. The size of the spring generation in eastern cropping zones could be related to rainfall in putative source areas in inland Australia. Subsequent generations could be related to the abundance of various crops in agricultural areas, rainfall and the magnitude of the spring population peak. As rainfall figured prominently as a predictor variable, and can itself be predicted using the Southern Oscillation Index (SOI), trap catches were also related to this variable. The geographic distribution of each species was modelled in relation to climate, and CLIMEX was used to predict temporal variation in abundance at given putative source sites in inland Australia using historical meteorological data. These predictions were then correlated with subsequent pest abundance data in a major cropping region. The regression-based and bioclimatic-based approaches to predicting pest abundance are compared, and their utility in predicting and interpreting pest dynamics is discussed.

  14. Multiple trait model combining random regressions for daily feed intake with single measured performance traits of growing pigs

    Directory of Open Access Journals (Sweden)

    Künzi Niklaus

    2002-01-01

    A random regression model for daily feed intake and a conventional multiple trait animal model for the four traits average daily gain on test (ADG), feed conversion ratio (FCR), carcass lean content and meat quality index were combined to analyse data from 1 449 castrated male Large White pigs performance tested in two French central testing stations in 1997. Group housed pigs fed ad libitum with electronic feed dispensers were tested from 35 to 100 kg live body weight. A quadratic polynomial in days on test was used as a regression function for weekly means of daily feed intake and to describe its residual variance. The same fixed (batch) and random (additive genetic, pen and individual permanent environmental) effects were used for regression coefficients of feed intake and single measured traits. Variance components were estimated by means of a Bayesian analysis using Gibbs sampling. Four Gibbs chains were run for 550 000 rounds each, of which the first 50 000 rounds were discarded as the burn-in period. Estimates of posterior means of covariance matrices were calculated from the remaining two million samples. Low heritabilities of linear and quadratic regression coefficients and their unfavourable genetic correlations with other performance traits reveal that altering the shape of the feed intake curve by direct or indirect selection is difficult.

  15. Study of Mechanical Properties of Wool Type Fabrics using ANCOVA Regression Model

    Science.gov (United States)

    Hristian, L.; Ostafe, M. M.; Manea, L. R.; Apostol, L. L.

    2017-06-01

    This work studies the variation of tensile strength for four groups of wool type fabrics, depending on the fiber composition, the tensile strength of the warp yarns and the weft yarns technological density, using an ANCOVA regression model. ANCOVA checks the correlation between a dependent variable and the covariate independent variables and removes the variability from the dependent variable that can be accounted for by the covariates. Analysis of covariance models combine analysis of variance with regression analysis techniques. Regarding design, ANCOVA models explain the dependent variable by combining categorical (qualitative) independent variables with continuous (quantitative) variables. There are special extensions to ANCOVA calculations to estimate parameters for both categorical and continuous variables. However, ANCOVA models can also be calculated by multiple regression analysis with a design matrix that mixes dummy-coded qualitative variables and quantitative variables.
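    The last point of this abstract, fitting ANCOVA as a multiple regression on a design matrix that mixes dummy-coded and continuous columns, can be sketched as follows. The four groups echo the four fabric groups mentioned above, but the covariate, the coefficients and the noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four hypothetical groups with 10 observations each.
n_per_group = 10
groups = np.repeat(np.arange(4), n_per_group)

# One hypothetical continuous covariate (for example, a yarn property).
covariate = rng.uniform(20.0, 40.0, size=groups.size)

# Dummy coding: group 0 is the reference level, groups 1..3 get indicator columns.
dummies = (groups[:, None] == np.arange(1, 4)[None, :]).astype(float)

# Design matrix: intercept, three group indicators, covariate.
X = np.column_stack([np.ones(groups.size), dummies, covariate])

# Simulated response from assumed coefficients plus noise.
true_beta = np.array([50.0, 3.0, -2.0, 5.0, 0.8])
y = X @ true_beta + rng.normal(0.0, 0.5, size=groups.size)

# Ordinary least squares on the mixed design matrix is the ANCOVA fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

    The indicator coefficients are group offsets relative to the reference group at a fixed covariate value, and the last coefficient is the common covariate slope.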

  16. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  17. Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models

    Institute of Scientific and Technical Information of China (English)

    Zhangong ZHOU; Rong JIANG; Weimin QIAN

    2011-01-01

    Quantile estimation methods are proposed for the functional-coefficient partially linear regression (FCPLR) model by combining the nonparametric and functional-coefficient regression (FCR) models. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. The resulting estimators are asymptotically normal, but each of them has a large variance. To reduce the variances of these quantile estimators, the one-step backfitting technique is used to obtain efficient quantile estimators of all unknown functions, and their asymptotic normality is derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.

  18. Regression Basics

    CERN Document Server

    Kahane, Leo H

    2007-01-01

    Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:

  19. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature.

  20. The more total cognitive load is reduced by cues, the better retention and transfer of multimedia learning: A meta-analysis and two meta-regression analyses.

    Science.gov (United States)

    Xie, Heping; Wang, Fuxing; Hao, Yanbin; Chen, Jiaxue; An, Jing; Wang, Yuxin; Liu, Huashan

    2017-01-01

    Cueing facilitates retention and transfer of multimedia learning. From the perspective of cognitive load theory (CLT), cueing has a positive effect on learning outcomes because of the reduction in total cognitive load and avoidance of cognitive overload. However, this has not been systematically evaluated. Moreover, what remains ambiguous is the direct relationship between the cue-related cognitive load and learning outcomes. A meta-analysis and two subsequent meta-regression analyses were conducted to explore these issues. Subjective total cognitive load (SCL) and scores on a retention test and transfer test were selected as dependent variables. Through a systematic literature search, 32 eligible articles encompassing 3,597 participants were included in the SCL-related meta-analysis. Among them, 25 articles containing 2,910 participants were included in the retention-related meta-analysis and the following retention-related meta-regression, while there were 29 articles containing 3,204 participants included in the transfer-related meta-analysis and the transfer-related meta-regression. The meta-analysis revealed a statistically significant cueing effect on subjective ratings of cognitive load (d = -0.11, 95% CI = [-0.19, -0.02], p multimedia materials can indeed reduce SCL and promote learning outcomes, and the more SCL is reduced by cues, the better retention and transfer of multimedia learning.

  1. Sensitivity analysis and optimization of system dynamics models : Regression analysis and statistical design of experiments

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    1995-01-01

    This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for

  2. Fitting multistate transition models with autoregressive logistic regression : Supervised exercise in intermittent claudication

    NARCIS (Netherlands)

    de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M

    1998-01-01

    The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six

  3. A Percentile Regression Model for the Number of Errors in Group Conversation Tests.

    Science.gov (United States)

    Liski, Erkki P.; Puntanen, Simo

    A statistical model is presented for analyzing the results of group conversation tests in English, developed in a Finnish university study from 1977 to 1981. The model is illustrated with the findings from the study. In this study, estimates of percentile curves for the number of errors are of greater interest than the mean regression line. It was…

  4. Random regression models in the evaluation of the growth curve of Simbrasil beef cattle

    NARCIS (Netherlands)

    Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.

    2013-01-01

    Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the

  5. Sample Size Determination for Regression Models Using Monte Carlo Methods in R

    Science.gov (United States)

    Beaujean, A. Alexander

    2014-01-01

    A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
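    The Monte Carlo idea in this abstract can be sketched as follows: simulate data from an assumed model at several sample sizes and pick the smallest one whose estimated power reaches a target. The article works in R; this sketch uses Python, a hypothetical simple regression with unit-variance predictor and error, and a large-sample normal critical value (1.96) in place of the exact t quantile. The function names are illustrative.

```python
import numpy as np

def estimated_power(n, slope, n_sims=2000, seed=0):
    """Share of simulated data sets in which the slope test rejects at about 5%."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)
        y = slope * x + rng.normal(size=n)   # assumed data-generating model
        xc = x - x.mean()
        yc = y - y.mean()
        b = (xc @ yc) / (xc @ xc)            # OLS slope estimate
        resid = yc - b * xc
        se = np.sqrt((resid @ resid) / (n - 2) / (xc @ xc))
        if abs(b / se) > 1.96:               # normal approximation to the t test
            rejections += 1
    return rejections / n_sims

def smallest_n(slope, target=0.80, candidates=range(10, 201, 10)):
    """First candidate sample size whose estimated power reaches the target."""
    for n in candidates:
        if estimated_power(n, slope) >= target:
            return n
    return None

print(smallest_n(0.3))
```

    Because the data-generating model is written out explicitly, assumptions such as non-normal errors or unequal variances can be changed directly in the simulation loop.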

  6. Random regression models in the evaluation of the growth curve of Simbrasil beef cattle

    NARCIS (Netherlands)

    Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.

    2013-01-01

    Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the f

  7. Logistic regression models of factors influencing the location of bioenergy and biofuels plants

    Science.gov (United States)

    T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu

    2011-01-01

    Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7 m²/ha...

  8. DESIGNING A FORECAST MODEL FOR ECONOMIC GROWTH OF JAPAN USING COMPETITIVE (HYBRID ANN VS MULTIPLE REGRESSION) MODELS

    Directory of Open Access Journals (Sweden)

    Ahmet DEMIR

    2015-07-01

    Artificial neural network (ANN) models have already been used successfully in many different fields, and many studies show that ANN models provide better results than competing models in most cases. But does an ANN still provide optimal solutions when it is proposed as part of a hybrid model? This research answers that question by using these models to build a forecast of the GDP growth of Japan. Multiple regression models were used as competitive models against hybrid ANN (ANN + multiple regression) models. Results show that the hybrid model gives better results than the multiple regression models. In addition, the variables that significantly affect GDP growth were determined, and some of the variables that were assumed to affect the GDP growth of Japan were eliminated statistically.

  9. Longitudinal beta regression models for analyzing health-related quality of life scores over time

    Directory of Open Access Journals (Sweden)

    Hunger Matthias

    2012-09-01

    Background: Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies; however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods: We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results: At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from the marginal beta model and the linear mixed model were nearly identical, but differences could be observed with respect to standard errors. Conclusions: Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models; however, if the focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.

  10. Time-varying parameter auto-regressive models for autocovariance nonstationary time series

    Institute of Scientific and Technical Information of China (English)

    FEI WanChun; BAI Lun

    2009-01-01

    In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.

  11. Time-varying parameter auto-regressive models for autocovariance nonstationary time series

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.

  12. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Science.gov (United States)

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm of time and on time itself. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could predict cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R² = 0.676, P<0.0001; linear regression modeling, R² = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
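    The comparison of logarithmic and linear regression in this abstract can be sketched by fitting both forms to the same series and comparing R². The assessment days and MMSE scores below are hypothetical values chosen to follow a flattening recovery curve; they are not data from the study, and the fit is for a single series rather than the group and individual procedures the authors describe.

```python
import numpy as np

def r_squared(y, fitted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical assessment days after stroke and MMSE scores.
days = np.array([1.0, 7.0, 14.0, 28.0, 56.0])
mmse = np.array([15.2, 20.6, 23.1, 24.8, 27.2])

# Logarithmic model: score = a + b * ln(day).
b_log, a_log = np.polyfit(np.log(days), mmse, 1)
r2_log = r_squared(mmse, a_log + b_log * np.log(days))

# Linear model: score = a + b * day.
b_lin, a_lin = np.polyfit(days, mmse, 1)
r2_lin = r_squared(mmse, a_lin + b_lin * days)

print(f"logarithmic R^2 = {r2_log:.3f}, linear R^2 = {r2_lin:.3f}")
```

    When recovery is fast at first and then levels off, as in these made-up scores, the logarithmic form tracks the curve more closely than a straight line.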

  13. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Directory of Open Access Journals (Sweden)

    Makoto Suzuki

    Full Text Available Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R² = 0.676, P<0.0001; linear regression modeling, R² = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
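    The comparison of logarithmic and linear time courses described in this record can be sketched as follows. This is only an illustration of the general idea, not the authors' procedure: the helper `fit_line`, the assessment days and the MMSE scores are all hypothetical, and the scores are invented so that recovery flattens over time.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical assessment days after stroke and MMSE scores (invented data).
days = [3, 10, 24, 45]
mmse = [14.0, 19.0, 22.5, 25.0]

# Linear model: score = a + b * day.
a_lin, b_lin, r2_lin = fit_line(days, mmse)
# Logarithmic model: score = a + b * ln(day).
a_log, b_log, r2_log = fit_line([math.log(d) for d in days], mmse)

print("linear R^2:", round(r2_lin, 3), "logarithmic R^2:", round(r2_log, 3))
```

    Because the logarithmic model is still linear in its parameters, the same least-squares routine serves both fits; only the time axis is transformed.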

  14. Higher precision estimates of regional polar warming by ensemble regression of climate model projections

    Energy Technology Data Exchange (ETDEWEB)

    Bracegirdle, Thomas J. [British Antarctic Survey, Cambridge (United Kingdom); Stephenson, David B. [University of Exeter, Mathematics Research Institute, Exeter (United Kingdom); NCAS-Climate, Reading (United Kingdom)

    2012-12-15

    This study presents projections of twenty-first century wintertime surface temperature changes over the high-latitude regions based on the third Coupled Model Inter-comparison Project (CMIP3) multi-model ensemble. The state-dependence of the climate change response on the present day mean state is captured using a simple yet robust ensemble linear regression model. The ensemble regression approach gives different and more precise estimated mean responses compared to the ensemble mean approach. Over the Arctic in January, ensemble regression gives less warming than the ensemble mean along the boundary between sea ice and open ocean (sea ice edge). Most notably, the results show 3 °C less warming over the Barents Sea (~7 °C compared to ~10 °C). In addition, the ensemble regression method gives projections that are 30 % more precise over the Sea of Okhotsk, Bering Sea and Labrador Sea. For the Antarctic in winter (July) the ensemble regression method gives 2 °C more warming over the Southern Ocean close to the Greenwich Meridian (~7 °C compared to ~5 °C). Projection uncertainty was almost half that of the ensemble mean uncertainty over the Southern Ocean between 30°W and 90°E and 30 % less over the northern Antarctic Peninsula. The ensemble regression model avoids the need for explicit ad hoc weighting of models and exploits the whole ensemble to objectively identify overly influential outlier models. Bootstrap resampling shows that maximum precision over the Southern Ocean can be obtained with ensembles having as few as six climate models.

  15. A Robbins-Monro procedure for estimation in semiparametric regression models

    CERN Document Server

    Bercu, Bernard

    2011-01-01

    This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a Robbins-Monro procedure that is very efficient and easy to handle. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro in order to estimate the shift parameter. A preliminary evaluation of the regression function is not necessary for estimating the shift parameter. On the other hand, we make use of a recursive Nadaraya-Watson estimator for the estimation of the regression function. This kernel estimator takes into account the previous estimation of the shift parameter. We establish the almost sure convergence of both the Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided.
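    For readers unfamiliar with the Robbins-Monro idea, a generic root-finding sketch may help. It is not the shift estimator of the paper: the function `noisy_m`, its root at 2, the starting value and the step sizes 1/n are all assumptions chosen for illustration.

```python
import random

def robbins_monro(noisy_m, theta0, n_steps, target=0.0):
    """Stochastic approximation of the root of M(theta) = target.

    noisy_m(theta) must return an unbiased noisy observation of M(theta).
    The step sizes a_n = 1/n satisfy sum a_n = infinity and sum a_n^2 < infinity.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= (1.0 / n) * (noisy_m(theta) - target)
    return theta

rng = random.Random(0)  # fixed seed so the run is reproducible

def noisy_m(theta):
    # Hypothetical increasing function M(theta) = theta - 2, observed with Gaussian noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)

estimate = robbins_monro(noisy_m, theta0=10.0, n_steps=5000)
print("estimated root:", round(estimate, 3))
```

    Each update uses only the current noisy observation, which is why no preliminary estimate of the underlying function is needed.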

  16. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
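    The matrix algebra behind such a comparison is the standard result that, for two covariate rows x_a and x_b, the difference in fitted values is d'b with d = x_a - x_b, and its variance is d'Vd, where V is the covariance matrix of the coefficients. The sketch below is a Python illustration of that result, not a port of the SPSS macros; the function name, the coefficients and the covariance matrix are hypothetical, and the interval uses the normal quantile 1.96 where a t quantile may be more appropriate for small samples.

```python
import math

def compare_fitted(beta, cov, x_a, x_b, z=1.96):
    """Difference between two fitted values, its standard error and an approximate 95% CI."""
    d = [a - b for a, b in zip(x_a, x_b)]
    diff = sum(di * bi for di, bi in zip(d, beta))
    k = len(d)
    var = sum(d[i] * cov[i][j] * d[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return diff, se, (diff - z * se, diff + z * se)

# Hypothetical quadratic model y = b0 + b1*x + b2*x^2 (values invented for illustration).
beta = [1.0, 2.0, 0.5]
cov = [[0.04, 0.0, 0.0],
       [0.0, 0.01, -0.002],
       [0.0, -0.002, 0.001]]

# Compare the fitted value at x = 3 with the fitted value at x = 1.
x_a = [1.0, 3.0, 9.0]
x_b = [1.0, 1.0, 1.0]
diff, se, ci = compare_fitted(beta, cov, x_a, x_b)
print(diff, se, ci)
```

    No single coefficient of the quadratic model equals this difference, which is the situation the abstract describes for models with polynomial or interaction terms.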

  17. Prediction of soil temperature using regression and artificial neural network models

    Science.gov (United States)

    Bilgili, Mehmet

    2010-12-01

    In this study, monthly soil temperature was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. The soil temperature and other meteorological parameters, which were taken from the Adana meteorological station, were observed between 2000 and 2007 by the Turkish State Meteorological Service (TSMS). The soil temperatures were measured at depths of 5, 10, 20, 50 and 100 cm below ground level. A three-layer feed-forward ANN structure was constructed and a back-propagation algorithm was used for the training of ANNs. In order to obtain a successful simulation, the pairwise correlation coefficients between all of the meteorological variables (soil temperature, atmospheric temperature, atmospheric pressure, relative humidity, wind speed, rainfall, global solar radiation and sunshine duration) were calculated. First, all independent variables were split into two time periods, the cold and warm seasons, and added to the regression model using the enter method. Then, stepwise multiple regression was applied to select the "best" regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and they were also used in the input layer of the ANN method. The results of these methods were compared with each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.

  18. VIPRE modeling of VVER-1000 reactor core for DNB analyses

    Energy Technology Data Exchange (ETDEWEB)

    Sung, Y.; Nguyen, Q. [Westinghouse Electric Corporation, Pittsburgh, PA (United States); Cizek, J. [Nuclear Research Institute, Prague, (Czech Republic)

    1995-09-01

    Based on the one-pass modeling approach, the hot channels and the VVER-1000 reactor core can be modeled in 30 channels for DNB analyses using the VIPRE-01/MOD02 (VIPRE) code (VIPRE is owned by Electric Power Research Institute, Palo Alto, California). The VIPRE one-pass model does not compromise any accuracy in the hot channel local fluid conditions. Extensive qualifications include sensitivity studies of radial noding and crossflow parameters and comparisons with the results from THINC and CALOPEA subchannel codes. The qualifications confirm that the VIPRE code with the Westinghouse modeling method provides good computational performance and accuracy for VVER-1000 DNB analyses.

  19. LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA

    Directory of Open Access Journals (Sweden)

    Ersin Yılmaz

    2016-05-01

    Full Text Available This study first defines right-censored data. Briefly, right-censored observations are values that lie above a certain limit and are therefore not observed exactly; this may be related to the measuring device. A response variable obtained from right-censored explanatory variables is then used, and the linear regression model is estimated. Because the data are censored, Kaplan-Meier weights are used to estimate the model; with these weights, the regression estimates are consistent and unbiased. Semiparametric regression is another method that gives useful results for censored data. The study may also be useful for health research, since censored data commonly arise in medical applications.
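    One common way to use Kaplan-Meier weights in a linear regression is weighted least squares, with weights equal to the jumps of the Kaplan-Meier estimator (a construction often attributed to Stute). The sketch below assumes that form and assumes it is the response that is right-censored; the abstract does not spell out its exact procedure, and the function names and data are hypothetical.

```python
def km_weights(times, observed):
    """Kaplan-Meier jump weights, returned in the original order of the data.

    observed[i] is 1 for an exactly observed response and 0 for a censored one.
    Censored observations receive weight 0; ties are ordered with events first.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: (times[i], -observed[i]))
    weights = [0.0] * n
    surv = 1.0  # Kaplan-Meier survival just before the current observation
    for rank, i in enumerate(order):
        at_risk = n - rank
        if observed[i]:
            weights[i] = surv / at_risk
            surv *= (at_risk - 1) / at_risk
    return weights

def weighted_line(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical data: y is the observed (possibly censored) response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 5.0, 8.2, 7.0]
observed = [1, 1, 0, 1, 0]

w = km_weights(y, observed)
intercept, slope = weighted_line(x, y, w)
print(w, intercept, slope)
```

    With no censoring every weight equals 1/n and the fit reduces to ordinary least squares.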

  20. Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression

    Directory of Open Access Journals (Sweden)

    Li Jian

    2017-01-01

    Full Text Available Objective: To construct a multifactor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: Logistic regression techniques were used to screen the risk factors for T2DM and to construct the risk prediction model of T2DM. Results: The logistic regression equation of the risk prediction model for males was logit(P) = BMI × 0.735 + vegetables × (−0.671) + age × 0.838 + diastolic pressure × 0.296 + physical activity × (−2.287) + sleep × (−0.009) + smoking × 0.214; the equation for females was logit(P) = BMI × 1.979 + vegetables × (−0.292) + age × 1.355 + diastolic pressure × 0.522 + physical activity × (−2.287) + sleep × (−0.010). For males, the area under the ROC curve was 0.83, the sensitivity was 0.72 and the specificity was 0.86; for females, the area under the ROC curve was 0.84, the sensitivity was 0.75 and the specificity was 0.90. Conclusion: The data for this model come from a nested case-control study, the risk prediction model was established using mature logistic regression techniques, and the model shows high predictive sensitivity, specificity and stability.
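    To show how such an equation turns into a risk estimate, the sketch below applies the inverse logit, P = 1 / (1 + exp(−logit(P))), to the male coefficients quoted in the abstract. The abstract reports neither an intercept nor how each predictor is coded, so the intercept of 0 and the coded input values here are assumptions made purely for illustration, and the resulting number has no clinical meaning.

```python
import math

# Coefficients for males as quoted in the abstract.
MALE_COEFS = {
    "bmi": 0.735,
    "vegetables": -0.671,
    "age": 0.838,
    "diastolic_pressure": 0.296,
    "physical_activity": -2.287,
    "sleep": -0.009,
    "smoking": 0.214,
}

def predicted_probability(coefs, x, intercept=0.0):
    """Inverse-logit of the linear predictor; intercept defaults to 0 (an assumption)."""
    logit = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical coded predictor values (the study's coding scheme is not given).
person = {
    "bmi": 1, "vegetables": 0, "age": 1, "diastolic_pressure": 1,
    "physical_activity": 1, "sleep": 7, "smoking": 0,
}
print(round(predicted_probability(MALE_COEFS, person), 3))
```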

  1. Validation of a regression model for standardizing lifetime racing performances of thoroughbreds.

    Science.gov (United States)

    Martin, G S; Strand, E; Kearney, M T

    1997-06-01

    To determine the relationship between prediction errors of a regression model of racing finish times and earnings or finish position; the relationship between standardized finish times, determined by use of this model, and earnings or finish position; and whether this model was valid when applied to data for horses that underwent surgical treatment. Survey. Records of 6,700 healthy Thoroughbreds racing in Louisiana and of 31 Thoroughbreds with idiopathic left laryngeal hemiplegia that underwent surgical treatment. Predicted and standardized finish times were calculated by use of the regression model for healthy horses, and the relationships between prediction error (actual minus predicted finish time) and standardized finish times, and earnings and finish position, were examined. Then, the regression model was applied to data for horses with hemiplegia to determine whether the model was valid when used to calculate predicted and standardized finish times for lifetime performance data. Prediction error and standardized finish times were negatively correlated with earnings and positively correlated with finish position and, thus, appeared to be reliable measures of racing performance. The regression model was found to be valid when applied to lifetime performance records of horses with laryngeal hemiplegia. Prediction error and standardized finish times are measures of racing performance that can be used to compare performances among Thoroughbred racehorses across a variety of circumstances that would otherwise confound comparison.

  2. Monitoring seasonal influenza epidemics by using internet search data with an ensemble penalized regression model

    Science.gov (United States)

    Guo, Pi; Zhang, Jianjun; Wang, Li; Yang, Shaoyi; Luo, Ganfeng; Deng, Changyu; Wen, Ye; Zhang, Qingying

    2017-01-01

    Seasonal influenza epidemics cause serious public health problems in China. Search query-based surveillance was recently proposed to complement traditional approaches to monitoring influenza epidemics. However, developing robust techniques for search query selection and enhancing predictability for influenza epidemics remain a challenge. This study aimed to develop a novel ensemble framework to improve penalized regression models for detecting influenza epidemics by using Baidu search engine query data from China. The ensemble framework applied a combination of bootstrap aggregating (bagging) and a rank aggregation method to optimize penalized regression models. Different algorithms, including lasso, ridge, elastic net and the algorithms in the proposed ensemble framework, were compared by using Baidu search engine queries. Most of the selected search terms captured the peaks and troughs of the time series curves of influenza cases. The predictability of the conventional penalized regression models was improved by the proposed ensemble framework. The elastic net regression model outperformed the compared models, with the minimum prediction errors. We established a Baidu search engine query-based surveillance model for monitoring influenza epidemics, and the proposed model provides a useful tool to support the public health response to influenza and other infectious diseases. PMID:28422149

  3. Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.

    Science.gov (United States)

    Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko

    2010-01-01

    For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R(2) values. For practical purposes, the surface parameter R(a) alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
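    The hierarchical step described above, adding one surface parameter at a time and watching the adjusted R², rests on the usual adjustment 1 − (1 − R²)(n − 1)/(n − p − 1) for a model with p predictors and n observations. A minimal sketch follows; the sample size and the sequence of R² values are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors (plus an intercept) and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical R^2 values as predictors are added one at a time (invented numbers).
n_obs = 30
r2_by_step = [0.62, 0.66, 0.665]

for p, r2 in enumerate(r2_by_step, start=1):
    print(p, "predictor(s): adjusted R^2 =", round(adjusted_r2(r2, n_obs, p), 3))
```

    In this invented sequence the third predictor raises R² only slightly, so the adjusted value drops, which is the kind of signal used to judge whether a harder-to-measure parameter is worth adding.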

  4. Logistic random effects regression models: a comparison of statistical packages for binary and ordinal outcomes

    Directory of Open Access Journals (Sweden)

    Steyerberg Ewout W

    2011-05-01

    Full Text Available Abstract Background Logistic random effects models are a popular tool to analyze multilevel (also called hierarchical) data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR; Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability.
There are also differences in

  5. Blind identification of threshold auto-regressive model for machine fault diagnosis

    Institute of Scientific and Technical Information of China (English)

    LI Zhinong; HE Yongyong; CHU Fulei; WU Zhaotong

    2007-01-01

    A blind identification method was developed for the threshold auto-regressive (TAR) model. The method had good identification accuracy and rapid convergence, especially for higher order systems. The proposed method was then combined with the hidden Markov model (HMM) to determine the auto-regressive (AR) coefficients for each interval used for feature extraction, with the HMM as a classifier. The fault diagnoses during the speed-up and speed-down processes for rotating machinery have been successfully completed. The result of the experiment shows that the proposed method is practical and effective.

  6. Methods and applications of linear models regression and the analysis of variance

    CERN Document Server

    Hocking, Ronald R

    2013-01-01

    Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book

  7. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    Science.gov (United States)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  8. Appraisal, coping, emotion, and performance during elite fencing matches: a random coefficient regression model approach.

    Science.gov (United States)

    Doron, J; Martinent, G

    2016-06-23

    Understanding more about the stress process is important for the performance of athletes during stressful situations. Grounded in Lazarus's (1991, 1999, 2000) CMRT of emotion, this study tracked longitudinally the relationships between cognitive appraisal, coping, emotions, and performance in nine elite fencers across 14 international matches (representing 619 momentary assessments) using a naturalistic, video-assisted methodology. A series of hierarchical linear modeling analyses were conducted to: (a) explore the relationships between cognitive appraisals (challenge and threat), coping strategies (task- and disengagement oriented coping), emotions (positive and negative) and objective performance; (b) ascertain whether the relationship between appraisal and emotion was mediated by coping; and (c) examine whether the relationship between appraisal and objective performance was mediated by emotion and coping. The results of the random coefficient regression models showed: (a) positive relationships between challenge appraisal, task-oriented coping, positive emotions, and performance, as well as between threat appraisal, disengagement-oriented coping and negative emotions; (b) that disengagement-oriented coping partially mediated the relationship between threat and negative emotions, whereas task-oriented coping partially mediated the relationship between challenge and positive emotions; and (c) that disengagement-oriented coping mediated the relationship between threat and performance, whereas task-oriented coping and positive emotions partially mediated the relationship between challenge and performance. As a whole, this study furthered knowledge during sport performance situations of Lazarus's (1999) claim that these psychological constructs exist within a conceptual unit. Specifically, our findings indicated that the ways these constructs are inter-related influence objective performance within competitive settings.

  9. Accounting for spatial effects in land use regression for urban air pollution modeling.

    Science.gov (United States)

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects (e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models) may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.

  10. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    Science.gov (United States)

    Aulenbach, Brent T.

    2013-10-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads: the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals caused observed AMLE precision to be significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration-discharge relationship. The models with the largest errors typically had poor high flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.
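    Serial correlation of regression residuals, which the abstract ties to a threshold of 0.15, can be checked with the lag-1 sample autocorrelation. The sketch below shows one common estimator; the residual series is hypothetical, and the comparison against 0.15 only mirrors the figure quoted above, without reproducing the study's method.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation of a residual series ordered in time."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

# Hypothetical residuals from a concentration-discharge regression, in time order.
resid = [0.3, 0.5, 0.4, 0.1, -0.2, -0.4, -0.3, 0.0]

r1 = lag1_autocorrelation(resid)
print("lag-1 autocorrelation:", round(r1, 3), "above 0.15:", r1 > 0.15)
```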

  11. Modeling Approach of Regression Orthogonal Experiment Design for Thermal Error Compensation of CNC Turning Center

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    Thermally induced errors can account for as much as 70% of the dimensional errors on a workpiece. Accurate modeling of errors is an essential part of error compensation. Based on an analysis of the existing approaches to thermal error modeling for machine tools, a new approach of regression orthogonal design is proposed, which combines statistical theory with machine structures, surrounding conditions, engineering judgements, and experience in modeling. A complete computation and analysis procedure is given. ...

  12. Stahel-Donoho kernel estimation for fixed design nonparametric regression models

    Institute of Scientific and Technical Information of China (English)

    LIN Lu

    2006-01-01

    This paper reports a robust kernel estimation for fixed design nonparametric regression models. A Stahel-Donoho kernel estimation is introduced, in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors. As a result, the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as asymptotic normality and convergence in the mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure, so its efficiency can be compared with that of the original one. Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.

  13. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP) growth rates among the Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
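    As background, the Nadaraya-Watson estimator and the cross-validation benchmark mentioned above can be sketched for a single continuous regressor. This is not the Bayesian sampling algorithm of the paper and does not handle discrete regressors; the Gaussian kernel, the candidate bandwidths and the data are assumptions chosen for illustration.

```python
import math

def nadaraya_watson(x0, xs, ys, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x = x0] with bandwidth h."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def loo_cv_score(xs, ys, h):
    """Leave-one-out mean squared prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        total += (ys[i] - nadaraya_watson(xs[i], x_rest, y_rest, h)) ** 2
    return total / len(xs)

# Hypothetical data and candidate bandwidths.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.4, 0.9, 1.0, 0.8, 0.5, 0.2]
candidates = [0.2, 0.4, 0.8, 1.6]

best_h = min(candidates, key=lambda h: loo_cv_score(xs, ys, h))
print("bandwidth chosen by leave-one-out cross-validation:", best_h)
```

    Cross-validation picks a single bandwidth by minimising prediction error, whereas the approach in the abstract treats the bandwidths as parameters with a posterior distribution.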

  14. Combining the Performance Strengths of the Logistic Regression and Neural Network Models: A Medical Outcomes Approach

    Directory of Open Access Journals (Sweden)

    Wun Wong

    2003-01-01

    Full Text Available The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.

  15. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    Science.gov (United States)

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained using the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a non-parametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
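
    The two-stage idea in this abstract can be sketched for a single regressor as follows: fit ordinary least squares, smooth the squared residuals to estimate the variance function, then refit with inverse-variance weights. This is a simplified stand-in, not the authors' procedure: it uses a local-constant kernel smoother instead of local polynomial fitting, and the function names, bandwidth, and variance floor are assumptions.

```python
import math

def wls(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def smooth_variance(xs, sq_resid, bandwidth):
    """Local-constant (kernel) estimate of the variance function at each x."""
    out = []
    for x0 in xs:
        ks = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
        out.append(sum(k * r for k, r in zip(ks, sq_resid)) / sum(ks))
    return out

def two_stage_fgls(xs, ys, bandwidth=1.0, floor=1e-8):
    """Stage 1: OLS. Stage 2: inverse-variance weighted refit."""
    a0, b0 = wls(xs, ys, [1.0] * len(xs))
    sq_resid = [(y - a0 - b0 * x) ** 2 for x, y in zip(xs, ys)]
    var_hat = smooth_variance(xs, sq_resid, bandwidth)
    # The floor guards against division by a zero variance estimate.
    weights = [1.0 / max(v, floor) for v in var_hat]
    return wls(xs, ys, weights)
```

    Because the variance function is estimated from the data rather than assumed, no parametric form for the heteroscedasticity has to be specified in advance.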

  16. A componential model of human interaction with graphs: 1. Linear regression modeling

    Science.gov (United States)

    Gillan, Douglas J.; Lewis, Robert

    1994-01-01

    Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes: searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and responding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.

  17. Replica analysis of overfitting in regression models for time-to-event data

    Science.gov (United States)

    Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

    2017-09-01

    Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.

  18. Analysis of the Influence of Quantile Regression Model on Mainland Tourists’ Service Satisfaction Performance

    Directory of Open Access Journals (Sweden)

    Wen-Cheng Wang

    2014-01-01

    Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
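
    To make the contrast between mean regression and quantile regression concrete, the sketch below uses the check (pinball) loss that quantile regression minimises. For brevity it fits only a constant, so the minimiser is a sample quantile; a full quantile regression minimises the same loss over regression coefficients. The satisfaction scores are invented for illustration.

```python
def pinball_loss(q, ys, tau):
    """Total check (pinball) loss of predicting the constant q at quantile level tau."""
    return sum(tau * (y - q) if y >= q else (1.0 - tau) * (q - y) for y in ys)

def best_constant(ys, tau):
    """The sample value that minimises the pinball loss (a tau-quantile of ys)."""
    return min(sorted(ys), key=lambda q: pinball_loss(q, ys, tau))

scores = [1, 2, 3, 4, 100]                  # toy data with one extreme value
median_fit = best_constant(scores, 0.5)     # 3
lower_fit = best_constant(scores, 0.25)     # 2
mean_fit = sum(scores) / len(scores)        # 22.0, pulled up by the extreme value
```

    The mean is dragged toward the outlier while the quantile fits are not, which is one reason predictive accuracy can differ between a mean regression and regressions at individual quantiles such as Q0.25.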

  19. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    Science.gov (United States)

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916

  20. Logistic regression model and its application

    Institute of Scientific and Technical Information of China (English)

    常振海; 刘薇

    2012-01-01

    To improve the forecasting accuracy for a multi-category qualitative dependent variable using the logistic model, a three-category logistic model is established for actual statistical data on the basis of the binary logistic regression model. The significance of the independent variables is tested with the likelihood ratio test, and the non-significant variables are removed. A linear regression function is determined for each category of the dependent variable, and the models are tested. The analysis results show that the logistic regression model has good predictive accuracy and practical value for regression analysis with a qualitative dependent variable.
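
    A minimal sketch of how a three-category logistic model turns one linear function per category into class probabilities, assuming the common baseline-category parameterisation in which the reference category has a score of zero. The coefficients below are hypothetical and are not taken from the paper.

```python
import math

def category_probabilities(x, coefs):
    """Baseline-category logit: category 0 is the reference with score 0.

    coefs is a list of (intercept, slopes) pairs, one per non-reference category.
    """
    scores = [0.0]
    for intercept, slopes in coefs:
        scores.append(intercept + sum(b * v for b, v in zip(slopes, x)))
    top = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for categories 1 and 2 (illustration only).
coefs = [(-0.5, [1.2, 0.0]), (0.3, [-0.7, 0.9])]
probs = category_probabilities([1.0, 2.0], coefs)
predicted = max(range(3), key=lambda k: probs[k])   # category with the highest probability
```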

  1. Post-L1-Penalized Estimators in High-Dimensional Linear Regression Models

    CERN Document Server

    Belloni, Alexandre

    2010-01-01

    In this paper we study the post-penalized estimator which applies ordinary, unpenalized linear regression to the model selected by the first step penalized estimators, typically the LASSO. We show that post-LASSO can perform as well or nearly as well as the LASSO in terms of the rate of convergence. We show that this performance occurs even if the LASSO-based model selection "fails", in the sense of missing some components of the "true" regression model. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the "true" model as a subset and enough sparsity is obtained. Of course, in the extreme case, when LASSO perfectly selects the true model, the post-LASSO estimator becomes the oracle estimator. We show that the results hold in both parametric and non-parametric models; and by the "true" model we mean the best $s$-dimensional approximation to the true regression model, whe...
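
    The difference between LASSO and post-LASSO is easiest to see in the special case of an orthonormal design, which is an assumption of this sketch and not of the paper. In that case the LASSO solution of (1/2)||y - Xb||^2 + lam*||b||_1 is a soft-thresholded copy of the least-squares coefficients, and the post-LASSO refit restores the unshrunk coefficients on the selected support.

```python
def soft_threshold(z, lam):
    """LASSO coefficient for one orthonormal column: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(ols_coefs, lam):
    """LASSO estimate when the design columns are orthonormal."""
    return [soft_threshold(z, lam) for z in ols_coefs]

def post_lasso_orthonormal(ols_coefs, lam):
    """Refit by unpenalised least squares on the support selected by the LASSO."""
    return [z if soft_threshold(z, lam) != 0.0 else 0.0 for z in ols_coefs]

ols = [3.0, 0.2, -1.5]                      # hypothetical least-squares coefficients
lasso = lasso_orthonormal(ols, 0.5)         # [2.5, 0.0, -1.0]
post = post_lasso_orthonormal(ols, 0.5)     # [3.0, 0.0, -1.5]
```

    Both estimators select the same variables here; post-LASSO simply removes the shrinkage bias on the retained coefficients, which is the intuition behind the faster rates the abstract describes.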

  2. Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land

    Science.gov (United States)

    Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho

    2008-07-01

    Summary: Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO₃⁻ and explanatory variables Ca²⁺, Cl⁻, SO₄²⁻, depth to water, aquifer media and land use. Substituting Cl⁻ by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.

  3. A regression model predicting isometric shoulder muscle activities from arm postures and shoulder joint moments.

    Science.gov (United States)

    Xu, Xu; McGorry, Raymond W; Lin, Jia-Hua

    2014-06-01

    Tissue overloading is a major contributor to shoulder musculoskeletal injuries. Previous studies attempted to use regression-based methods to predict muscle activities from shoulder kinematics and shoulder kinetics. While a regression-based method can address co-contraction of the antagonist muscles as opposed to the optimization method, most of these regression models were based on limited shoulder postures. The purpose of this study was to develop a set of regression equations to predict the 10th percentile, the median, and the 90th percentile of normalized electromyography (nEMG) activities from shoulder postures and net shoulder moments. Forty participants generated various 3-D shoulder moments at 96 static postures. The nEMG of 16 shoulder muscles was measured and the 3-D net shoulder moment was calculated using a static biomechanical model. A stepwise regression was used to derive the regression equations. The results indicated the measured range of the 3-D shoulder moment in this study was similar to those observed during work requiring light physical capacity. The r² of all the regression equations ranged between 0.228 and 0.818. For the median of the nEMG, the average r² among all 16 muscles was 0.645, and the five muscles with the greatest r² were the three deltoids, supraspinatus, and infraspinatus. The results can be used by practitioners to estimate the range of the shoulder muscle activities given a specific arm posture and net shoulder moment. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.

  4. Proposal of a regressive model for the hourly diffuse solar radiation under all sky conditions

    Energy Technology Data Exchange (ETDEWEB)

    Ruiz-Arias, J.A.; Alsamamra, H.; Tovar-Pescador, J.; Pozo-Vazquez, D. [Department of Physics, Building A3-066, University of Jaen, 23071 Jaen (Spain)

    2010-05-15

    In this work, we propose a new regressive model for the estimation of the hourly diffuse solar irradiation under all sky conditions. This new model is based on the sigmoid function and uses the clearness index and the relative optical mass as predictors. The model performance was compared against five other regressive models using radiation data corresponding to 21 stations in the USA and Europe. In the first part, the 21 stations were grouped into seven subregions (corresponding to seven different climatic regions) and all the models were locally fitted and evaluated using these seven datasets. Results showed that the new proposed model provides slightly better estimates. In particular, this new model provides a relative root mean square error in the range 25-35% and a relative mean bias error in the range -15% to 15%, depending on the region. In the second part, the potential global character of the new model was evaluated. To this end, the model was fitted using the whole dataset. Results showed that the globally fitted model provides overall better estimates than the locally fitted models, with relative root mean square error values ranging 20-35% and a relative mean bias error ranging -5% to -12%. Additionally, the new proposed model showed some advantages compared to the other evaluated models. In particular, the sigmoid behaviour of this model is able to provide physically reliable estimates for extreme values of the clearness index while using fewer parameters than the other tested models. (author)

  5. Modelling longevity bonds: Analysing the Swiss Re Kortis bond

    OpenAIRE

    2015-01-01

    A key contribution to the development of the traded market for longevity risk was the issuance of the Kortis bond, the world's first longevity trend bond, by Swiss Re in 2010. We analyse the design of the Kortis bond, develop suitable mortality models to analyse its payoff and discuss the key risk factors for the bond. We also investigate how the design of the Kortis bond can be adapted and extended to further develop the market for longevity risk.

  6. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of nonlinear functions and it has the added advantage that there is no "curse of dimensionality." Contrary to existing literature on the asymptotic properties of the estimated parameters in random field models our results do not require that the explanatory variables are sampled on a grid. However... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, work very well for random field regression models in finite samples and that these models successfully...

  7. Predicting heartbeat arrival time for failure detection over internet using auto-regressive exogenous model

    Institute of Scientific and Technical Information of China (English)

    Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie

    2008-01-01

    Predicting heartbeat message arrival time is crucial for the quality of failure detection service over internet. However, internet dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as auto-regressive process, heartbeat sending time is modeled as exogenous variable, the model's coefficients are estimated based on the sliding window of observations and this result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.
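
    The sliding-window ARX prediction described above might look roughly like the following. The model order (one lagged arrival time, one exogenous send time, no intercept), the window length, and the function names are assumptions chosen to keep the sketch short; the abstract does not state the order the authors used.

```python
def fit_arx(arrivals, sends):
    """Least-squares fit of arrival[k+1] = a*arrival[k] + b*send[k+1] over one window."""
    x1 = arrivals[:-1]                      # lagged arrival times (auto-regressive part)
    x2 = sends[1:]                          # matching send times (exogenous part)
    y = arrivals[1:]
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12             # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear in this window")
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det

def predict_next_arrival(arrivals, sends, next_send, window=10):
    """Refit on the most recent `window` heartbeats, then predict one step ahead."""
    a, b = fit_arx(arrivals[-window:], sends[-window:])
    return a * arrivals[-1] + b * next_send

sends = [float(k) for k in range(12)]       # one heartbeat per second (toy data)
arrivals = [s + 0.3 for s in sends]         # constant 0.3 s network delay (toy data)
guess = predict_next_arrival(arrivals, sends, next_send=12.0)
```

    Refitting on each new window is what makes the predictor adaptive: when network delay drifts, older observations fall out of the window and the coefficients follow the recent behavior.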

  8. Modeling Zero – Inflated Regression of Road Accidents at Johor Federal Road F001

    Directory of Open Access Journals (Sweden)

    Prasetijo Joewono

    2016-01-01

    Full Text Available This study focused on Poisson regression with excess zero outcomes in the response variable. Generalized linear modelling techniques such as the Poisson regression model and the negative binomial model were found to be insignificant in explaining and handling the overdispersion caused by the high number of zeros, so a zero-inflated model was introduced to overcome the problem. The application concerns the number of road accidents on F001 Jalan Jb – Air Hitam. Data on road accidents were collected for a five-year period from 2010 through 2014. The results of the analysis show that the ZINB model performed best in terms of the comparative criteria, based on a P value less than 0.05.
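
    For readers unfamiliar with the zero-inflated negative binomial (ZINB) model, the sketch below writes out its probability mass function: a point mass at zero mixed with an ordinary negative binomial count. The parameter values are hypothetical and are not estimates from the road accident data.

```python
import math

def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)

def zinb_pmf(k, mu, size, pi_zero):
    """Zero-inflated NB: with probability pi_zero the count is a structural zero."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, size)
    return pi_zero + base if k == 0 else base

# Hypothetical parameters for illustration only.
mu, size, pi_zero = 1.5, 0.8, 0.4
p_zero_plain = nb_pmf(0, mu, size)
p_zero_inflated = zinb_pmf(0, mu, size, pi_zero)
```

    The inflated model always assigns more probability to a zero count than the plain negative binomial with the same mean and dispersion, which is how it absorbs the excess zeros.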

  9. Profile-driven regression for modeling and runtime optimization of mobile networks

    DEFF Research Database (Denmark)

    McClary, Dan; Syrotiuk, Violet; Kulahci, Murat

    2010-01-01

    Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike others, the throughput model accounts for node speed. The resulting optimization is very effective; locally optimizing the network factors at runtime results in throughput as much as six times higher than that achieved with the factors at their default levels.

  10. APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING

    Directory of Open Access Journals (Sweden)

    A. L. Oleinik

    2015-09-01

    Full Text Available Subject of Research. The paper deals with the problem of reconstructing lip region images from a speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data with high covariance. These components are used to build a regression model. The advantage of this approach lies in the possibility of achieving two goals: identification of latent interrelations between initial data components (e.g. speech signal and lip region image) and approximation of one initial data component as a function of another. Main Results. Experimental research on the reconstruction of lip region images from a speech signal was carried out on the VidTIMIT audio-visual speech database. Results of the experiment showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The findings suggest that Partial Least Squares regression is applicable to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.

  11. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice.
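
    The central observation, that residuals from the correctly specified regression are more symmetric than residuals from the reversed one, can be checked on simulated data. The sketch below only compares the two sample skewness values descriptively; it does not implement any of the significance tests discussed in the paper. The exponential predictor, the normal errors, the sample size, and the seed are assumptions.

```python
import random

def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def ols_residuals(xs, ys):
    """Residuals from regressing ys on xs with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]     # skewed "true" predictor
y = [xi + rng.gauss(0.0, 1.0) for xi in x]          # y depends on x, normal errors

skew_correct = skewness(ols_residuals(x, y))    # residuals of y regressed on x
skew_reversed = skewness(ols_residuals(y, x))   # residuals of x regressed on y
```

    In this setup the residuals of the reversed model inherit part of the skewness of x, so their skewness should be clearly further from zero than that of the correctly specified model.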

  12. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    Science.gov (United States)

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to achieve better sensitivity of

  13. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Directory of Open Access Journals (Sweden)

    Gu Mi

    Full Text Available This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  14. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
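
    The NB mean-variance relationship mentioned in this abstract is commonly written as Var(Y) = mu + phi * mu^2, where phi is the dispersion parameter. The sketch below states that relationship and a simple method-of-moments estimate of phi from replicate counts of one gene. It is a minimal illustration under that parameterisation, not the simulation-based tests the authors propose, and the counts are invented.

```python
def nb_variance(mu, phi):
    """NB mean-variance relationship assumed here: Var(Y) = mu + phi * mu**2."""
    return mu + phi * mu * mu

def moment_dispersion(counts):
    """Method-of-moments estimate of phi from replicate counts of one gene."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if mean == 0:
        raise ValueError("dispersion is undefined when all counts are zero")
    # Truncate at zero: a sample variance below the mean gives no evidence of overdispersion.
    return max((var - mean) / (mean * mean), 0.0)

phi_hat = moment_dispersion([0, 2, 4, 6, 8])   # mean 4, sample variance 10
```

    With only a handful of replicates per gene such per-gene estimates are very noisy, which is the motivation for the dispersion models that share information across genes.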

  15. Logistic regression.

    Science.gov (United States)

    Nick, Todd G; Campbell, Kathleen M

    2007-01-01

    The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
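
    A simple binary logistic model of the kind described here can be fitted in a few lines. The sketch below uses plain gradient ascent on the log-likelihood with one continuous predictor; the step size, iteration count, and toy data are assumptions, and statistical software would normally use Newton-type iterations instead.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the mean log-likelihood of P(y=1|x) = sigmoid(b0 + b1*x)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys)) / n
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Toy data: exposure level x and presence (1) or absence (0) of disease.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)   # multiplicative change in the odds per unit increase in x
```

    The exponentiated slope is the odds ratio usually reported in medical journals: a value above 1 means the odds of the outcome rise with the predictor.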

  16. Regression models of ultimate methane yields of fruits and vegetable solid wastes, sorghum and napiergrass on chemical composition

    Energy Technology Data Exchange (ETDEWEB)

    Gunaseelan, V.N. [PSG College of Arts and Science, Coimbatore (India). Department of Zoology

    2007-04-15

    Several fractions of fruits and vegetable solid wastes (FVSW), sorghum and napiergrass were analyzed for total solids (TS), volatile solids (VS), total organic carbon, total Kjeldahl nitrogen, total soluble carbohydrate, extractable protein, acid-detergent fiber (ADF), lignin, cellulose and ash contents. Their ultimate methane yields (B_o) were determined using the biochemical methane potential (BMP) assay. A series of simple and multiple regression models relating the B_o to the various substrate constituents were generated and evaluated using computer statistical software, Statistical Package for Social Sciences (SPSS). The results of simple regression analyses revealed that only a weak relationship existed between the individual components such as carbohydrate, protein, ADF, lignin and cellulose versus B_o. A regression of B_o versus a combination of two variables as a single independent variable such as carbohydrate/ADF and carbohydrate + protein/ADF also showed that the relationship is not strong. Thus it does not appear possible to relate the B_o of FVSW, sorghum and napiergrass with single compositional characteristics. The results of multiple regression analyses showed promise and the relationship appeared to be good. When ADF and lignin/ADF were used as independent variables, the percentage of variation accounted for by the model is low for FVSW (r²=0.665) and sorghum and napiergrass (r²=0.746). Addition of nitrogen, ash and total soluble carbohydrate data to the model had a significantly higher effect on prediction of B_o of these wastes with the r² values ranging from 0.9 to 0.99. More than 90% of variation in B_o of FVSW could be accounted for by the models when the variables carbohydrate, lignin, lignin/ADF, nitrogen and ash (r²=0.904), carbohydrate, ADF, lignin/ADF, nitrogen and ash (r²=0.90) and carbohydrate/ADF, lignin/ADF, lignin and ash (r²=0.901) were used. All the models have

  17. Improved variance estimation of maximum likelihood estimators in stable first-order dynamic regression models

    NARCIS (Netherlands)

    Kiviet, J.F.; Phillips, G.D.A.

    2014-01-01

    In dynamic regression models conditional maximum likelihood (least-squares) coefficient and variance estimators are biased. Using expansion techniques an approximation is obtained to the bias in variance estimation yielding a bias corrected variance estimator. This is achieved for both the standard

  18. Modeling protein tandem mass spectrometry data with an extended linear regression strategy.

    Science.gov (United States)

    Liu, Han; Bonner, Anthony J; Emili, Andrew

    2004-01-01

    Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns in mass spectra are useful information for identifying peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data, this extended regression problem can be reduced to a singular value decomposition (SVD) problem, thus gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high-confidence, non-redundant matches as training data. Based on this specific problem, we derived some measures of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.

  19. Simple multiple regression model for long range forecasting of Indian summer monsoon rainfall

    Digital Repository Service at National Institute of Oceanography (India)

    Sadhuram, Y.; Murthy, T.V.R.

    ) and ISMR is found to be 0.62. The multiple correlation using the above two parameters is 0.85 which explains 72% variance in ISMR. Using the above two parameters a linear multiple regression model to predict ISMR is developed. The results are comparable...

  20. Climate Impacts on Chinese Corn Yields: A Fractional Polynomial Regression Model

    NARCIS (Netherlands)

    Kooten, van G.C.; Sun, Baojing

    2012-01-01

    In this study, we examine the effect of climate on corn yields in northern China using data from ten districts in Inner Mongolia and two in Shaanxi province. A regression model with a flexible functional form is specified, with explanatory variables that include seasonal growing degree days,

  1. A Regression Solution to Cason and Cason's Model of Clinical Performance Rating: Easier, Cheaper, Faster.

    Science.gov (United States)

    Cason, Gerald J.; Cason, Carolyn L.

    A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search iterative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…

  2. FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS

    Directory of Open Access Journals (Sweden)

    Hirpa G. Lemu

    2017-03-01

    Full Text Available This article reports a proposed approach to describing frictional resistance in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is that a considerable number of factors affect the friction coefficient value, and as a result building an analytical friction model for specified process conditions is practically impossible. In the proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code, and STATISTICA Neural Networks was used to build an artificial neural network model. The effect of different training strategies on the quality of the neural networks was studied. The results of strip drawing friction tests were used as input variables for the regression model and for training radial basis function networks, generalized regression neural networks and multilayer networks. Four kinds of Al-Mg alloy sheets were used as test material.

  3. Sieve M-estimation for semiparametric varying-coefficient partially linear regression model

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and the required computational burden is much less than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if we choose an appropriate ρ(·). Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained, and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.

  4. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    DEFF Research Database (Denmark)

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;

    2014-01-01

    to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...

  5. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Science.gov (United States)

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
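
    The thresholding step described in this record can be illustrated with a minimal sketch. The scores and labels below are hypothetical, and the convention that a score at or above the threshold counts as the positive class is an assumption, since the record is truncated before it states which class falls below the threshold. The area under the ROC curve is computed here by pairwise comparison of positive and negative scores, not by the software used in the study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) for 0/1 labels.

    Assumed convention: score >= threshold is predicted positive.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, labels):
    """Area under the ROC curve: the share of (positive, negative) pairs
    in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs and true classes (not data from the study).
scores = [0.10, 0.40, 0.35, 0.80]
labels = [0, 0, 1, 1]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
area = auc(scores, labels)
```

    Sweeping the threshold from 0 to 1 and plotting sensitivity against 1 - specificity traces the ROC curve that such comparisons rely on.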

  6. The Performance of the Full Information Maximum Likelihood Estimator in Multiple Regression Models with Missing Data.

    Science.gov (United States)

    Enders, Craig K.

    2001-01-01

    Examined the performance of a recently available full information maximum likelihood (FIML) estimator in a multiple regression model with missing data using Monte Carlo simulation and considering the effects of four independent variables. Results indicate that FIML estimation was superior to that of three ad hoc techniques, with less bias and less…

  7. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    Science.gov (United States)

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  8. [Prediction model of health workforce and beds in county hospitals of Hunan by multiple linear regression].

    Science.gov (United States)

    Ling, Ru; Liu, Jiawang

    2011-12-01

    To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan, selected by stratified random sampling, using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model for medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model for county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.

  9. Specific features of modelling rules of monetary policy on the basis of hybrid regression models with a neural component

    Directory of Open Access Journals (Sweden)

    Lukianenko Iryna H.

    2014-01-01

    Full Text Available The article considers the possibilities and specific features of modelling economic phenomena with a category of models that unite elements of econometric regressions and artificial neural networks. This category of models contains auto-regression neural networks (AR-NN), regressions of smooth transition (STR/STAR), multi-mode regressions of smooth transition (MRSTR/MRSTAR) and smooth transition regressions with neural coefficients (NCSTR/NCSTAR). The neural network component allows models of this category to achieve high empirical fidelity, including reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands the possibilities for interpreting the obtained results. An example of a multi-mode monetary rule is used to show one case of specification and interpretation of this model. In particular, the article models and interprets the principles of management of the UAH exchange rate that come into force when the economy passes from a relatively stable state into a crisis state.

  10. Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine

    Science.gov (United States)

    Patnaik, Surya N.; Hopkins, Dale A.

    2002-01-01

    In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T_4, T_vane, and T_metal). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC) as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design of the experiment strategy. A neural network and a regression model with 45 weight factors were trained for the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T_4, T_vane, and T_metal) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.

  11. An empirical approach to update multivariate regression models intended for routine industrial use

    Energy Technology Data Exchange (ETDEWEB)

    Garcia-Mencia, M.V.; Andrade, J.M.; Lopez-Mahia, P.; Prada, D. [University of La Coruna, La Coruna (Spain). Dept. of Analytical Chemistry

    2000-11-01

    Many problems currently tackled by analysts are highly complex and, accordingly, multivariate regression models need to be developed. Two intertwined topics are important when such models are to be applied within industrial routines: (1) Did the model account for the 'natural' variance of the production samples? (2) Is the model stable over time? This paper focuses on the second topic and presents an empirical approach in which predictive models developed using mid-FTIR with PLS and PCR retained their utility for about nine months when used to predict the octane number of platforming naphthas in a petrochemical refinery. 41 refs., 10 figs., 1 tab.

  12. BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES

    Institute of Scientific and Technical Information of China (English)

    林路; 张润楚

    2004-01-01

    This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.

  13. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...

  14. Iterative Weighted Semiparametric Least Squares Estimation in Repeated Measurement Partially Linear Regression Models

    Institute of Scientific and Technical Information of China (English)

    Ge-mai Chen; Jin-hong You

    2005-01-01

    Consider a repeated measurement partially linear regression model with an unknown vector pa... semiparametric generalized least squares estimator (SGLSE) of β, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE cannot improve upon the SGLSE. These results are generalizations of those in [2] to the case of semiparametric regressions.

  15. Regressions by leaps and bounds and biased estimation techniques in yield modeling

    Science.gov (United States)

    Marquina, N. E. (Principal Investigator)

    1979-01-01

    The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.
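
    The link between multicollinearity and small eigenvalues can be seen in the simplest case of two predictors, where the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 - |r| and 1 + |r|. The sketch below is only an illustration under that two-predictor assumption, with hypothetical data; it is not the extended correlation matrix A'A or the latent root regression procedure used in the report.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_eigenvalues(x1, x2):
    """Eigenvalues (smallest first) of the 2x2 correlation matrix
    [[1, r], [r, 1]], which are 1 - |r| and 1 + |r|."""
    r = pearson_r(x1, x2)
    return 1.0 - abs(r), 1.0 + abs(r)


# Hypothetical regressors: x2 is almost exactly twice x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
smallest, largest = correlation_eigenvalues(x1, x2)
```

    A smallest eigenvalue close to zero signals that the two regressors carry nearly the same information, which is the situation in which OLS estimates become unstable.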

  16. A Study of Wind Statistics Through Auto-Regressive and Moving-Average (ARMA) Modeling

    Institute of Scientific and Technical Information of China (English)

    尹彰; 周宗仁

    2001-01-01

    Statistical properties of winds near the Taichung Harbour are investigated. Twenty-six years of incomplete wind speed data, measured on an hourly basis, are used as reference. The possibility of imputation using simulated results of the Auto-Regressive (AR), Moving-Average (MA), and/or Auto-Regressive and Moving-Average (ARMA) models is studied. Predictions of the 25-year extreme wind speeds based upon the augmented data are compared with the original series. Based upon the results, predictions of the 50- and 100-year extreme wind speeds are then made.
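
    As a rough illustration of the auto-regressive idea, the sketch below fits a first-order AR model by least squares and extends the series a few steps ahead, which is one simple way a gap could be filled. The AR(1) order, the zero-mean assumption, the function names and the numbers are all assumptions for illustration; the record does not state the model orders or fitting method used in the study.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise.

    Assumes the series has already been centred on zero.
    """
    numerator = sum(cur * prev for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


def forecast_ar1(last_value, phi, steps):
    """Noise-free AR(1) forecasts for the next `steps` time points."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = phi * value
        forecasts.append(value)
    return forecasts


# Hypothetical centred wind-speed anomalies (not data from the study).
observed = [8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(observed)
filled = forecast_ar1(observed[-1], phi, steps=2)
```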

  17. Comparison of regression models for estimation of isometric wrist joint torques using surface electromyography

    Directory of Open Access Journals (Sweden)

    Menon Carlo

    2011-09-01

    Full Text Available Abstract Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on the accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results: Mean adjusted coefficient of determination (Ra2) values decreased by between 20% and 35% for different models after one hour, while altering arm posture decreased mean Ra2 values by between 64% and 74% for different models. Conclusions: Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares (OLS) linear regression model was shown to have high isometric torque estimation accuracy combined with very short training times.

  18. The applicability of linear regression models in working environments' thermal evaluation.

    Directory of Open Access Journals (Sweden)

    Pablo Adamoglu de Oliveira

    2006-04-01

    Full Text Available The simultaneous analysis of normally distributed thermal variables, with the aim of checking whether any significant correlation exists among them or whether the values of some can be predicted from the values of others, is considered a problem of great importance in statistical studies. The aim of this paper is to study the applicability of linear regression models in thermal comfort studies of working environments, thus contributing to the understanding of possible environmental cooling, heating or ventilation needs. It starts with bibliographical research, followed by field research, data collection and statistical-mathematical treatment of the data in software. Data analysis was then performed and the linear regression models were constructed, using t and F tests to determine the consistency of the models and their parameters, and conclusions were drawn based on the information obtained and on the significance of the mathematical models built.

  19. Application of artificial neural engineering and regression models for forecasting shelf life of instant coffee drink

    Directory of Open Access Journals (Sweden)

    Sumit Goyal

    2011-07-01

    Full Text Available Coffee as a beverage is prepared from the roasted seeds (beans) of the coffee plant. Coffee is the second most important product in the international market in terms of volume traded and the most important in terms of value. Artificial neural engineering and regression models were developed to predict the shelf life of instant coffee drink. Colour and appearance, flavour, viscosity and sediment were used as input parameters. Overall acceptability was used as the output parameter. The dataset consisted of 50 experimentally developed observations. The dataset was divided into two disjoint subsets, namely a training set containing 40 observations (80% of total observations) and a test set comprising 10 observations (20% of total observations). The network was trained for 500 epochs. The Neural Network Toolbox under Matlab 7.0 was used for training the models. The investigation revealed that the multiple linear regression model was superior to the radial basis model for forecasting the shelf life of instant coffee drink.

  20. A multivariate linear regression model for the Jordanian industrial electric energy consumption

    Energy Technology Data Exchange (ETDEWEB)

    Al-Ghandoor, A.; Nahleh, Y.A.; Sandouqa, Y.; Al-Salaymeh, M. [Hashemite Univ., Zarqa (Jordan). Dept. of Industrial Engineering

    2007-08-09

    The amount of electricity used by the industrial sector in Jordan is an important driver for determining the future energy needs of the country. This paper proposed a model to simulate electricity and energy consumption by industry. The general model approach was based on multivariate regression analysis to provide valuable information regarding energy demands and analysis, and to identify the various factors that influence Jordanian industrial electricity consumption. It was determined that industrial gross output and capacity utilization are the most important variables that drive electricity consumption. The results revealed that the multivariate linear regression model can be used to adequately model the Jordanian industrial electricity consumption with coefficient of determination (R2) and adjusted R2 values of 99.3 and 99.2 per cent, respectively. 19 refs., 4 tabs., 2 figs.
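
    The relation between the coefficient of determination and its adjusted form, both reported in this record, follows the standard formula adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1) for n observations and p predictors. The sketch below applies it to hypothetical values of n and p, since the record does not give the sample size or the number of regressors.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 for a linear model with an intercept.

    r2           -- ordinary coefficient of determination (0..1)
    n_obs        -- number of observations
    n_predictors -- number of regressors, excluding the intercept
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: R2 = 0.90 from 11 observations and 2 predictors.
example = adjusted_r2(0.90, n_obs=11, n_predictors=2)
```

    The adjustment always lowers R2 slightly when predictors are present, and the gap narrows as the number of observations grows relative to the number of predictors.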

  1. Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction

    Institute of Scientific and Technical Information of China (English)

    WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian

    2007-01-01

    A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressways. Using the historical traffic data collected from the detectors on Beijing expressways, a specifically designed database was developed via processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that, using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
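
    The K-NN regression idea in this record can be sketched as follows: find the K historical speed series closest to the current one under a weighted Euclidean distance, then average the speeds that followed them. The weights, the history records and the plain averaging rule are hypothetical choices for illustration; the record does not specify how the study derived its weights or combined the neighbours.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two equal-length series."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(history, query, weights, k):
    """Average the targets of the k historical series nearest to `query`.

    history -- list of (recent_speeds, next_speed) pairs
    """
    ranked = sorted(history,
                    key=lambda item: weighted_distance(item[0], query, weights))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical records: two recent speeds (km/h) and the speed that followed.
history = [
    ([60.0, 62.0], 63.0),
    ([30.0, 28.0], 27.0),
    ([58.0, 61.0], 60.0),
    ([31.0, 33.0], 35.0),
]
weights = [0.3, 0.7]  # assumed: the most recent reading counts more
predicted = knn_predict(history, [59.0, 60.0], weights, k=2)
```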

  2. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Science.gov (United States)

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
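
    The count of 2^p - 1 submodels comes from taking every non-empty subset of the p predictors. A minimal sketch of that enumeration is shown below; the predictor names are hypothetical, and the sketch only lists the subsets, it does not reproduce the importance indices computed by the Excel program.

```python
from itertools import combinations


def all_submodels(predictors):
    """Return every non-empty subset of the predictors as a tuple."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


# Hypothetical predictor names.
submodels = all_submodels(["x1", "x2", "x3"])
```

    With three predictors this gives seven candidate models; the count doubles (plus one) with each added predictor, which is why exhaustive approaches become costly for large p.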

  3. Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models

    CERN Document Server

    Zellner, Arnold

    2008-01-01

    A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...

  4. A note on constrained M-estimation and its recursive analog in multivariate linear regression models

    Institute of Scientific and Technical Information of China (English)

    RAO; Calyampudi; R

    2009-01-01

    In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. Besides, robustness and asymptotic behavior of constrained M-estimators are briefly discussed.

  5. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    Science.gov (United States)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.

  6. A quantile regression approach for modelling a Health-Related Quality of Life Measure

    Directory of Open Access Journals (Sweden)

    Giulia Cavrini

    2013-05-01

    Full Text Available Objective. The aim of this study is to propose a new approach for modeling the EQ-5D index and EQ-5D VAS in order to explain the effect of lifestyle determinants using quantile regression analysis. Methods. Data were collected within a cross-sectional study that involved a probabilistic sample of 1,622 adults randomly selected from the population register of two Health Authorities of Bologna in northern Italy. The perceived health status of people was measured using the EQ-5D questionnaire. The Visual Analogue Scale included in the EQ-5D questionnaire, the EQ-VAS, and the EQ-5D index were used to obtain the synthetic measures of quality of life. To model the EQ-VAS score and EQ-5D index, a quantile regression analysis was employed. Quantile regression is a way to estimate the conditional quantiles of the VAS score distribution in a linear model, in order to have a more complete view of possible associations between a measure of Health Related Quality of Life (dependent variable) and socio-demographic and determinants data. This methodological approach was preferred to an OLS regression because of the typical distribution of the EQ-VAS score and EQ-5D index. Main Results. The analysis suggested that age, gender, and comorbidity can explain variability in perceived health status measured by the EQ-5D index and the VAS.
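
    Quantile regression replaces the squared-error loss of OLS with the asymmetric "pinball" (check) loss, whose minimiser is a quantile rather than the mean. The sketch below shows this for the intercept-only case with hypothetical scores, as a simplified illustration of the principle; it is not the covariate model fitted in the study.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile level tau (0 < tau < 1)."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def best_constant(values, tau):
    """Data value c that minimises the total pinball loss of (y - c).

    This is the intercept-only quantile regression fit.
    """
    return min(sorted(values),
               key=lambda c: sum(pinball_loss(y - c, tau) for y in values))


# Hypothetical skewed scores (not data from the study).
scores = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(scores, tau=0.5)
lower_fit = best_constant(scores, tau=0.25)
```

    The outlying value of 100 pulls the mean to 22, while the fitted quantiles stay within the bulk of the data, which is one reason quantile regression suits skewed health-status scores.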

  7. Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy

    Directory of Open Access Journals (Sweden)

    Michel Ducher

    2013-01-01

    Full Text Available Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n=155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  8. Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.

    Science.gov (United States)

    Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre

    2013-01-01

    Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  9. Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model

    Directory of Open Access Journals (Sweden)

    Yujuan Sun

    2014-01-01

    Full Text Available Passenger flow is increasing dramatically with the completion of subway network systems in the big cities of China. As convergence nodes of subway lines, transfer stations must handle more passengers because of the large transfer demand among different lines. Transfer facilities therefore face great pressure, such as pedestrian congestion or other abnormal situations. In order to avoid pedestrian congestion, or to warn the management before it occurs, it is necessary to predict the transfer passenger flow. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. To test and illustrate the prediction model, one month of transfer passenger flow data from the XIDAN transfer station were used to calibrate and validate the model. Compared with a Kalman filter model and a support vector machine regression model, the nonparametric regression model has the advantages of high accuracy and strong transferability, and could predict transfer passenger flow accurately for different intervals.
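
    The abstract does not say which nonparametric estimator was used. One common choice in traffic forecasting is k-nearest-neighbour regression, sketched below as an assumption: the forecast for the next interval is the average outcome of the most similar past states. The flows and the state definition (the two most recent interval counts) are made up for illustration.

```python
def knn_predict(history, query, k=3):
    """Average the next-interval flows of the k past states closest to `query`.

    history: list of (state, next_flow) pairs, where state is a tuple of
    recent passenger counts.
    """
    def distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    nearest = sorted(history, key=lambda pair: distance(pair[0], query))[:k]
    return sum(flow for _, flow in nearest) / len(nearest)


# Hypothetical history: (flows in the last two intervals) -> flow in the next one.
history = [
    ((100, 120), 140),
    ((110, 130), 150),
    ((300, 320), 310),
    ((105, 125), 145),
    ((290, 310), 300),
]
forecast = knn_predict(history, (108, 128), k=3)
```

    No functional form is fitted. The historical database itself acts as the model, which is why such methods carry over to other stations as long as comparable history is available.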

  10. Reliability based design optimization of concrete mix proportions using generalized ridge regression model

    Directory of Open Access Journals (Sweden)

    Rachna Aggarwal

    2014-12-01

    Full Text Available This paper presents a Reliability Based Design Optimization (RBDO) model to deal with uncertainties involved in the concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subject to a concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR), Traditional Ridge Regression (TRR) and Generalized Ridge Regression (GRR) techniques have been explored to select the best model to explicitly represent the compressive strength of concrete. The RBDO model is solved by the Sequential Optimization and Reliability Assessment (SORA) method using a fully quadratic GRR model. Optimization results for a wide range of target compressive strengths and reliability levels of 0.90, 0.95 and 0.99 have been reported. Also, safety factor based Deterministic Design Optimization (DDO) designs for each case are obtained. It has been observed that deterministic optimal designs are cost effective but the proposed RBDO model gives improved design performance.
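
    To make the difference between the three estimators concrete, the sketch below assumes the usual textbook formulation: ordinary least squares solves X'X b = X'y, traditional ridge adds one shared constant k to the diagonal, and generalized ridge allows a separate constant for each coefficient. The two-predictor solver and the data are illustrative only and are not taken from the paper.

```python
def generalized_ridge_2d(x1, x2, y, k1, k2):
    """Solve (X'X + diag(k1, k2)) b = X'y for two centered predictors."""
    s11 = sum(a * a for a in x1) + k1
    s22 = sum(b * b for b in x2) + k2
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    return ((t1 * s22 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Made-up centered data generated from y = 2*x1 + 1*x2.
x1 = [-1.0, 0.0, 1.0]
x2 = [1.0, -2.0, 1.0]
y = [-1.0, -2.0, 3.0]

ols = generalized_ridge_2d(x1, x2, y, 0.0, 0.0)          # k1 = k2 = 0
traditional = generalized_ridge_2d(x1, x2, y, 2.0, 2.0)  # one shared k
generalized = generalized_ridge_2d(x1, x2, y, 2.0, 0.0)  # separate k per term
```

    Here the generalized variant shrinks only the first coefficient (from 2 to 1) and leaves the second untouched, whereas the shared constant shrinks both.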

  11. A general binomial regression model to estimate standardized risk differences from binary response data.

    Science.gov (United States)

    Kovalchik, Stephanie A; Varadhan, Ravi; Fetterman, Barbara; Poitras, Nancy E; Wacholder, Sholom; Katki, Hormuzd A

    2013-02-28

    Estimates of absolute risks and risk differences are necessary for evaluating the clinical and population impact of biomedical research findings. We have developed a linear-expit regression model (LEXPIT) to incorporate linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, whereas the exponentiated nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. We present a constrained maximum likelihood estimation algorithm that ensures the feasibility of risk estimates of the LEXPIT model and describe procedures for defining the feasible region of the parameter space, judging convergence, and evaluating boundary cases. Simulations demonstrate that the methodology is computationally robust and yields feasible, consistent estimators. We applied the LEXPIT model to estimate the absolute 5-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to abnormal Pap test in human papillomavirus-negative that was not detected with logistic regression. Our R package blm provides free and easy-to-use software for fitting the LEXPIT model.
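
    A minimal sketch of the model form described above, assuming the risk is written as a linear term plus an expit (inverse-logit) term, P(Y = 1) = x'beta + expit(z'gamma). The coefficient values are hypothetical, and this is not the interface of the authors' blm package.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def lexpit_risk(x, beta, z, gamma):
    """Absolute risk as a linear part plus an expit part."""
    linear_part = sum(xi * bi for xi, bi in zip(x, beta))
    expit_part = expit(sum(zi * gi for zi, gi in zip(z, gamma)))
    risk = linear_part + expit_part
    # Feasibility: the combined risk must remain a valid probability.
    if not 0.0 <= risk <= 1.0:
        raise ValueError("infeasible risk estimate outside [0, 1]")
    return risk


beta = [0.05]         # hypothetical adjusted risk difference for an exposure
gamma = [-3.0, 0.7]   # hypothetical intercept and log odds ratio
z = [1.0, 1.0]        # intercept term and one nonlinear covariate

exposed = lexpit_risk([1.0], beta, z, gamma)
unexposed = lexpit_risk([0.0], beta, z, gamma)
risk_difference = exposed - unexposed
residual_odds_ratio = math.exp(gamma[1])
```

    The difference between the exposed and unexposed risks equals the linear coefficient, which is what lets the linear terms be read as adjusted risk differences. The feasibility check mirrors the constraint that motivates the constrained maximum likelihood algorithm.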

  12. Microstructural white matter changes in normal aging: a diffusion tensor imaging study with higher-order polynomial regression models.

    Science.gov (United States)

    Hsu, Jung-Lung; Van Hecke, Wim; Bai, Chyi-Huey; Lee, Cheng-Hui; Tsai, Yuh-Feng; Chiu, Hou-Chang; Jaw, Fu-Shan; Hsu, Chien-Yeh; Leu, Jyu-Gang; Chen, Wei-Hung; Leemans, Alexander

    2010-01-01

    Diffusion tensor imaging (DTI) has already proven to be a valuable tool when investigating both global and regional microstructural white matter (WM) brain changes in the human aging process. Although subject to many criticisms, voxel-based analysis is currently one of the most common and preferred approaches in such DTI aging studies. In this context, voxel-based DTI analyses have assumed a 'linear' correlation when finding the significant brain regions that relate age with a particular diffusion measure of interest. Recent literature, however, has clearly demonstrated 'non-linear' relationships between age and diffusion metrics by using region-of-interest and tractography-based approaches. In this work, we incorporated polynomial regression models in the voxel-based DTI analysis framework to assess age-related changes in WM diffusion properties (fractional anisotropy and axial, transverse, and mean diffusivity) in a large cohort of 346 subjects (25 to 81 years old). Our novel approach clearly demonstrates that voxel-based DTI analyses can greatly benefit from incorporating higher-order regression models when investigating potential relationships between aging and diffusion properties.
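
    The higher-order fit applied at each voxel can be sketched as an ordinary least-squares polynomial regression. The solver below uses the normal equations with Gaussian elimination. The scaled age values and the diffusion measure are made up to follow an exact quadratic, so they illustrate the mechanics only and do not reproduce the study's data or pipeline.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


# Hypothetical centered and scaled ages with an inverted-U response.
scaled_age = [-2.0, -1.0, 0.0, 1.0, 2.0]
measure = [-1.0, 0.5, 1.0, 0.5, -1.0]

linear_fit = fit_polynomial(scaled_age, measure, 1)
quadratic_fit = fit_polynomial(scaled_age, measure, 2)
```

    On this symmetric example the linear fit has zero slope and so misses the age effect entirely, while the quadratic fit recovers it.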

  13. Random regression models to estimate genetic parameters for milk production of Guzerat cows using orthogonal Legendre polynomials

    Directory of Open Access Journals (Sweden)

    Maria Gabriela Campolina Diniz Peixoto

    2014-05-01

    Full Text Available The objective of this work was to compare random regression models for the estimation of genetic parameters for Guzerat milk production, using orthogonal Legendre polynomials. Records (20,524) of test-day milk yield (TDMY) from 2,816 first-lactation Guzerat cows were used. TDMY grouped into 10 monthly classes were analyzed for additive genetic effect and for permanent environmental and residual effects (random effects), whereas the contemporary group, calving age (linear and quadratic effects) and mean lactation curve were analyzed as fixed effects. Trajectories for the additive genetic and permanent environmental effects were modeled by means of a covariance function employing orthogonal Legendre polynomials ranging from the second to the fifth order. Residual variances were considered in one, four, six, or ten variance classes. The best model had six residual variance classes. The heritability estimates for the TDMY records varied from 0.19 to 0.32. The random regression model that used a second-order Legendre polynomial for the additive genetic effect and a fifth-order polynomial for the permanent environmental effect was adequate according to the main comparison criteria employed. The model with a second-order Legendre polynomial for the additive genetic effect and a fourth-order polynomial for the permanent environmental effect could also be employed in these analyses.
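
    The Legendre covariates in a random regression model are obtained by mapping the time variable onto [-1, 1] and evaluating the polynomials there. The sketch below uses Bonnet's recurrence and unnormalized polynomials. The lactation range of 5 to 305 days is an assumption for illustration, and whether "order" in the abstract counts the polynomial degree or the number of coefficients is not stated there.

```python
def standardize(t, t_min, t_max):
    """Map a time point (e.g. days in milk) linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Hypothetical test day in the middle of an assumed 5-305 day lactation.
x_mid = standardize(155, 5, 305)
covariates_mid = legendre_basis(x_mid, 2)
```

    Each animal's additive genetic and permanent environmental trajectories are then modeled as random coefficients on these covariates.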

  14. Efficient Blind System Identification of Non-Gaussian Auto-Regressive Models with HMM Modeling of the Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, Søren Vang

    2007-01-01

    We propose two blind system identification methods that exploit the underlying dynamics of non-Gaussian signals. The two signal models to be identified are: an Auto-Regressive (AR) model driven by a discrete-state Hidden Markov process, and the same model whose output is perturbed by white Gaussian...

  15. A review of a priori regression models for warfarin maintenance dose prediction.

    Science.gov (United States)

    Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea

    2014-01-01

    A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
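
    The statistics named in the abstract can be computed as follows. This is a generic sketch with made-up doses, not data from either validation cohort. The adjusted R-squared uses the standard correction for the number of predictors.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical stable doses (mg/day) and algorithm predictions.
observed_dose = [2.0, 4.0, 6.0, 8.0]
predicted_dose = [2.5, 3.5, 6.5, 7.5]

mae = mean_absolute_error(observed_dose, predicted_dose)
r2 = r_squared(observed_dose, predicted_dose)
adj_r2 = adjusted_r_squared(r2, n_obs=4, n_predictors=1)
```

    Computing these on an external cohort, rather than on the patients used to derive the algorithm, is what exposes the drop in performance the review reports.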

  16. A review of a priori regression models for warfarin maintenance dose prediction.

    Directory of Open Access Journals (Sweden)

    Ben Francis

    Full Text Available A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.

  17. APPLICATION OF REGRESSION MODELLING TECHNIQUES IN DESALINATION OF SEA WATER BY MEMBRANE DISTILLATION

    Directory of Open Access Journals (Sweden)

    SELVI S. R

    2015-08-01

    Full Text Available The objective of this work is to assess the statistical significance of experimental parameters for the performance of membrane distillation. A raw sea water sample, without pretreatment, was collected from Puducherry and desalinated using the direct contact membrane distillation method. The experimental data, covering the effects of feed temperature, feed flow rate and feed concentration on the permeate flux, were analysed using statistical methods. A regression model was developed to relate the input parameters (feed temperature, feed concentration and feed flow rate) to the output parameter (permeate flux). Since the performance of membrane distillation in the desalination of water is characterised by the permeate flux, a simple linear regression model was fitted. Goodness of fit should always be validated, and here the regression model was validated using ANOVA. ANOVA estimates for the parameter study are given, the coefficients obtained by regression analysis are specified in the regression equation, and it is concluded that the input parameter with the highest coefficient is significant and strongly influences the response. Feed flow rate and feed temperature have a greater influence on permeate flux than feed concentration. The coefficient of feed concentration was found to be negative, which indicates that it is a less significant factor for permeate flux. The chemical composition of the sea water was obtained by water quality analysis. The TDS of the membrane-distilled water was found to be 18 ppm, compared with an initial feed TDS of 27,720 ppm for the sea water. The experimental work found a salt rejection of 99%, and the water analysis report confirms that the distillate obtained by this desalination process is of potable quality.

  18. Identifying of risks in pricing using a regression model of demand on price dependence

    Directory of Open Access Journals (Sweden)

    O.I. Yashkina

    2016-09-01

    Full Text Available The aim of the article. The main purpose of the article is to describe scientific and methodological approaches to determining the price elasticity of demand from a regression model of demand on price, and to assessing the risk of price variations on the resulting model. The results of the analysis. The study is based on the assumption that the index of price elasticity of demand for high-tech innovations is not constant, as it is commonly understood to be in the classical sense. At the stage of market launch and subsequent sales growth, the index of price elasticity of demand may vary within certain limits. The index value, and hence the market response, is closely related to the current price. Achieving the stated purpose of the article requires factual information about prices and the corresponding sales volumes of new high-tech products over a short period of time, on the basis of which the types of relationship between demand and price are modeled. Risk assessment of pricing and profit optimization by the regression of demand on price consists of three stages: (a) obtaining a regression model of demand on price; (b) obtaining the function of price elasticity of demand and assessing the pricing risk depending on the behavior of that function; (c) determining the price at which the company receives the maximum operating profit, based on the specific model of the demand-price function. To obtain the regression model of the dependence of demand on price, it is recommended to use specific reference models. The article includes linear, hyperbolic and parabolic models. The dependence of the price elasticity of demand on price for each of the reference models of demand is obtained on the basis of the concept of the elasticity of a function in mathematical analysis. The concept of «function of price elasticity of demand» expresses this dependence. For the obtained functions of price elasticity of demand, the article provides intervals with the highest and lowest
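
    The elasticity of a function Q(P) in mathematical analysis is E(P) = Q'(P) * P / Q(P). The sketch below applies that definition to two of the reference model families named in the abstract. The specific parameterizations (Q = a - b*P and Q = a + b/P) and all numbers are assumptions for illustration, since the abstract does not give the fitted forms.

```python
def linear_demand(price, a, b):
    """Assumed linear reference model: Q = a - b * P."""
    return a - b * price


def linear_elasticity(price, a, b):
    """E(P) = Q'(P) * P / Q(P) with Q'(P) = -b."""
    return -b * price / linear_demand(price, a, b)


def hyperbolic_elasticity(price, a, b):
    """Assumed hyperbolic model Q = a + b / P, so E(P) = -b / (a * P + b)."""
    return -b / (a * price + b)


# Hypothetical fitted parameters.
e_low_price = linear_elasticity(10.0, a=100.0, b=2.0)   # inelastic region
e_unit = linear_elasticity(25.0, a=100.0, b=2.0)        # unit elasticity
e_high_price = linear_elasticity(40.0, a=100.0, b=2.0)  # elastic region
e_hyperbolic = hyperbolic_elasticity(10.0, a=10.0, b=100.0)
```

    Under the linear form the elasticity changes with price, passing through -1 at P = a / (2b), which is also where revenue P * Q peaks. This price dependence is what the abstract calls the function of price elasticity of demand.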

  19. Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.

    Science.gov (United States)

    Kawashima, Issaku; Kumano, Hiroaki

    2017-01-01

    Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired about the focus of attention. We calculated the power and coherence values, prepared 35 patterns of variable combinations, and applied Support Vector Machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in the limited-electrode condition, the non-linear SVR model showed significantly better precision than the linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.

  20. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling.

    Science.gov (United States)

    Edelman, Eric R; van Kuijk, Sander M J; Hamaekers, Ankie E W; de Korte, Marcel J M; van Merode, Godefridus G; Buhre, Wolfgang F F A

    2017-01-01

    For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related benefits.
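
    The two kinds of predictor compared in the abstract can be sketched as follows: the fixed ratio model multiplies eSCT by 1.33, while a regression model estimates its coefficients from data. The one-predictor least-squares fit and the durations below are illustrative assumptions. The study's best model also used type of operation, ASA classification and type of anesthesia.

```python
def fixed_ratio_tpt(esct_minutes, ratio=1.33):
    """Fixed ratio model from the abstract: TPT = 1.33 * eSCT."""
    return ratio * esct_minutes


def fit_simple_ols(x, y):
    """Return (intercept, slope) of a one-predictor least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Made-up cases in which anesthesia time is roughly constant per case.
esct = [60.0, 90.0, 120.0, 180.0]
tpt = [95.0, 125.0, 155.0, 215.0]

intercept, slope = fit_simple_ols(esct, tpt)
ratio_mae = mean_absolute_error(tpt, [fixed_ratio_tpt(x) for x in esct])
ols_mae = mean_absolute_error(tpt, [intercept + slope * x for x in esct])
```

    In this constructed example the fixed ratio under-predicts short cases and over-predicts long ones, which is the kind of systematic error a fitted intercept can remove.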

  1. Improving the Prediction of Total Surgical Procedure Time Using Linear Regression Modeling

    Directory of Open Access Journals (Sweden)

    Eric R. Edelman

    2017-06-01

    Full Text Available For efficient utilization of operating rooms (ORs), accurate schedules of assigned block time and sequences of patient cases need to be made. The quality of these planning tools is dependent on the accurate prediction of total procedure time (TPT) per case. In this paper, we attempt to improve the accuracy of TPT predictions by using linear regression models based on estimated surgeon-controlled time (eSCT) and other variables relevant to TPT. We extracted data from a Dutch benchmarking database of all surgeries performed in six academic hospitals in The Netherlands from 2012 till 2016. The final dataset consisted of 79,983 records, describing 199,772 h of total OR time. Potential predictors of TPT that were included in the subsequent analysis were eSCT, patient age, type of operation, American Society of Anesthesiologists (ASA) physical status classification, and type of anesthesia used. First, we computed the predicted TPT based on a previously described fixed ratio model for each record, multiplying eSCT by 1.33. This number is based on the research performed by van Veen-Berkx et al., which showed that 33% of SCT is generally a good approximation of anesthesia-controlled time (ACT). We then systematically tested all possible linear regression models to predict TPT using eSCT in combination with the other available independent variables. In addition, all regression models were again tested without eSCT as a predictor to predict ACT separately (which leads to TPT by adding SCT). TPT was most accurately predicted using a linear regression model based on the independent variables eSCT, type of operation, ASA classification, and type of anesthesia. This model performed significantly better than the fixed ratio model and the method of predicting ACT separately. Making use of these more accurate predictions in planning and sequencing algorithms may enable an increase in utilization of ORs, leading to significant financial and productivity related

  2. A class of additive-accelerated means regression models for recurrent event data

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    In this article, we propose a class of additive-accelerated means regression models for analyzing recurrent event data. The class includes the proportional means model, the additive rates model, the accelerated failure time model, the accelerated rates model and the additive-accelerated rate model as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For the inference on the model parameters, estimating equation approaches are derived and asymptotic properties of the proposed estimators are established. In addition, a technique is provided for model checking. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is illustrated.

  3. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    , as a consequence the random field model specification introduces non-stationarity and non-ergodicity in the misspecified model and it becomes non-trivial, relative to the existing literature, to establish the limiting behavior of the estimated parameters. The asymptotic results are obtained by applying some...... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, work very well for random field regression models in finite samples and that these models successfully...

  4. Analysis of Covariance with Linear Regression Error Model on Antenna Control Unit Tracking

    Science.gov (United States)

    2015-10-20

    hypotheses, analyses and perhaps modeling to assess test results objectively, i.e., on statistical metrics, probability of confidence, logical inference to...perhaps modeling to assess test results objectively, i.e., based on statistical metrics, probability of confidence and logical inference to...less variable than opinion. Logic, statistical inference and belief are the bases of testable, repeatable and refutable hypothesis and analyses. In

  5. Partial Least Squares Regression Model to Predict Water Quality in Urban Water Distribution Systems

    Institute of Scientific and Technical Information of China (English)

    LUO Bijun; ZHAO Yuan; CHEN Kai; ZHAO Xinhua

    2009-01-01

    The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality. A partial least squares (PLS) regression model, in which turbidity and Fe are regarded as control objectives, is used to establish the statistical model. The experimental results indicate that the PLS regression model predicts water quality well compared with the monitored data. The percentages of absolute relative error (below 15%, 20%, 30%) are 44.4%, 66.7%, 100% (turbidity) and 33.3%, 44.4%, 77.8% (Fe) on the 4th sampling point; 77.8%, 88.9%, 88.9% (turbidity) and 44.4%, 55.6%, 66.7% (Fe) on the 5th sampling point.
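
    The accuracy figures in this abstract are shares of predictions whose absolute relative error falls below a threshold. A minimal sketch of that summary, using made-up turbidity readings rather than the Tianjin monitoring data:

```python
def share_within(observed, predicted, threshold):
    """Percentage of predictions with absolute relative error <= threshold."""
    hits = sum(
        1 for o, p in zip(observed, predicted) if abs(p - o) / abs(o) <= threshold
    )
    return 100.0 * hits / len(observed)


# Hypothetical monitored and predicted turbidity values.
monitored = [1.0, 2.0, 4.0, 5.0]
predicted = [1.1, 2.5, 4.4, 8.0]

within_15 = share_within(monitored, predicted, 0.15)
within_30 = share_within(monitored, predicted, 0.30)
```

    Reporting several thresholds side by side shows how quickly the prediction errors fall within an acceptable band.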

  6. Ordinal regression models to describe tourist satisfaction with Sintra's world heritage

    Science.gov (United States)

    Mouriño, Helena

    2013-10-01

    In Tourism Research, ordinal regression models are becoming a very powerful tool in modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering Tourist Survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focuses on two main points: tourists' perception of the entrance fees and their overall level of satisfaction with this heritage site. To attain these goals, ordinal regression models were developed. We concluded that tourists' nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.

  7. Probing turbulence intermittency via Auto-Regressive Moving-Average models

    CERN Document Server

    Faranda, Davide; Dubrulle, Berengere; Daviaud, Francois

    2014-01-01

    We suggest a new approach to probing intermittency corrections to the Kolmogorov law in turbulent flows based on the Auto-Regressive Moving-Average modeling of turbulent time series. We introduce a new index $\Upsilon$ that measures the distance from a Kolmogorov-Obukhov model in the Auto-Regressive Moving-Average models space. Applying our analysis to Particle Image Velocimetry and Laser Doppler Velocimetry measurements in a von Kármán swirling flow, we show that $\Upsilon$ is proportional to the traditional intermittency correction computed from the structure function. Therefore it provides the same information, using much shorter time series. We conclude that $\Upsilon$ is a suitable index to reconstruct the spatial intermittency of the dissipation in both numerical and experimental turbulent fields.

  8. Analysing the temporal dynamics of model performance for hydrological models

    NARCIS (Netherlands)

    Reusser, D.E.; Blume, T.; Schaefli, B.; Zehe, E.

    2009-01-01

    The temporal dynamics of hydrological model performance gives insights into errors that cannot be obtained from global performance measures assigning a single number to the fit of a simulated time series to an observed reference series. These errors can include errors in data, model parameters, or m

  9. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    Science.gov (United States)

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
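
    The indirect approach described above, predicting a continuous blood loss and then assigning a class, can be sketched as a thresholding step after the regression. The cut-offs used here (under 15%, 15 to 30%, 31 to 40%, over 40% of blood volume) are the commonly cited ATLS boundaries and are an assumption, since the abstract does not state the exact values used for the rat model. The predicted and true values are made up.

```python
def atls_class(blood_loss_percent):
    """Map blood loss (% of blood volume) to an ATLS class 1-4.

    Thresholds follow the commonly cited ATLS table (assumed here).
    """
    if blood_loss_percent < 15:
        return 1
    if blood_loss_percent <= 30:
        return 2
    if blood_loss_percent <= 40:
        return 3
    return 4


def classification_accuracy(true_loss, predicted_loss):
    """Share of cases whose predicted class matches the true class."""
    matches = sum(
        1 for t, p in zip(true_loss, predicted_loss) if atls_class(t) == atls_class(p)
    )
    return matches / len(true_loss)


# Hypothetical true and regression-predicted blood loss percentages.
true_loss = [10.0, 20.0, 35.0, 45.0, 32.0]
predicted_loss = [12.0, 22.5, 33.0, 47.0, 29.0]
accuracy = classification_accuracy(true_loss, predicted_loss)
```

    A regression error only costs accuracy when it pushes a case across a class boundary, as in the last case here. This is one reason the indirect route can compete with direct multicategory classification.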

  10. Regression model for generating time series of daily precipitation amounts for climate change impact studies

    Science.gov (United States)

    Buishand, T. A.; Klein Tank, A. M. G.

    1996-05-01

    The precipitation amounts on wet days at De Bilt (the Netherlands) are linked to temperature and surface air pressure through advanced regression techniques. Temperature is chosen as a covariate so that the model can be used to generate synthetic time series of daily precipitation in a CO2-induced warmer climate. The precipitation-temperature dependence can partly be ascribed to the phenomenon that warmer air can contain more moisture. Spline functions are introduced to reproduce the non-monotonic change of the mean daily precipitation amount with temperature. Because the model is non-linear and the variance of the errors depends on the expected response, an iteratively reweighted least-squares technique is needed to estimate the regression coefficients. A representative rainfall sequence for the situation of a systematic temperature rise is obtained by multiplying the precipitation amounts in the observed record by a temperature-dependent factor based on a fitted regression model. For a temperature change of 3°C (a reasonable guess for a doubled-CO2 climate according to present-day general circulation models) this results in an increase in the annual average amount of 9% (20% in winter and 4% in summer). An extended model with both temperature and surface air pressure is presented, which makes it possible to study the additional effects of a potential systematic change in surface air pressure on precipitation.
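    The iteratively reweighted least-squares step mentioned in this abstract can be illustrated with a deliberately simplified sketch. The assumptions are loud ones: a straight-line mean instead of the paper's spline-based non-linear model, an error variance taken to be proportional to the squared fitted mean, and hypothetical function names throughout.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def irls_line(x, y, n_iter=20, floor=1e-6):
    """IRLS assuming Var(error) is proportional to the squared fitted mean."""
    weights = [1.0] * len(x)  # first pass is ordinary least squares
    a, b = weighted_line_fit(x, y, weights)
    for _ in range(n_iter):
        # Recompute the fitted means, then weight each point by 1 / mean^2.
        fitted = [max(a + b * xi, floor) for xi in x]
        weights = [1.0 / m ** 2 for m in fitted]
        a, b = weighted_line_fit(x, y, weights)
    return a, b


# Illustrative values only: temperature (deg C) and mean wet-day amount (mm).
temperature = [0.0, 5.0, 10.0, 15.0, 20.0]
amount = [3.1, 3.9, 5.2, 5.8, 7.1]
print(irls_line(temperature, amount))
```

    The weights depend on the current fit, which is why the fit has to be repeated until the coefficients settle; a fixed number of iterations is used here for brevity rather than a convergence test.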

  11. A Logistic Regression Model for Predicting Axillary Lymph Node Metastases in Early Breast Carcinoma Patients

    Directory of Open Access Journals (Sweden)

    Jiaqing Zhang

    2012-07-01

    Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of an evaluation of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need to discover more specific candidate proteins and molecular biology tools in order to select more variables, which should improve predictive accuracy.
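    The validation figure quoted in this abstract is an area under the ROC curve. As a rough sketch of how such a value is obtained from predicted probabilities on a hold-out group, the function below uses the rank (Mann-Whitney) formulation; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative predicted probabilities for a small validation group.
predicted_probability = [0.10, 0.40, 0.35, 0.80]
node_positive = [0, 0, 1, 1]
print(roc_auc(predicted_probability, node_positive))
```

    With only 20 validation patients, as in the study, an AUC estimated this way carries wide uncertainty, so it is usually reported alongside a confidence interval.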

  12. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SONs), which expedite automatic operation of mobile wireless networks, are needed, but they face challenges in satisfying the 5G requirements. Therefore, researchers have proposed a framework to empower SONs using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs) and related network parameters (NPs) using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we propose multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we treat one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. We can also find out whether some KPIs conflict with each other. We implement the proposed models using MapReduce.

  13. A Stepwise Time Series Regression Procedure for Water Demand Model Identification

    Science.gov (United States)

    Miaou, Shaw-Pin

    1990-09-01

    Annual time series water demand has traditionally been studied through multiple linear regression analysis. Four associated model specification problems have long been recognized: (1) the length of the available time series data is relatively short, (2) a large set of candidate explanatory or "input" variables needs to be considered, (3) input variables can be highly correlated with each other (multicollinearity problem), and (4) model error series are often highly autocorrelated or even nonstationary. A stepwise time series regression identification procedure is proposed to alleviate these problems. The proposed procedure adopts the sequential input variable selection concept of stepwise regression and the "three-step" time series model building strategy of Box and Jenkins. Autocorrelated model error is assumed to follow an autoregressive integrated moving average (ARIMA) process. The stepwise selection procedure begins with a univariate time series demand model with no input variables. Subsequently, input variables are selected and inserted into the equation one at a time until the last entered variable is found to be statistically insignificant. The order of insertion is determined by a statistical measure called between-variable partial correlation. This correlation measure is free from the contamination of serial autocorrelation. Three data sets from previous studies are employed to illustrate the proposed procedure. The results are then compared with those from their original studies.

  14. A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria

    Directory of Open Access Journals (Sweden)

    Michael S. Okundamiya

    2013-10-01

    Growing concern over the negative effects of fossil fuels on the environment, together with global emission reduction targets, calls for more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR) data, which are useful for the design and evaluation of solar energy conversion systems, are not measured at the forty-five meteorological stations in Nigeria. The dearth of measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression for predicting the monthly average daily GSR on horizontal surfaces at Warri (latitude 5.02°N and longitude 7.88°E), an oil city located in the south-south geopolitical zone of Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic), and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri
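    Three of the five indicators named in this abstract can be sketched in a few lines. The definitions below are assumptions rather than the paper's stated formulas: the mean bias error is taken as the mean of (predicted - observed), and the t-statistic uses the form common in the solar-radiation literature, t = sqrt((n - 1) MBE^2 / (RMSE^2 - MBE^2)). Function names and sample values are hypothetical.

```python
import math


def mean_bias_error(observed, predicted):
    """Mean of (predicted - observed); positive values mean over-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared prediction error."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def t_statistic(observed, predicted):
    """t = sqrt((n - 1) * MBE^2 / (RMSE^2 - MBE^2)); smaller is better.
    Undefined when every error is identical (RMSE^2 equals MBE^2)."""
    n = len(observed)
    mbe = mean_bias_error(observed, predicted)
    rmse = root_mean_square_error(observed, predicted)
    return math.sqrt((n - 1) * mbe ** 2 / (rmse ** 2 - mbe ** 2))


# Illustrative monthly values only (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.5, 2.5, 4.5]
print(mean_bias_error(measured, estimated))
print(root_mean_square_error(measured, estimated))
print(t_statistic(measured, estimated))
```

    The t-statistic combines the other two measures, so a model with a small bias but large scatter and a model with a large bias but small scatter can be compared on one scale.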

  15. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features.

    Directory of Open Access Journals (Sweden)

    Gregor Stiglic

    Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improving the interpretability of predictive models based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771) to 0.769 (95% CI: 0.761-0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.

  16. A Calculus for Modelling, Simulating and Analysing Compartmentalized Biological Systems

    DEFF Research Database (Denmark)

    Mardare, Radu Iulian; Ihekwaba, Adoha

    2007-01-01

    A. Ihekwaba, R. Mardare. A Calculus for Modelling, Simulating and Analysing Compartmentalized Biological Systems. Case study: NFkB system. In Proc. of International Conference of Computational Methods in Sciences and Engineering (ICCMSE), American Institute of Physics, AIP Proceedings, N 2...

  18. The method of characteristics applied to analyse 2DH models

    NARCIS (Netherlands)

    Sloff, C.J.

    1992-01-01

    To gain insight into the physical behaviour of 2D hydraulic models (mathematically formulated as a system of partial differential equations), the method of characteristics is used to analyse the propagation of physically meaningful disturbances. These disturbances propagate as wave fronts along bichar

  19. Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks.

    Science.gov (United States)

    Richter, Philipp; Toledano-Ayala, Manuel

    2015-09-08

    Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automated. Gaussian process regression has been applied to overcome this issue, with promising results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutiny. This paper studies Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor, outdoor and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate.

  20. Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Philipp Richter

    2015-09-01

    Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automated. Gaussian process regression has been applied to overcome this issue, with promising results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutiny. This paper studies Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor, outdoor and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate.

  1. The study on Sanmenxia annual flow forecasting in the Yellow River with mix regression model

    Institute of Scientific and Technical Information of China (English)

    JIANG Xiaohui; LIU Changming; WANG Yu; WANG Hongrui

    2004-01-01

    This paper establishes a mix regression model for simulating annual flow, in which annual runoff is the autoregressive factor and precipitation, air temperature and water consumption are regression factors; nine hypothetical climate change scenarios are adopted to forecast the change in annual flow at Sanmenxia Station. The results show: (1) When temperature is steady, the average annual runoff increases by 8.3% if precipitation increases by 10% and decreases by 8.2% if precipitation decreases by 10%; when precipitation is steady, the average annual runoff decreases by 2.4% if temperature increases by 1 ℃ and increases by 1.2% if temperature decreases by 1 ℃. The mix regression model simulates annual runoff well. (2) Of the nine temperature and precipitation scenarios, scenario 9 is the most adverse to the runoff at Sanmenxia Station on the Yellow River, i.e. temperature increases by 1 ℃ and precipitation decreases by 10%. Under this condition, the simulated average annual runoff decreases by 10.8%. In contrast, scenario 1 is the most favourable to runoff, i.e. temperature decreases by 1 ℃ and precipitation increases by 10%, which makes the annual runoff at Sanmenxia increase by 10.6%.

  2. A regression-kriging model for estimation of rainfall in the Laohahe basin

    Science.gov (United States)

    Wang, Hong; Ren, Li L.; Liu, Gao H.

    2009-10-01

    This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis, with three Principal Components (PCs) extracted. The rainfall data were then fitted using stepwise regression and the residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.

  3. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease

    Directory of Open Access Journals (Sweden)

    Javali Shivalingappa

    2010-01-01

    Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and the 'questionnaire with interview' method. Materials and Methods: The study was conducted from June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted using the Community Periodontal Index for Treatment Needs (CPITN). Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used to determine the factors associated with periodontal disease. Results: The study findings indicated that the ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log) displayed similar results, with negligible differences in the significant factors associated with periodontal disease. Factors such as religion, caste, sources of drinking water, timings for sweet consumption, timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit for determining the significant factors associated with periodontal disease than models with logit, probit and nlog-log built-in link functions. Factors such as caste and timings for sweet consumption are negatively associated with periodontal disease, whereas religion, sources of drinking water, timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.

  4. Evaluation for Long Term PM10 Concentration Forecasting using Multi Linear Regression (MLR) and Principal Component Regression (PCR) Models

    Directory of Open Access Journals (Sweden)

    Samsuri Abdullah

    2016-07-01

    Air pollution in Peninsular Malaysia is dominated by particulate matter, as demonstrated by its having the highest Air Pollution Index (API) value compared with the other pollutants in most parts of the country. The development of Particulate Matter (PM10) forecasting models is crucial because it allows the authorities and citizens of a community to take the necessary actions to limit their exposure to harmful levels of particulate pollution and to implement protection measures that significantly improve air quality at designated locations. This study aims to improve the ability of MLR to forecast PM10 concentrations by using PCs as inputs. Daily observations of PM10 in Kuala Terengganu, Malaysia from January 2003 to December 2011 were used to forecast PM10 concentration levels. MLR and PCR (using PCs as inputs) models were developed and their performance was evaluated using RMSE, NAE and IA. The results revealed that PCR performed better than MLR owing to the implementation of PCA, which reduces complexity and eliminates multicollinearity in the data.
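    Two of the evaluation measures named in this abstract, NAE and IA, are less familiar than RMSE, so a small sketch may help. The definitions are assumptions, since the abstract does not spell them out: NAE is taken as the summed absolute error divided by the summed observations, and IA as Willmott's index of agreement. Function names and numbers are hypothetical.

```python
def normalized_absolute_error(observed, predicted):
    """Sum of absolute errors divided by the sum of observations (0 is best)."""
    total_error = sum(abs(p - o) for o, p in zip(observed, predicted))
    return total_error / sum(observed)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement: 1 is a perfect match, 0 is no agreement."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - numerator / denominator


# Illustrative daily PM10 values only (not data from the study).
measured = [2.0, 4.0, 6.0]
forecast = [3.0, 4.0, 5.0]
print(normalized_absolute_error(measured, forecast))
print(index_of_agreement(measured, forecast))
```

    Error measures such as RMSE and NAE should be small for a good model, while IA should be close to 1, which is why studies of this kind usually report both types together.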

  5. Evaluation for Long Term PM10 Concentration Forecasting using Multi Linear Regression (MLR) and Principal Component Regression (PCR) Models

    OpenAIRE

    Samsuri Abdullah; Marzuki Ismail; Si Yuen Fong; Al Mahfoodh Ali Najah Ahmed

    2016-01-01

    Air pollution in Peninsular Malaysia is dominated by particulate matter, as demonstrated by its having the highest Air Pollution Index (API) value compared with the other pollutants in most parts of the country. The development of Particulate Matter (PM10) forecasting models is crucial because it allows the authorities and citizens of a community to take the necessary actions to limit their exposure to harmful levels of particulate pollution and to implement protection measures to significantly improve air qu...

  6. The Combination Forecasting Model of Grain Production Based on Stepwise Regression Method and RBF Neural Network

    Directory of Open Access Journals (Sweden)

    Lihua Yang

    2015-04-01

    In order to improve the accuracy of grain production forecasting, this study proposes a new combination forecasting model that combines the stepwise regression method with an RBF neural network by assigning proper weights using the inverse variance method. A comparison under different criteria indicates that the combination forecasting model is superior to the other models. The performance of the models is measured using three error measures: Mean Absolute Percentage Error (MAPE), Theil Inequality Coefficient (Theil IC) and Root Mean Squared Error (RMSE). The model with the smallest values of MAPE, Theil IC and RMSE is taken as the best model for predicting grain production. Based on these evaluation criteria, the combination model reduces the forecasting error and has high prediction accuracy in grain production forecasting, making decisions more scientific and rational.
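    The weighting step described in this abstract can be sketched as follows, under the assumption that "inverse variance method" means each model's weight is proportional to the reciprocal of its error variance (here supplied directly as a number per model). The function names and all values are hypothetical, and MAPE is included only to show how a combined forecast would be scored.

```python
def inverse_variance_weights(error_variances):
    """Weights proportional to 1 / variance, normalised to sum to 1."""
    inverses = [1.0 / v for v in error_variances]
    total = sum(inverses)
    return [inv / total for inv in inverses]


def combine_forecasts(forecasts, weights):
    """Weighted sum of several models' forecasts, period by period."""
    n_periods = len(forecasts[0])
    return [
        sum(w * model[t] for w, model in zip(weights, forecasts))
        for t in range(n_periods)
    ]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(ratios) / len(actual)


# Illustrative numbers only: two models with error variances 1 and 3.
weights = inverse_variance_weights([1.0, 3.0])
combined = combine_forecasts([[10.0, 20.0], [14.0, 24.0]], weights)
print(weights, combined, mape([10.0, 25.0], combined))
```

    The model with the smaller error variance receives the larger weight, so the combined forecast leans towards whichever component has been more accurate on past data.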

  7. Analysing the temporal dynamics of model performance for hydrological models

    Directory of Open Access Journals (Sweden)

    D. E. Reusser

    2008-11-01

    The temporal dynamics of hydrological model performance gives insights into errors that cannot be obtained from global performance measures assigning a single number to the fit of a simulated time series to an observed reference series. These errors can include errors in data, model parameters, or model structure. Dealing with a set of performance measures evaluated at a high temporal resolution implies analyzing and interpreting a high-dimensional data set. This paper presents a method for such a hydrological model performance assessment with a high temporal resolution and illustrates its application for two very different rainfall-runoff modeling case studies. The first is the Wilde Weisseritz case study, a headwater catchment in the eastern Ore Mountains, simulated with the conceptual model WaSiM-ETH. The second is the Malalcahuello case study, a headwater catchment in the Chilean Andes, simulated with the physics-based model Catflow. The proposed time-resolved performance assessment starts with the computation of a large set of classically used performance measures for a moving window. The key of the developed approach is a data-reduction method based on self-organizing maps (SOMs) and cluster analysis to classify the high-dimensional performance matrix. Synthetic peak errors are used to interpret the resulting error classes. The final outcome of the proposed method is a time series of the occurrence of dominant error types. For the two case studies analyzed here, 6 such error types have been identified. They show clear temporal patterns which can lead to the identification of model structural errors.
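    The first step of the assessment described in this abstract, computing a performance measure for a moving window, can be sketched for a single measure. RMSE is used here purely as an example; the paper computes a large set of measures, and the function name, window length and series below are hypothetical.

```python
import math


def moving_window_rmse(observed, simulated, window):
    """RMSE of simulated vs observed values for every window of fixed length."""
    if window < 1 or window > len(observed):
        raise ValueError("window must be between 1 and the series length")
    results = []
    for start in range(len(observed) - window + 1):
        obs_part = observed[start:start + window]
        sim_part = simulated[start:start + window]
        mean_sq = sum((s - o) ** 2 for o, s in zip(obs_part, sim_part)) / window
        results.append(math.sqrt(mean_sq))
    return results


# Illustrative discharge values only: one simulated peak is too high.
observed_flow = [1.0, 2.0, 3.0, 4.0]
simulated_flow = [1.0, 2.0, 5.0, 4.0]
print(moving_window_rmse(observed_flow, simulated_flow, window=2))
```

    A global RMSE would report one number for the whole series, whereas the windowed values show when the error occurs, which is the information the subsequent classification step builds on.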

  8. Analysing the temporal dynamics of model performance for hydrological models

    Directory of Open Access Journals (Sweden)

    E. Zehe

    2009-07-01

    The temporal dynamics of hydrological model performance gives insights into errors that cannot be obtained from global performance measures assigning a single number to the fit of a simulated time series to an observed reference series. These errors can include errors in data, model parameters, or model structure. Dealing with a set of performance measures evaluated at a high temporal resolution implies analyzing and interpreting a high-dimensional data set. This paper presents a method for such a hydrological model performance assessment with a high temporal resolution and illustrates its application for two very different rainfall-runoff modeling case studies. The first is the Wilde Weisseritz case study, a headwater catchment in the eastern Ore Mountains, simulated with the conceptual model WaSiM-ETH. The second is the Malalcahuello case study, a headwater catchment in the Chilean Andes, simulated with the physics-based model Catflow. The proposed time-resolved performance assessment starts with the computation of a large set of classically used performance measures for a moving window. The key of the developed approach is a data-reduction method based on self-organizing maps (SOMs) and cluster analysis to classify the high-dimensional performance matrix. Synthetic peak errors are used to interpret the resulting error classes. The final outcome of the proposed method is a time series of the occurrence of dominant error types. For the two case studies analyzed here, 6 such error types have been identified. They show clear temporal patterns, which can lead to the identification of model structural errors.

  9. Measurement error in epidemiologic studies of air pollution based on land-use regression models.

    Science.gov (United States)

    Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino

    2013-10-15

    Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.

  10. Intermittent reservoir daily-inflow prediction using lumped and distributed data multi-linear regression models

    Indian Academy of Sciences (India)

    R B Magar; V Jothiprakash

    2011-12-01

    In this study, a multi-linear regression (MLR) approach is used to construct an intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in the MLR approach, the Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables and a dependent variable by fitting a linear regression equation. The main aim of the present study is to examine the development and applicability of simple models when a sufficient length of data is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data are used for building the model and 14 years of data are used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria, and it was found that, in the present case of well-correlated input data, both lumped and distributed MLR models perform equally well. For the case study considered, the MLR and ARIMA models performed equally well owing to the availability of a large dataset.

  11. A Heterogeneous Bayesian Regression Model for Cross-Sectional Data Involving a Single Observation per Response Unit

    Science.gov (United States)

    Fong, Duncan K. H.; Ebbes, Peter; DeSarbo, Wayne S.

    2012-01-01

    Multiple regression is frequently used across the various social sciences to analyze cross-sectional data. However, it can often be challenging to justify the assumption of common regression coefficients across all respondents. This manuscript presents a heterogeneous Bayesian regression model that enables the estimation of…

  12. Regression by L1 regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model

    NARCIS (Netherlands)

    Braak, ter C.J.F.

    2009-01-01

    This paper proposes a regression method, ROSCAS, which regularizes smart contrasts and sums of regression coefficients by an L1 penalty. The contrasts and sums are based on the sample correlation matrix of the predictors and are suggested by a latent variable regression model. The contrasts express

  13. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    Science.gov (United States)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas of Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross

  14. A methodology for the design of experiments in computational intelligence with multiple regression models.

    Science.gov (United States)

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is relevant to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.
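
    The abstract's central point, that differences between regression models should be checked statistically rather than read off a single score, can be illustrated with a minimal sketch. This is not the RRegrs package nor the authors' methodology: the ridge models, the synthetic data, and the function names (`fit_ridge`, `paired_rmse_differences`, `sign_flip_pvalue`) are all assumptions made for illustration, and a sign-flip permutation test is swapped in as one simple paired test.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def rmse(X, y, w, b):
    """Root mean squared error of the linear predictor X @ w + b."""
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

def paired_rmse_differences(X, y, lam_a, lam_b, n_splits=30, seed=0):
    """Test-set RMSE of model B minus model A on identical random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    cut = int(0.75 * n)
    diffs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        train, test = idx[:cut], idx[cut:]
        err_a = rmse(X[test], y[test], *fit_ridge(X[train], y[train], lam_a))
        err_b = rmse(X[test], y[test], *fit_ridge(X[train], y[train], lam_b))
        diffs.append(err_b - err_a)
    return np.array(diffs)

def sign_flip_pvalue(diffs, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test on the mean paired difference.

    Caveat: repeated random splits overlap, so the differences are not
    independent and the p-value is optimistic; treat it as a rough guide.
    """
    rng = np.random.default_rng(seed)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(diffs)))
    permuted = np.abs((signs * diffs).mean(axis=1))
    return float((np.sum(permuted >= observed) + 1) / (n_perm + 1))

# Synthetic data (an assumption, not one of the paper's datasets).
rng = np.random.default_rng(123)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Model A: light regularization; model B: heavy shrinkage toward the mean.
diffs = paired_rmse_differences(X, y, lam_a=1e-3, lam_b=1e6)
p_value = sign_flip_pvalue(diffs)
print(f"mean RMSE difference (B - A): {diffs.mean():.3f}, p = {p_value:.4f}")
```

    Evaluating both models on the same splits keeps the comparison paired, which is what lets a small set of splits support a significance statement at all.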

  15. A methodology for the design of experiments in computational intelligence with multiple regression models

    Directory of Open Access Journals (Sweden)

    Carlos Fernandez-Lozano

    2016-12-01

    Full Text Available The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is relevant to use a statistical approach to indicate whether the differences are statistically significant when using this kind of algorithm. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  16. Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.

    Science.gov (United States)

    Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang

    2016-07-01

    Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience.
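
    The abstract's conclusion that site-specific regression models can outperform one general model is easy to see on toy data. The sketch below is only an illustration under stated assumptions: the site names, slopes, intercepts, and noise level are invented, the data are simulated rather than taken from the study, and plain least-squares lines stand in for the paper's LR models.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical (slope, intercept) per body site; NOT values from the study.
site_params = {"abdomen": (1.0, 2.0), "thigh": (2.0, 0.5), "arm": (3.0, 1.0)}

def simulate(n_per_site):
    """Return a list of (site, signal, thickness) arrays with Gaussian noise."""
    rows = []
    for site, (slope, intercept) in site_params.items():
        signal = rng.uniform(0.0, 10.0, n_per_site)
        thickness = slope * signal + intercept + rng.normal(0.0, 0.2, n_per_site)
        rows.append((site, signal, thickness))
    return rows

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

train, test = simulate(40), simulate(40)

# General model: one line fitted to all sites pooled together.
pooled_x = np.concatenate([x for _, x, _ in train])
pooled_y = np.concatenate([y for _, _, y in train])
general_model = np.polyfit(pooled_x, pooled_y, deg=1)

# Site-specific models: one line per body site.
site_models = {site: np.polyfit(x, y, deg=1) for site, x, y in train}

# Evaluate both on held-out data, keeping the same site order.
test_x = np.concatenate([x for _, x, _ in test])
test_y = np.concatenate([y for _, _, y in test])
general_rmse = rmse(np.polyval(general_model, test_x), test_y)
site_pred = np.concatenate([np.polyval(site_models[s], x) for s, x, _ in test])
site_rmse = rmse(site_pred, test_y)

print(f"general model RMSE: {general_rmse:.2f}")
print(f"site-specific RMSE: {site_rmse:.2f}")
```

    The gap between the two errors depends entirely on how much the signal-to-thickness relation differs between sites; when the sites share one relation, the pooled model is the better choice because it uses all the data.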

  17. Selection of higher order regression models in the analysis of multi-factorial transcription data.

    Directory of Open Access Journals (Sweden)

    Olivia Prazeres da Costa

    Full Text Available INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
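
    The comparison of nested linear models and the neutral/alleviating/aggravating labels can be sketched for a single simulated gene with two binary factors. This is not the authors' eruption plot or their selection procedure: the data are synthetic, the helper names are hypothetical, and the magnitude threshold `tol` used to call an interaction neutral is a simplifying assumption standing in for a proper significance test.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, float(np.sum((y - X @ beta) ** 2))

def nested_f_statistic(rss_small, rss_big, p_small, p_big, n):
    """F statistic for the extra terms of the bigger of two nested models."""
    return ((rss_small - rss_big) / (p_big - p_small)) / (rss_big / (n - p_big))

def classify_interaction(b_a, b_b, b_ab, tol):
    """Label an interaction by comparing combined and single effects."""
    expected = b_a + b_b          # combined effect if the factors were additive
    observed = expected + b_ab    # combined effect including the interaction
    if abs(b_ab) <= tol:
        return "neutral"
    return "alleviating" if abs(observed) < abs(expected) else "aggravating"

# Synthetic expression values for one gene (an assumption, not study data).
rng = np.random.default_rng(1)
n = 80
a = rng.integers(0, 2, n).astype(float)   # factor A, e.g. infection yes/no
b = rng.integers(0, 2, n).astype(float)   # factor B, e.g. treatment yes/no
y = 1.0 + 2.0 * a + 1.5 * b - 1.0 * a * b + rng.normal(0.0, 0.3, n)

ones = np.ones(n)
X_additive = np.column_stack([ones, a, b])
X_interaction = np.column_stack([ones, a, b, a * b])

_, rss_additive = fit_ols(X_additive, y)
beta, rss_interaction = fit_ols(X_interaction, y)

f_stat = nested_f_statistic(rss_additive, rss_interaction, 3, 4, n)
label = classify_interaction(beta[1], beta[2], beta[3], tol=0.3)
print(f"F = {f_stat:.1f}, interaction estimate = {beta[3]:.2f}, label = {label}")
```

    A large F statistic says the interaction term adds explanatory power beyond the additive model, which is the per-gene question that a plot of effect size against significance would summarize across many genes.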

  18. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Science.gov (United States)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  19. Social Network Analyses and Nutritional Behavior: An Integrated Modeling Approach

    Directory of Open Access Journals (Sweden)

    Alistair McNair Senior

    2016-01-01

    Full Text Available Animals have evolved complex foraging strategies to obtain a nutritionally balanced diet and associated fitness benefits. Recent advances in nutrition research, combining state-space models of nutritional geometry with agent-based models of systems biology, show how nutrient-targeted foraging behavior can also influence animal social interactions, ultimately affecting collective dynamics and group structures. Here we demonstrate how social network analyses can be integrated into such a modeling framework and provide a tangible and practical analytical tool to compare experimental results with theory. We illustrate our approach by examining the case of nutritionally mediated dominance hierarchies. First we show how nutritionally explicit agent-based models that simulate the emergence of dominance hierarchies can be used to generate social networks. Importantly, the structural properties of our simulated networks bear similarities to dominance networks of real animals (where conflicts are not always directly related to nutrition). Finally, we demonstrate how metrics from social network analyses can be used to predict the fitness of agents in these simulated competitive environments. Our results highlight the potential importance of nutritional mechanisms in shaping dominance interactions in a wide range of social and ecological contexts. Nutrition likely influences social interaction in many species, and yet a theoretical framework for exploring these effects is currently lacking. Combining social network analyses with computational models from nutritional ecology may bridge this divide, representing a pragmatic approach for generating theoretical predictions for nutritional experiments.
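
    The step from simulated contests to network metrics that predict fitness can be sketched in a few lines. Everything here is an assumption for illustration: the contest rule, the `ability` values, and the definition of `fitness` are invented and far simpler than the nutritionally explicit agent-based models the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(7)
n_agents, rounds = 8, 15
ability = np.linspace(0.0, 1.0, n_agents)  # hypothetical competitive ability

# Weighted directed dominance network: wins[i, j] = times agent i beat agent j.
wins = np.zeros((n_agents, n_agents), dtype=int)
for i in range(n_agents):
    for j in range(i + 1, n_agents):
        p_i = 1.0 / (1.0 + np.exp(-4.0 * (ability[i] - ability[j])))
        i_wins = rng.binomial(rounds, p_i)
        wins[i, j] = i_wins
        wins[j, i] = rounds - i_wins

# Node-level network metrics.
out_strength = wins.sum(axis=1)   # total contests won
in_strength = wins.sum(axis=0)    # total contests lost
win_ratio = out_strength / (out_strength + in_strength)

# Hypothetical fitness: food gained through wins plus unrelated noise.
fitness = 0.5 * out_strength + rng.normal(0.0, 3.0, n_agents)

# Predict fitness from a network metric with a simple linear fit.
slope, intercept = np.polyfit(win_ratio, fitness, deg=1)
r = float(np.corrcoef(win_ratio, fitness)[0, 1])
print(f"fitness ~ {slope:.1f} * win_ratio + {intercept:.1f}  (r = {r:.2f})")
```

    Because fitness is defined from wins in this toy setup, the strong correlation is built in; the sketch only shows the mechanics of deriving node metrics from an interaction matrix and regressing an outcome on them.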

  20. WOMBAT: a tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML).

    Science.gov (United States)

    Meyer, Karin

    2007-11-01

    WOMBAT is a software package for quantitative genetic analyses of continuous traits, fitting a linear, mixed model; estimates of covariance components and the resulting genetic parameters are obtained by restricted maximum likelihood. A wide range of models, comprising numerous traits, multiple fixed and random effects, selected genetic covariance structures, random regression models and reduced rank estimation are accommodated. WOMBAT employs up-to-date numerical and computational methods. Together with the use of efficient compilers, this generates fast executable programs, suitable for large scale analyses. Use of WOMBAT is illustrated for a bivariate analysis. The package consists of the executable program, available for LINUX and WINDOWS environments, a manual and a set of worked examples, and can be downloaded free of charge from (http://agbu.une.edu.au/~kmeyer/wombat.html).
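
    For readers unfamiliar with the model class, the linear mixed model and the restricted log-likelihood that REML maximizes can be written in standard textbook notation. The symbols below are assumptions chosen for illustration and are not taken from the WOMBAT manual: y is the vector of observations, b and u are the fixed and random effects with design matrices X (assumed to have full column rank) and Z, and u and e are assumed uncorrelated.

```latex
\begin{align}
\mathbf{y} &= \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad \operatorname{Var}(\mathbf{u}) = \mathbf{G},
\quad \operatorname{Var}(\mathbf{e}) = \mathbf{R},
\quad \mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \mathbf{R} \\
\log L_{\mathrm{REML}} &= -\tfrac{1}{2}\left(
  \log\lvert\mathbf{V}\rvert
  + \log\lvert\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\rvert
  + \mathbf{y}^{\top}\mathbf{P}\mathbf{y}
\right) + \text{const} \\
\mathbf{P} &= \mathbf{V}^{-1}
  - \mathbf{V}^{-1}\mathbf{X}
    \left(\mathbf{X}^{\top}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}
    \mathbf{X}^{\top}\mathbf{V}^{-1}
\end{align}
```

    The covariance components enter through G and R; the middle determinant term is what distinguishes the restricted likelihood from the ordinary one, accounting for the degrees of freedom used in estimating the fixed effects.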