Guo, Pi; Zhang, Jianjun; Wang, Li; Yang, Shaoyi; Luo, Ganfeng; Deng, Changyu; Wen, Ye; Zhang, Qingying
2017-01-01
Seasonal influenza epidemics cause serious public health problems in China. Search query-based surveillance was recently proposed to complement traditional approaches to monitoring influenza epidemics. However, developing robust techniques for search query selection and enhancing predictability for influenza epidemics remain a challenge. This study aimed to develop a novel ensemble framework to improve penalized regression models for detecting influenza epidemics by using Baidu search engine query data from China. The ensemble framework applied a combination of bootstrap aggregating (bagging) and a rank aggregation method to optimize penalized regression models. Different algorithms, including lasso, ridge, elastic net and the algorithms in the proposed ensemble framework, were compared by using Baidu search engine queries. Most of the selected search terms captured the peaks and troughs of the time series curves of influenza cases. The predictability of the conventional penalized regression models was improved by the proposed ensemble framework. The elastic net regression model outperformed the compared models, with the minimum prediction errors. We established a Baidu search engine query-based surveillance model for monitoring influenza epidemics, and the proposed model provides a useful tool to support the public health response to influenza and other infectious diseases. PMID:28422149
Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model
Md Mizanur Rahman; M Rafiuddin; Md Mahbub Alam
2013-04-01
In this paper, the development of a statistical forecasting method for summer monsoon rainfall over Bangladesh is described. Predictors for Bangladesh summer monsoon (June–September) rainfall were identified from the large-scale ocean–atmospheric circulation variables (i.e., sea-surface temperature, surface air temperature and sea level pressure). The predictors exhibited a significant relationship with Bangladesh summer monsoon rainfall during the period 1961–2007. After a detailed analysis of various global climate datasets, three predictors were selected. The model performance was evaluated during the period 1977–2007. The model performed well in its hindcasts of seasonal monsoon rainfall over Bangladesh. The RMSE and Heidke skill score for 31 years were 8.13 and 0.37, respectively, and the correlation between the predicted and observed rainfall was 0.74. The BIAS of the forecasts (% of long period average, LPA) was −0.85 and the Hit score was 58%. The experimental forecasts for the 2008 summer monsoon rainfall based on the model were also found to be in good agreement with the observations.
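The verification scores quoted above can be illustrated with a minimal sketch. It uses the generic textbook definitions of RMSE, mean bias and Pearson correlation; the paper's exact definitions (for example, bias expressed as a percentage of the long period average) may differ, and the rainfall values below are hypothetical, not taken from the study.

```python
import math

def rmse(pred, obs):
    """Root mean square error between forecasts and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mean_bias(pred, obs):
    """Mean forecast error (forecast minus observation)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical seasonal rainfall values (assumed for illustration only).
observed = [95.0, 102.0, 88.0, 110.0, 100.0]
hindcast = [97.0, 100.0, 92.0, 105.0, 101.0]

print("RMSE:", rmse(hindcast, observed))
print("Bias:", mean_bias(hindcast, observed))
print("Correlation:", pearson_r(hindcast, observed))
```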
Flexible survival regression modelling
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Unitary Response Regression Models
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Yu, Lijing; Zhou, Lingling; Tan, Li; Jiang, Hongbo; Wang, Ying; Wei, Sheng; Nie, Shaofa
2014-01-01
Outbreaks of hand-foot-mouth disease (HFMD) have been reported many times in Asia during recent decades. This emerging disease has drawn worldwide attention and vigilance. The prevention and control of HFMD has become an imperative issue in China. Early detection and response, using modern information technology during the epidemic, will be helpful before an outbreak happens. In this paper, a hybrid model combining a seasonal auto-regressive integrated moving average (ARIMA) model and a nonlinear auto-regressive neural network (NARNN) is proposed to predict the expected incidence cases from December 2012 to May 2013, using retrospective observations obtained from the China Information System for Disease Control and Prevention from January 2008 to November 2012. The best-fitted hybrid model combined seasonal ARIMA [Formula: see text] with an NARNN with 15 hidden units and 5 delays. The hybrid model shows good forecasting performance and estimates the expected incidence cases from December 2012 to May 2013 as -965.03, -1879.58, 4138.26, 1858.17, 4061.86 and 6163.16, respectively, with an obviously increasing trend. The proposed model can predict the incidence trend of HFMD effectively, which could be helpful to policy makers. Expected HFMD case counts are useful not only for detecting outbreaks or providing probability statements, but also for giving decision makers a probable trend of the variability of future observations that contains both historical and recent information.
TWO REGRESSION CREDIBILITY MODELS
Constanţa-Nicoleta BODEA
2010-03-01
In this communication we discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula for the inverse of a special class of matrices, a risk premium is calculated for a contract with risk parameter θ. In the second regression credibility model, we obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we need the well-known representation theorem for a special class of matrices, the properties of the trace of a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance, and the complicated mathematical properties of conditional expectations and conditional covariances.
M.A. Mousavi Shalmani
2014-08-01
To assess water quality and characterize seasonal variation in 18O and 2H in relation to different chemical and physiographical parameters, and to model the effective parameters, a study was conducted during 2010 to 2011 in 30 different ponds in the north of Iran. Samples were collected in three different seasons and analysed for chemical and isotopic components. The data show that the highest values of δ18O and δ2H were recorded in the summer (-1.15‰ and -12.11‰) and the lowest in the winter (-7.50‰ and -47.32‰), respectively. The data also reveal a significant increase in d-excess during spring and summer in ponds 20, 21, 22, 24, 25 and 26. We conclude that residual surface runoff (from upper lands) is an important source of water that transfers soluble salts into these ponds. In this respect, high retention time may be the main reason for the movement of light isotopes into the ponds. This has made the d-excess of pond 12 even greater in summer than in winter. This could be an acceptable explanation for ponds 25 and 26 (Siyahkal county), which had the highest d-excess and the lowest δ18O and δ2H. Light water pumped into the ponds from groundwater wells with a minor source of salt (originating from deep percolation of seawater) may be another reason for the significant decrease in the heavy isotopes of water (18O and 2H) in ponds 2, 12, 14 and 25 from spring to summer. Overall, the multiple linear regression tests indicate that, first, of the 30 variables under investigation, only a few can be used to identify changes in 18O and 2H. Second, among the variables studied, phytoplankton content was a common factor for interpreting 18O and 2H during spring and summer, and also over the total period (a full year). Third, spring was recommended over the other seasons for sampling water for 18O and 2H interpretation. This is because of function can be
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, the single-equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series, and the autocorrelation patterns of regression disturbances. It also includes six case studies.
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study considers widely used indicators of the predictive power of the generalized linear model (GLM), which often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model, where the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional one in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional one in terms of bias and root mean square error (RMSE).
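As a rough illustration of the traditional regression correlation coefficient described above (the correlation between Y and the fitted E(Y|X)), the following sketch fits a one-covariate Poisson regression with a log link by Newton-Raphson and then correlates the observations with the fitted means. The data are hypothetical, the function names are assumptions, and the authors' modified coefficient is not reproduced here because the abstract does not define it.

```python
import math

def fit_poisson(x, y, iters=50):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical count data (assumed for illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 1, 3, 4, 8, 12]

b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]   # estimated E(Y|X)
r = pearson_r(y, fitted)                        # traditional regression correlation coefficient
print("coefficients:", b0, b1)
print("regression correlation coefficient:", r)
```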
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
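The centering step mentioned above can be sketched as follows. The example shows how subtracting the means from two predictors before forming their product term lowers the correlation between a linear term and the interaction term (the "non-essential" multicollinearity). The data are hypothetical and the sketch does not reproduce the ridge analysis itself.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor values (assumed for illustration only).
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
z = [5.0, 7.0, 6.0, 9.0, 8.0, 10.0]

# Interaction term built from the raw predictors.
raw_product = [a * b for a, b in zip(x, z)]

# Interaction term built from mean-centered predictors.
xc = [a - mean(x) for a in x]
zc = [b - mean(z) for b in z]
centered_product = [a * b for a, b in zip(xc, zc)]

r_raw = pearson_r(x, raw_product)
r_centered = pearson_r(xc, centered_product)
print("correlation of x with x*z, raw:", r_raw)
print("correlation of x with x*z, centered:", r_centered)
```

With these particular values the centered correlation is essentially zero; in general centering reduces, but need not eliminate, the correlation.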
Inferential Models for Linear Regression
Zuoyi Zhang
2011-09-01
Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs) in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis (including model checking, variable selection, and prediction) and for uncertain inference in general.
Heteroscedasticity checks for regression models
(none listed)
2001-01-01
For checking heteroscedasticity in regression models, a unified approach to constructing test statistics in parametric and nonparametric regression models is proposed. For nonparametric regression, the test is not sensitive to the choice of the smoothing parameters involved in estimating the nonparametric regression function. The limiting null distribution of the test statistic remains the same over a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent, while the classical bootstrap is not in the general case but is applicable if some extra assumption on the conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and how they compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear and linear autoregressive models.
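To make the wild bootstrap idea concrete, here is a minimal sketch of one generic form: residuals from a fitted line are multiplied by random signs (Rademacher weights) and added back to the fitted values, so each resampled point keeps the magnitude of its own residual. This is a textbook version assumed for illustration, not the specific variant studied in the paper, and the data and function names are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def wild_bootstrap_sample(x, y, rng):
    """Return one wild-bootstrap response vector using Rademacher multipliers."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Each residual keeps its own magnitude; only its sign is randomized.
    return [fi + ri * rng.choice((-1.0, 1.0)) for fi, ri in zip(fitted, resid)]

# Hypothetical data (assumed for illustration only).
rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.6, 4.4, 7.0]

y_star = wild_bootstrap_sample(x, y, rng)
print(y_star)
```

Because the residual at each design point is never moved to another point, the resampled data preserve point-specific error variances, which is why this family of bootstraps is commonly used under heteroscedasticity.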
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Callén, M S; López, J M; Mastral, A M
2010-08-15
Estimating benzo(a)pyrene (BaP) concentrations in ambient air is very important from an environmental point of view, especially given the introduction of Directive 2004/107/EC and the carcinogenic character of this pollutant. Particulate matter less than or equal to 10 microns (PM10) was collected during a 2008-2009 sampling campaign at four locations in Spain to determine BaP concentrations experimentally by gas chromatography-tandem mass spectrometry (GC-MS-MS). Multivariate linear regression models (MLRM) were used to predict BaP air concentrations at two sampling sites, taking PM10 and meteorological variables as possible predictors. The model obtained with data from the two sampling sites (the all-sites model) (R²=0.817, PRESS/SSY=0.183) included the significant variables PM10, temperature, solar radiation and wind speed, and was internally and externally validated. Internal validation was performed by cross-validation, and external validation used BaP concentrations from previous campaigns carried out in Zaragoza from 2001-2004. The proposed model constitutes a first approximation for estimating BaP concentrations in urban atmospheres, with very good internal prediction (Q²(CV)=0.813, PRESS/SSY=0.187) and with the maximal external prediction for the 2001-2002 campaign (Q²(ext)=0.679, PRESS/SSY=0.321) versus the 2001-2004 campaign (Q²(ext)=0.551, PRESS/SSY=0.449).
Heteroscedasticity checks for regression models
Zhu, Lixing
2001-01-01
Boosted Regression Tree Models to Explain Watershed ...
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
Semiparametric Regression and Model Refining
(none listed)
2002-01-01
This paper presents a semiparametric adjustment method suitable for general cases. Assuming that the regularizer matrix is positive definite, the calculation method is discussed and the corresponding formulae are presented. Finally, a simulated adjustment problem is constructed to explain the method given in this paper. The results from the semiparametric model and the G-M model are compared. The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimation method.
A Note on the Effect of Seasonal Dummies on the Periodogram Regression
M. Ooms (Marius); U. Hassler
1996-01-01
We discuss how prior regression on seasonal dummies leads to singularities in periodogram regression procedures for the detection of long memory. We suggest a modified procedure. We illustrate the problems using monthly inflation data from Hassler and Wolters (1995).
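A small sketch may help show where the singularity comes from. Regressing a series on a full set of seasonal dummies is equivalent to subtracting each season's mean, and when the sample covers whole seasonal cycles the residuals then have a discrete Fourier transform of zero at the seasonal frequencies. The series below is hypothetical, the function names are assumptions, and the authors' modified procedure is not reproduced.

```python
import cmath
from collections import defaultdict

def remove_seasonal_means(series, period):
    """Residuals from a regression on a full set of seasonal dummies."""
    sums, counts = defaultdict(float), defaultdict(int)
    for t, v in enumerate(series):
        sums[t % period] += v
        counts[t % period] += 1
    means = {s: sums[s] / counts[s] for s in sums}
    return [v - means[t % period] for t, v in enumerate(series)]

def dft_at(series, freq):
    """Discrete Fourier transform of the series at angular frequency freq."""
    return sum(v * cmath.exp(-1j * freq * t) for t, v in enumerate(series))

# Hypothetical series: two "years" of quarterly data (assumed for illustration).
series = [3.0, 5.0, 4.0, 8.0, 2.0, 7.0, 6.0, 9.0]
period = 4

resid = remove_seasonal_means(series, period)
seasonal_freq = 2 * cmath.pi / period

# The periodogram ordinate is proportional to the squared modulus of the DFT,
# so its logarithm is undefined where the DFT vanishes.
print("DFT modulus before:", abs(dft_at(series, seasonal_freq)))
print("DFT modulus after: ", abs(dft_at(resid, seasonal_freq)))
```

The second printed value is numerically close to zero, which is the kind of ordinate a log-periodogram regression cannot use.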
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we tend to think that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "x" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
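The regression line Y = a + bx and the coefficient of determination described above can be sketched with the usual least-squares formulas. The function name and the data are assumptions chosen for illustration.

```python
def fit_line(x, y):
    """Least-squares fit of Y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope: change in Y per one-unit change in x
    a = my - b * mx          # intercept: value of Y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical measurements (assumed for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r2 = fit_line(x, y)
print("intercept a:", a)
print("slope b:", b)
print("R squared:", r2)
```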
Regression Model With Elliptically Contoured Errors
Arashi, M; Tabatabaey, S M M
2012-01-01
For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
Islam, M Nazrul; Tsukahara, N; Sugita, S
2012-06-01
The present study investigated the effects of apoptosis observed during seasonal testicular regression in Japanese Jungle Crows. The study was conducted from January to June in 2008 and 2009. Testes from adults captured during the non-breeding (January), prebreeding (February to mid-March), main-breeding (late March to early May), transition (mid-May to late May), and post-breeding (June) seasons were analyzed. Apoptosis was assessed by in situ terminal deoxynucleotidyl transferase-mediated dUTP nick end-labeling (TUNEL) assay. Paired-testis volume increased 95-fold from the non-breeding to the main-breeding season (P Crows; however, testis function was terminated rapidly after the breeding season. Furthermore, we concluded that, as in other avian species, Sertoli cell apoptosis followed by massive germ cell death was responsible for rapid testicular regression in Jungle Crows. Copyright © 2012 Elsevier Inc. All rights reserved.
Mattia Callegari
2015-05-01
In this contribution we analyze the performance of a monthly river discharge forecasting model based on a Support Vector Regression (SVR) technique in a European alpine area. We considered as predictors the discharges of the antecedent months, snow-covered area (SCA), and meteorological and climatic variables for 14 catchments in South Tyrol (Northern Italy), as well as the long-term average discharge of the month of prediction, also regarded as a benchmark. Forecasts at a six-month lead time tend to perform no better than the benchmark, with an average 33% relative root mean square error (RMSE%) on test samples. However, at one month lead time, RMSE% was 22%, a non-negligible improvement over the benchmark; moreover, the SVR model reduces the frequency of higher errors associated with anomalous months. Predictions with a lead time of three months show an intermediate performance between those at one and six months lead time. Among the considered predictors, SCA alone reduces RMSE% to 6% and 5% compared to using monthly discharges only, for a lead time equal to one and three months, respectively, whereas meteorological parameters bring only minor improvements. The model also outperformed a simpler linear autoregressive model, and yielded the lowest volume error in forecasting with one month lead time, while at longer lead times the differences compared to the benchmarks are negligible. Our results suggest that although an SVR model may deliver better forecasts than its simpler linear alternatives, long lead-time hydrological forecasting in Alpine catchments remains a challenge. Catchment state variables may play a bigger role than catchment input variables; hence a focus on characterizing seasonal catchment storage, rather than seasonal weather forecasting, could be key for improving our predictive capacity.
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
A new bivariate negative binomial regression model
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. The application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
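The model comparison by Akaike information criterion can be sketched with the standard definition AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The log-likelihoods, parameter counts and model labels below are made up for illustration and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) pairs, assumed for illustration.
candidates = {
    "model A": (-1250.0, 6),
    "model B": (-1190.0, 9),
    "model C": (-1205.0, 9),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(name, score)
print("preferred by AIC:", best)
```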
A Spline Regression Model for Latent Variables
Harring, Jeffrey R.
2014-01-01
Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Constrained regression models for optimization and forecasting
P.J.S. Bruwer
2003-12-01
Linear regression models and the interpretation of such models are investigated. In practice, problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.
A Skew-Normal Mixture Regression Model
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Modeling confounding by half-sibling regression
Schölkopf, Bernhard; Hogg, David W; Wang, Dun
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...
Chen, Qiang; Mei, Kun; Dahlgren, Randy A; Wang, Ting; Gong, Jian; Zhang, Minghua
2016-12-01
As an important regulator of pollutants in overland flow and interflow, land use has become an essential research component for determining the relationships between surface water quality and pollution sources. This study investigated the use of ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify the impact of land use and population density on surface water quality in the Wen-Rui Tang River watershed of eastern China. A manual variable excluding-selecting method was explored to resolve multicollinearity issues. Standard regression coefficient analysis coupled with cluster analysis was introduced to determine which variable had the greatest influence on water quality. Results showed that: (1) Impact of land use on water quality varied with spatial and seasonal scales. Both positive and negative effects for certain land-use indicators were found in different subcatchments. (2) Urban land was the dominant factor influencing N, P and chemical oxygen demand (COD) in highly urbanized regions, but the relationship was weak as the pollutants were mainly from point sources. Agricultural land was the primary factor influencing N and P in suburban and rural areas; the relationship was strong as the pollutants were mainly from agricultural surface runoff. Subcatchments located in suburban areas were identified with urban land as the primary influencing factor during the wet season while agricultural land was identified as a more prevalent influencing factor during the dry season. (3) Adjusted R² values in OLS models using the manual variable excluding-selecting method averaged 14.3% higher than using stepwise multiple linear regressions. However, the corresponding GWR models had adjusted R² ~59.2% higher than the optimal OLS models, confirming that GWR models demonstrated better prediction accuracy. Based on our findings, water resource protection policies should consider site-specific land-use conditions within each watershed to
Bayesian multimodel inference for geostatistical regression models.
Devin S Johnson
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
An Application on Multinomial Logistic Regression Model
Abdalla M El-Habil
2012-03-01
This study aims to illustrate an application of the multinomial logistic regression model, which is one of the important methods for categorical data analysis. The model deals with one nominal or ordinal response variable that has more than two categories. It has been applied in data analysis in many areas, for example health, social, behavioral, and educational research. To illustrate the model in practice, we used real data on physical violence against children from a survey of Youth 2003 conducted by the Palestinian Central Bureau of Statistics (PCBS). A segment of the population of children in the age group 10-14 years resident in Gaza governorate, of size 66,935, was selected, and the response variable consisted of four categories. Eighteen explanatory variables were used to build the primary multinomial logistic regression model. The model was tested through a set of statistical tests to ensure its appropriateness for the data. The model was also tested by randomly selecting two observations from the data and predicting the group into which each observation would be classified, given the values of the explanatory variables. We concluded that the multinomial logistic regression model allows us to define accurately the relationship between the group of explanatory variables and the response variable, to identify the effect of each of the variables, and to predict the classification of any individual case.
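As a rough illustration of how a fitted multinomial logistic model turns explanatory variables into a predicted category, the sketch below computes category probabilities with the usual reference-category (softmax) form. The function name, the coefficient values, and the two-predictor setup are hypothetical and are not taken from the study.

```python
import math

def multinomial_logit_probs(x, coefs, intercepts):
    """Category probabilities for one observation under a multinomial logit model.

    Category 0 is the reference (its linear predictor is fixed at 0);
    coefs[k] and intercepts[k] belong to non-reference category k + 1.
    """
    etas = [0.0]  # linear predictor of the reference category
    for b0, b in zip(intercepts, coefs):
        etas.append(b0 + sum(bj * xj for bj, xj in zip(b, x)))
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for a four-category response and two predictors.
probs = multinomial_logit_probs(
    x=[1.0, 0.5],
    coefs=[[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]],
    intercepts=[0.1, -0.2, 0.0],
)
# Classify the observation into the category with the highest probability.
predicted_category = max(range(len(probs)), key=probs.__getitem__)
```

Predicting the most probable category is one common classification rule; the abstract does not say which rule the authors used.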
Regression Models for Count Data in R
Christian Kleiber
2008-06-01
The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses design and functionality of the basic R functions just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros (two problems that typically occur in count data sets in economics and the social sciences) better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.
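To make the difference between the two zero-augmented models concrete, here is a minimal Python sketch of their probability mass functions with a Poisson count component. It is not the pscl implementation: the function names are hypothetical, and the mean `mu` and zero probabilities are passed in directly rather than modeled through covariates and link functions.

```python
import math

def poisson_pmf(y, mu):
    """Probability of count y under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zip_pmf(y, mu, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise an ordinary Poisson(mu) draw (which may also be zero)."""
    base = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + base if y == 0 else base

def hurdle_pmf(y, mu, p_zero):
    """Hurdle Poisson: zeros occur with probability p_zero; positive counts
    follow a Poisson(mu) truncated at zero."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(y, mu) / (1.0 - math.exp(-mu))

# Both models put more mass on zero than a plain Poisson with the same mu.
print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3), hurdle_pmf(0, 2.0, 0.4))
```

In the zero-inflated model zeros come from two sources, whereas in the hurdle model all zeros come from a single binary component; that is the main structural distinction the sketch is meant to show.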
Parametric Regression Models Using Reversed Hazard Rates
Asokan Mulayath Variyath
2014-01-01
Proportional hazard regression models are widely used in survival analysis to understand and exploit the relationship between survival time and covariates. For left censored survival times, reversed hazard rate functions are more appropriate. In this paper, we develop a parametric proportional hazard rates model using an inverted Weibull distribution. The estimation and construction of confidence intervals for the parameters are discussed. We assess the performance of the proposed procedure based on a large number of Monte Carlo simulations. We illustrate the proposed method using a real case example.
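For readers unfamiliar with reversed hazard rates, the following worked equations sketch the standard definitions. The proportional form and the particular inverted Weibull parametrization (with hypothetical parameters $\lambda$ and $\beta$) are assumptions made for illustration and may differ from the ones used in the paper.

```latex
% Reversed hazard rate of a lifetime T with density f and distribution function F:
r(t) = \frac{f(t)}{F(t)} = \frac{d}{dt}\log F(t).

% Assumed proportional reversed hazard model with covariates x:
r(t \mid x) = r_0(t)\,\exp(x^{\top}\gamma)
\quad\Longrightarrow\quad
F(t \mid x) = F_0(t)^{\exp(x^{\top}\gamma)}.

% Assumed inverted Weibull baseline:
F_0(t) = \exp\!\left(-\lambda t^{-\beta}\right), \qquad
r_0(t) = \lambda\beta\, t^{-\beta-1}, \qquad t > 0.
```

The implication in the second line follows by integrating $r(u \mid x)$ from $t$ to $\infty$, since $\log F(\infty \mid x) = 0$.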
Bayesian model selection in Gaussian regression
Abramovich, Felix
2009-01-01
We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.
Bayesian Inference of a Multivariate Regression Model
Marick S. Sinay
2014-01-01
We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
General regression and representation model for classification.
Jianjun Qian
Recently, the regularized coding-based classification methods (e.g. SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
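The two SIR-motivated modifications mentioned above (sums of past cases as a proxy for the immune population, and the logarithm of lagged counts to control autocorrelation) can be sketched as covariate construction. The function name, the window length, the one-period lag, and the `+ 1` offset for zero counts are all assumptions made here for illustration, not choices reported by the authors.

```python
import math

def sir_motivated_covariates(cases, immune_window, lag=1):
    """Build two covariates from a series of case counts.

    For each time t where both are defined, return a tuple of
      t, log(cases[t - lag] + 1), and the sum of cases over the
      previous `immune_window` periods.
    The + 1 offset is an assumption used here to handle zero counts.
    """
    rows = []
    start = max(lag, immune_window)
    for t in range(start, len(cases)):
        log_lagged = math.log(cases[t - lag] + 1)
        past_sum = sum(cases[t - immune_window:t])
        rows.append((t, log_lagged, past_sum))
    return rows

weekly_cases = [3, 0, 5, 8, 13, 9, 4, 2]  # made-up counts
covariates = sir_motivated_covariates(weekly_cases, immune_window=3)
```

These columns would then enter a quasi-Poisson or negative binomial regression alongside the weather terms; fitting that model is outside the scope of this sketch.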
Hierarchical linear regression models for conditional quantiles
TIAN Maozai; CHEN Gemai
2006-01-01
The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with data that have a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable; it is a common phenomenon. To ignore this hierarchical data structure risks overlooking the importance of group effects, and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid. On the other hand, hierarchical models take a hierarchical data structure into account and also have many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, hierarchical models are essentially mean regressions; therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to an outlier observation on the dependent variable. In this article, a new approach, which is based on the Gauss-Seidel iteration and takes full advantage of quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining simple conditions for n^{1/2}-convergence and asymptotic normality. We also illustrate the use of the technique with real educational data that are hierarchical, and show how the results can be explained.
Regression Models For Saffron Yields in Iran
Sanaeinejad, S. H.; Hosseini, S. N.
Saffron is an important crop in social and economic terms in Khorassan Province (northeast Iran). In this research we tried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in the cities of Birjand, Ghaen and Ferdows. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. Climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81, respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72, respectively. Multiple regression analysis indicated that among weather variables, temperature was the key parameter for variation of saffron yield. It was concluded that increasing temperature in spring was the main cause of declining saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
A Machine Learning Tool for Weighted Regressions in Time, Discharge, and Season
Alexander Maestre
2014-01-01
A new machine learning tool has been developed to classify water stations with similar water quality trends. The tool is based on the statistical method Weighted Regressions in Time, Discharge, and Season (WRTDS), developed by the United States Geological Survey (USGS) to estimate daily concentrations of water constituents in rivers and streams based on continuous daily discharge data and discrete water quality samples collected at the same or nearby locations. WRTDS is based on parametric survival regressions using a jack-knife cross validation procedure that generates unbiased estimates of the prediction errors. One of the disadvantages of WRTDS is that it needs a large number of samples (n > 200) collected during at least two decades. In this article, the tool is used to evaluate the use of Boosted Regression Trees (BRT) as an alternative to the parametric survival regressions for water quality stations with a small number of samples. We describe the development of the machine learning tool as well as an evaluation comparison of the two methods, WRTDS and BRT. The purpose of the tool is to evaluate the reduction in variability of the estimates by clustering data from nearby stations with similar concentration and discharge characteristics. The results indicate that, using clustering, the predicted concentrations using BRT are in general higher than the observed concentrations. In addition, it appears that BRT generates higher sum of square residuals than the parametric survival regressions.
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Fadhilah Abdrazak
2014-05-01
Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore increase the complexity of load forecasting. The errors due to holiday effects in load forecasting are known to be higher than those due to other factors. If these effects can be estimated and removed, the behavior of the series can be viewed more clearly. Thus, the aim of this paper is to reduce forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model of either peak or average load is developed. The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays' effects and achieved a smaller Mean Absolute Percentage Error (MAPE) in load forecasting.
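The MAPE criterion used to compare the two models can be computed as in the short sketch below. The function name and the load values are made up for illustration and do not come from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes both series have the same length and no actual value is zero.
    """
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Made-up daily peak loads (MW) and two competing forecasts.
observed = [100.0, 200.0, 400.0]
print(mape(observed, [110.0, 190.0, 400.0]))  # closer forecast, smaller MAPE
print(mape(observed, [130.0, 170.0, 380.0]))  # rougher forecast, larger MAPE
```

Because each error is scaled by the observed value, MAPE is comparable across days with very different load levels, which is presumably why it suits holiday versus non-holiday comparisons.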
Inferring gene regression networks with model trees
Aguilar-Ruiz Jesus S
2010-10-01
Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built, taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
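The abstract does not say which procedure is used to control the false discovery rate. As one common possibility, the sketch below implements the Benjamini-Hochberg step-up procedure; treating it as the method behind REGNET is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha
    using the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value <= k * alpha / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])

# Made-up p-values for ten candidate gene-gene relationships.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant_edges = benjamini_hochberg(p, alpha=0.05)
```

In a network setting, the retained indices would correspond to the gene pairs kept as edges of the graph.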
Quantile regression modeling for Malaysian automobile insurance premium data
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models such as the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with those of the Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = 0.5) is always smaller than that of Gamma regression for all risk factors.
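The robustness claim can be illustrated with the check (pinball) loss that quantile regression minimizes. The sketch below fits only a constant, with no rating classes, and uses made-up premium values; the function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for a residual y - prediction at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def constant_quantile_fit(y, tau):
    """Minimize the summed pinball loss over constant predictions taken from the data."""
    return min(y, key=lambda c: sum(pinball_loss(v - c, tau) for v in y))

premiums = [120.0, 150.0, 155.0, 160.0, 900.0]  # made-up values with one outlier
mean_fit = sum(premiums) / len(premiums)         # pulled far upward by the outlier
median_fit = constant_quantile_fit(premiums, tau=0.5)
upper_fit = constant_quantile_fit(premiums, tau=0.9)
```

The median fit stays with the bulk of the data while the mean does not, and changing `tau` moves the fit to a different part of the distribution, which is the sense in which quantile regression can describe more than the conditional mean.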
Entrepreneurial intention modeling using hierarchical multiple regression
Marina Jeger
2014-12-01
The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
A Malthusian Model for all Seasons
Sharp, Paul Richard; Weisdorf, Jacob Louis
economy. Inspired by the work of Boserup (1965) and others, and in contrast to the Lewis (1954) approach, we suggest that the phenomenon of surplus labour is best understood through an acceptance of the importance of seasonality in agriculture. Boserup observed that the harvest season was invariably...... associated with labour shortages (the high-season bottleneck on production), although there might be labour surplus during the low season. We introduce the concept of seasonality into a stylized Malthusian model, and endogenize the extent of agricultural labour input, which is then used to calculate labour...... surplus and the rate of labour productivity. We observe the effects of season-specific technological progress, and find that technological progress in the low-season increases labour surplus and labour productivity whilst, perhaps surprisingly, technological progress in the high-season, by relaxing...
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE
RODRIGO PINTO MOREIRA
2008-01-01
The main objective of this work is to adapt the STR-Tree model, which combines a Smooth Transition Regression model with Classification and Regression Tree (CART), so that it can be used for classification. To this end, some changes were made to its structural form and to the estimation. Because we are classifying binary dependent variables, the techniques employed in logistic regression are required, so the estimation of the pa...
Permanasari, Adhistya Erna; Dominic, Dhanapal Durai
2009-01-01
Zoonosis refers to the transmission of infectious diseases from animals to humans. The increasing incidence of zoonoses causes great losses of human and animal lives and has social and economic impacts. This motivates the development of a system that can predict the future number of zoonosis occurrences in humans. This paper analyses and presents the use of the Seasonal Autoregressive Integrated Moving Average (SARIMA) method for developing a forecasting model that is able to support and provide predictions of the number of human zoonosis cases. The dataset for model development was a time series of human tuberculosis occurrences in the United States, comprising fourteen years of monthly data obtained from a study published by the Centers for Disease Control and Prevention (CDC). Several trial SARIMA models were compared to obtain the most appropriate model. Then, diagnostic tests were used to determine model validity. The result showed that the SARIMA(9,0,14)(12,1,24)12 is the fitt...
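In the notation SARIMA(p,d,q)(P,D,Q)s, the reported model has D = 1 and s = 12, meaning the monthly series is seasonally differenced once at lag 12 before the ARMA terms are fitted. A minimal sketch of that step follows, with a hypothetical function name and a made-up series; it does not fit the model itself.

```python
def seasonal_difference(series, period=12):
    """Return series[t] - series[t - period] for every t where both exist."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Made-up monthly counts: a linear trend plus a spike every January.
monthly = [10 + 0.5 * t + (5 if t % 12 == 0 else 0) for t in range(36)]
differenced = seasonal_difference(monthly, period=12)
```

For this artificial series the seasonal spike cancels and only the yearly change in the trend remains, which is the stabilizing effect seasonal differencing is meant to have.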
Model selection in kernel ridge regression
Exterkate, Peter
2013-01-01
Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...
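A minimal pure-Python sketch of kernel ridge regression with a Gaussian kernel, including selection of the tuning parameters from a small grid by leave-one-out cross-validation, is given below. The kernel parametrization, the grid values, the data, and all function names are assumptions for illustration and may differ from those in the paper.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_kernel_ridge(X, y, lam, sigma):
    """Return dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam
    return solve(K, y)

def predict(X_train, alpha, x_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xi, x_new, sigma) for a, xi in zip(alpha, X_train))

def loo_cv_error(X, y, lam, sigma):
    """Mean squared leave-one-out prediction error."""
    err = 0.0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        alpha = fit_kernel_ridge(X_tr, y_tr, lam, sigma)
        err += (y[i] - predict(X_tr, alpha, X[i], sigma)) ** 2
    return err / len(X)

# Made-up one-dimensional data and a small tuning grid.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5]]
y = [math.sin(x[0]) for x in X]
grid = [(lam, sigma) for lam in (0.01, 0.1, 1.0) for sigma in (0.5, 1.0, 2.0)]
best_lam, best_sigma = min(grid, key=lambda p: loo_cv_error(X, y, *p))
```

The regressors never appear explicitly: only kernel evaluations between observations are needed, which is what allows a potentially infinite number of nonlinear transformations.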
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with $q>1$ for such problems; however these could actually perform worse in sample complexity -- vis-a-vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
412TW-PA-15240. Daniel T. Laird, Air Force Test Center, Edwards AFB, CA. ...alternative-hypothesis. This paper will present an antenna auto-tracking model using logistic regression modeling. This paper presents an example of
Climate Impacts on Chinese Corn Yields: A Fractional Polynomial Regression Model
Kooten, van G.C.; Sun, Baojing
2012-01-01
In this study, we examine the effect of climate on corn yields in northern China using data from ten districts in Inner Mongolia and two in Shaanxi province. A regression model with a flexible functional form is specified, with explanatory variables that include seasonal growing degree days,
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable have ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
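The two ingredients of such a model can be sketched as follows: category probabilities from a cumulative-logit (proportional-odds) form, and a distance-based weight that lets nearby villages count more in each local fit. The proportional-odds form, the Gaussian weighting function, and all names and values are assumptions for illustration; the abstract does not specify them.

```python
import math

def ordinal_logit_probs(x, beta, thresholds):
    """Category probabilities under a proportional-odds model, where
    P(Y <= j | x) = 1 / (1 + exp(-(thresholds[j] - x . beta)))
    and thresholds are in increasing order."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    cum = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    # Differences of consecutive cumulative probabilities give category probabilities.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def gaussian_geo_weight(distance, bandwidth):
    """Weight given to an observation at the stated distance from the fit location."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

# Hypothetical local coefficients for one village and three ordered thresholds.
probs = ordinal_logit_probs(x=[1.0], beta=[0.5], thresholds=[-1.0, 0.0, 1.5])
weights = [gaussian_geo_weight(d, bandwidth=2.0) for d in (0.0, 1.0, 4.0)]
```

In a geographically weighted fit, each village would get its own `beta` and `thresholds`, estimated by maximizing a log-likelihood in which every observation is multiplied by its weight relative to that village.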
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Relative risk regression models with inverse polynomials.
Ning, Yang; Woodward, Mark
2013-08-30
The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.
Model Selection in Kernel Ridge Regression
Exterkate, Peter
Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated with all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels make them widely...
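The idea of choosing tuning parameters from a small grid by cross-validation can be illustrated with a minimal sketch. It assumes a one-dimensional input, a Gaussian kernel with bandwidth `sigma`, a ridge penalty `lam`, and leave-one-out cross-validation; all function names are hypothetical and the code is not taken from the paper.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def krr_fit(x, y, lam, sigma):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(x)
    K = [[gaussian_kernel(x[i], x[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def krr_predict(x_train, alpha, sigma, x_new):
    """Prediction at x_new as a kernel-weighted sum over training points."""
    return sum(a * gaussian_kernel(xi, x_new, sigma)
               for a, xi in zip(alpha, x_train))

def loo_cv_error(x, y, lam, sigma):
    """Mean squared leave-one-out prediction error for one (lam, sigma) pair."""
    total = 0.0
    for i in range(len(x)):
        xt, yt = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        alpha = krr_fit(xt, yt, lam, sigma)
        total += (y[i] - krr_predict(xt, alpha, sigma, x[i])) ** 2
    return total / len(x)

def select_by_cv(x, y, lams, sigmas):
    """Return the (lam, sigma) pair from the grid with the smallest LOO error."""
    grid = [(lam, s) for lam in lams for s in sigmas]
    return min(grid, key=lambda p: loo_cv_error(x, y, p[0], p[1]))
```

Calling `select_by_cv(x, y, [1e-3, 1e-1], [0.5, 1.0])` on two equal-length lists of floats would return one pair from that four-point grid; the grid values here are placeholders, not the rules of thumb proposed in the paper.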
Combining logistic regression and neural networks to create predictive models.
Spackman, K. A.
1992-01-01
Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggests that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Hong-Juan Li
2013-04-01
Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the empirical mode decomposition (EMD) method and auto regression (AR) for electric load forecasting. The electric load data of the New South Wales (Australia) market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Symbolic regression of generative network models
Menezes, Telmo
2014-01-01
Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world netwo...
Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.
1999-01-01
Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEI). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th
Corporate prediction models, ratios or regression analysis?
Bijnen, E.J.; Wijn, M.F.C.M.
1994-01-01
The models developed in the literature with respect to the prediction of a company's failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in
Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Kekatos, Vassilis
2011-01-01
Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they could be recovered by far fewer measurements. ...
Mixed Frequency Data Sampling Regression Models: The R Package midasr
Eric Ghysels
2016-08-01
When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
Modelling seasonality in Australian building approvals
Harry M Karamujic
2012-02-01
The paper examines the impact of seasonal influences on Australian housing approvals, represented by the State of Victoria building approvals for new houses (BANHs). The prime objective of BANHs is to provide timely estimates of future residential building work. Due to the relevance of the residential property sector to the property sector as a whole, BANHs are viewed by economic analysts and commentators as a leading indicator of property sector investment and, as such, of the general level of economic activity and employment. The generic objective of the study is to enhance the practice of modelling housing variables. In particular, the study seeks to cast some additional light on modelling the seasonal behaviour of BANHs by: (i) establishing the presence, or otherwise, of seasonality in Victorian BANHs; (ii) if present, ascertaining whether it is deterministic or stochastic; (iii) determining the out-of-sample forecasting capabilities of the considered modelling specifications; and (iv) speculating on possible interpretations of the results. To do so the study utilises the structural time series model of Harvey (1989). The modelling results confirm that the modelling specification allowing for stochastic trend and deterministic seasonality performs best in terms of diagnostic tests and goodness-of-fit measures. This is corroborated by the analysis of the out-of-sample forecasting capabilities of the considered modelling specifications, which showed that the models with deterministic seasonal specification exhibit superior forecasting capabilities. The paper also demonstrates that if time series are characterized by either stochastic trend or seasonality, the conventional modelling approach is bound to be mis-specified, i.e. would not be able to identify statistically significant seasonality in time series. According to the selected modelling specification, factors corresponding to June, April, December and November are found to be significant at the five per cent level
Impact of multicollinearity on small sample hydrologic regression models
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
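As an illustration of the variance inflation factor screening mentioned above, the following sketch computes VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing explanatory variable j on the remaining explanatory variables. The helper names and the common screening threshold of 10 are assumptions for illustration and are not taken from the study.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor for each column of X (list of rows)."""
    factors = []
    for j in range(len(X[0])):
        target = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        factors.append(1.0 / (1.0 - r_squared(others, target)))
    return factors

def screen(X, threshold=10.0):
    """Indices of columns whose VIF does not exceed the (assumed) threshold."""
    return [j for j, f in enumerate(vif(X)) if f <= threshold]
```

Uncorrelated columns give factors near 1, while nearly collinear columns give large factors that a screening step would flag. The sketch breaks down for exactly collinear columns, where R_j² equals 1.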
ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS
ZhuZhongyi; WeiBocheng
1999-01-01
In this paper, the estimation method based on the “generalized profile likelihood” for the conditionally parametric models in the paper given by Severini and Wong (1992) is extended to fixed design semiparametric nonlinear regression models. For these semiparametric nonlinear regression models, the resulting estimator of the parametric component of the model is shown to be asymptotically efficient and the strong convergence rate of the nonparametric component is investigated. Many results (for example, Chen (1988), Gao & Zhao (1993), Rice (1986), et al.) are extended to fixed design semiparametric nonlinear regression models.
Support vector regression model for complex target RCS predicting
Wang Gu; Chen Weishi; Miao Jungang
2009-01-01
Although electromagnetic scattering computation has developed rapidly for many years, some computing problems for complex and coated targets cannot be solved using the existing theory and computing models. A data-based computing model is established to make up for the insufficiency of theoretical models. Based on the support vector regression method, which is formulated on the principle of minimizing structural risk, a data model to predict the unknown radar cross section of some appointed targets is given. Comparison between the actual data and the results of this prediction model shows that the support vector regression method is workable and achieves comparable precision.
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
Nonlinear and Non Normal Regression Models in Physiological Research
1984-01-01
Applications of nonlinear and non normal regression models are increasing, as they allow appropriate interpretation of complex phenomena in the biomedical sciences. This paper critically reviews some applications of these models in physiological research.
Identification of Influential Points in a Linear Regression Model
Jan Grosz
2011-03-01
The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independent) datasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
Adaptive Regression and Classification Models with Applications in Insurance
Jekabsons Gints
2014-07-01
Nowadays, in the insurance industry the use of predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on the ability to perform such tasks as credibility estimation, determination of insurance premiums, estimation of probability of claim, detecting insurance fraud, managing insurance risk. This paper discusses regression and classification modeling for such types of prediction problems using the method of Adaptive Basis Function Construction.
Geometric Properties of AR(q) Nonlinear Regression Models
LIU Ying-ar; WEI Bo-cheng
2004-01-01
This paper is devoted to a study of geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for the regression parameter space and the autoregression parameter space, respectively, based on the inner product weighted by the Fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extend the work of Bates & Watts (1980, 1988) and Seber & Wild (1989).
Linard, Joshua I.
2013-01-01
Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Lu LIN
2005-01-01
In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to perturbations in data and attains a very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Modeling seasonal measles transmission in China
Bai, Zhenguo; Liu, Dan
2015-08-01
A discrete-time deterministic measles model with periodic transmission rate is formulated and studied. The basic reproduction number R0 is defined and used as the threshold parameter in determining the dynamics of the model. It is shown that the disease will die out if R0 < 1 and will persist if R0 > 1. Parameters in the model are estimated on the basis of demographic and epidemiological data. Numerical simulations are presented to describe the seasonal fluctuation of measles infection in China.
Wavelet regression model in forecasting crude oil price
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
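The decomposition and correlation-based selection steps can be sketched as follows. The abstract does not name the wavelet family or the selection rule, so this example assumes a single-level Haar decomposition into additive smooth and detail components and a hypothetical absolute-correlation threshold; the function names are illustrative only.

```python
import math

def haar_components(series):
    """Split an even-length series into smooth and detail parts (one Haar level).

    The smooth part replaces each consecutive pair by its average; the detail
    part is the remainder, so smooth[i] + detail[i] == series[i].
    """
    if len(series) % 2 != 0:
        raise ValueError("series length must be even")
    smooth = []
    for i in range(0, len(series), 2):
        avg = (series[i] + series[i + 1]) / 2.0
        smooth.extend([avg, avg])
    detail = [s - m for s, m in zip(series, smooth)]
    return smooth, detail

def pearson(a, b):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def select_components(components, target, threshold):
    """Names of components whose |correlation| with the target meets the threshold."""
    return [name for name, comp in components.items()
            if abs(pearson(comp, target)) >= threshold]
```

The selected components would then serve as regressors in an ordinary multiple linear regression. A practical analysis would normally use several decomposition levels and a dedicated wavelet library.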
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
The Seasons Explained by Refutational Modeling Activities
Frede, Valerie
2008-01-01
This article describes the principles and investigation of a small-group laboratory activity based on refutational modeling to teach the concept of seasons to preservice elementary teachers. The results show that these teachers improved significantly when they had to refute their initial misconceptions practically. (Contains 8 figures and 1 table.)
Alternative regression models to assess increase in childhood BMI
Mansmann Ulrich
2008-09-01
Background: Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods: Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results: GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion: GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression
Han Lu
2013-01-01
Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we propose a new way to combine these two kinds of models. Because logistic regression filters out variables with a high degree of correlation, the artificial intelligence models have reduced complexity and converge faster, while the hybrid models that incorporate logistic regression are easier to interpret in terms of statistical significance, which improves the performance of the artificial intelligence models. In experiments on the German data set, we find an interesting phenomenon, which we define as ‘dimensional interference’, with the support vector machine, and cross validation shows that the new method is of considerable help in credit scoring.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Group Lasso for high dimensional sparse quantile regression models
Kato, Kengo
2011-01-01
This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of primitive regularity conditions, the group Lasso estimator c...
A generalized exponential time series regression model for electricity prices
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort the estimation of the dynamic structure of the series we consider an iterative estimation strategy which, conditional on a set of parameter estimates, clears the spikes using a data cleaning algorithm, and reestimates the parameters using the cleaned data so as to robustify the estimates. Conditional on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate.
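For reference, the four performance measures named above can be computed as in the following sketch. It assumes that R denotes the Pearson correlation coefficient between observed and predicted values and that MAPE is expressed in percent; the abstract does not spell out either convention.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean absolute percentage error in percent (observed values must be nonzero)."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(terms) / len(terms)

def rmse(observed, predicted):
    """Root mean square error."""
    squares = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squares) / len(squares))

def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```

MAE, MAPE and RMSE are error measures for which lower is better, while R closer to 1 indicates that predictions track the observations more closely.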
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely the AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena sativa L.) grain yield was carried out using data from the Portuguese Plant Breeding Board, covering a sample of 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model for studying and estimating phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae package available in R software for AMMI model analysis.
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental (EAO) herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions of second to fourth order. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve); and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random regression model using third-order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
Optimization of Regression Models of Experimental Data Using Confirmation Points
Ulbrich, N.
2010-01-01
A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance are used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that has already been tested at confirmation points independent of the regression model before it is ever used to predict an unknown response from a set of regressors.
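The search metric described above can be sketched as follows. This is an illustrative reading of the abstract, not the author's implementation: it assumes an ordinary least-squares fit, the usual PRESS residual e_i / (1 - h_ii), and a sample standard deviation with ddof=1. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def search_metric(X_fit, y_fit, X_conf, y_conf):
    """Greater of (a) the std. dev. of PRESS residuals on the fitting points and
    (b) the std. dev. of response residuals on the confirmation points."""
    # Coefficients are determined from the fitting subset only.
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    resid_fit = y_fit - X_fit @ beta
    # Leverages: diagonal of the hat matrix H = X pinv(X).
    hat_diag = np.einsum("ij,ji->i", X_fit, np.linalg.pinv(X_fit))
    press_resid = resid_fit / (1.0 - hat_diag)
    # Confirmation points are used only to test the fitted model.
    resid_conf = y_conf - X_conf @ beta
    sd_press = np.std(press_resid, ddof=1)  # ddof=1 is an assumption
    sd_conf = np.std(resid_conf, ddof=1)
    return max(sd_press, sd_conf), sd_press, sd_conf

# Synthetic example: candidate math terms 1, x, x^2 (illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x, x ** 2])
conf_mask = np.arange(x.size) % 4 == 0  # every fourth point is a confirmation point
metric, sd_press, sd_conf = search_metric(
    X[~conf_mask], y[~conf_mask], X[conf_mask], y[conf_mask]
)
```

In an optimization loop, the same function would be evaluated for each candidate combination of math terms, and the combination with the smallest metric would be preferred.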
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Pedro Henrique Melo Albuquerque
This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC) granted to clients residing in the Distrito Federal (DF) to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc information criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using the GWLR technique to develop credit scoring models for the target population in the study.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...
Systematic evaluation of land use regression models for NO₂
Wang, M.; Beelen, R.M.J.; Eeftens, M.R.; Meliefste, C.; Hoek, G.; Brunekreef, B.
2012-01-01
Land use regression (LUR) models have become popular to explain the spatial variation of air pollution concentrations. Independent evaluation is important. We developed LUR models for nitrogen dioxide (NO₂) using measurements conducted at 144 sampling sites in The Netherlands. Sites were randomly
FUNCTIONAL-COEFFICIENT REGRESSION MODEL AND ITS ESTIMATION
(no author listed)
2001-01-01
In this paper, a class of functional-coefficient regression models is proposed and an estimation procedure based on locally weighted least squares is suggested. This class of models, with the proposed estimation method, is a powerful means for exploratory data analysis.
Seasonal radiative modeling of Titan's stratosphere
Bézard, Bruno; Vinatier, Sandrine; Achterberg, Richard
2016-10-01
We have developed a seasonal radiative model of Titan's stratosphere to investigate the time variation of stratospheric temperatures in the 10⁻³ to 5 mbar range as observed by the Cassini/CIRS spectrometer. The model incorporates gas and aerosol vertical profiles derived from Cassini/CIRS spectra to calculate the heating and cooling rate profiles as a function of time and latitude. In the equatorial region, the radiative equilibrium profile is warmer than the observed one. Adding adiabatic cooling in the energy equation, with a vertical velocity profile decreasing with depth and having w ≈ 0.4 mm sec⁻¹ at 1 mbar, allows us to reproduce the observed profile. The model predicts a 5 K decrease at 1 mbar between 2008 and 2016 as a result of orbit eccentricity, in relatively good agreement with the observations. At other latitudes, as expected, the radiative model predicts seasonal variations of temperature larger than observed, pointing to latitudinal redistribution of heat by dynamics. Vertical velocities seasonally varying between -0.4 and 1.2 mm sec⁻¹ at 1 mbar provide adiabatic cooling and heating adequate to reproduce the time variation of 1-mbar temperatures from 2005 to 2016 at 30°N and S. The model is also used to investigate the role of the strong compositional changes observed at high southern latitudes after equinox in the concomitant rapid cooling of the stratosphere.
Statistical Seasonal Sea Surface based Prediction Model
Suarez, Roberto; Rodriguez-Fonseca, Belen; Diouf, Ibrahima
2014-05-01
The interannual variability of the sea surface temperature (SST) plays a key role in the strongly seasonal rainfall regime of the West African region. The predictability of the seasonal cycle of rainfall is a field widely discussed by the scientific community, with results that fail to be satisfactory because dynamical models have difficulty reproducing the behavior of the Inter Tropical Convergence Zone (ITCZ). To tackle this problem, a statistical model based on oceanic predictors has been developed at the Universidad Complutense of Madrid (UCM) with the aim of complementing and enhancing the predictability of the West African Monsoon (WAM) as an alternative to the coupled models. The model, called S4CAST (SST-based Statistical Seasonal Forecast), is based on discriminant analysis techniques, specifically Maximum Covariance Analysis (MCA) and Canonical Correlation Analysis (CCA). Beyond the application of the model to the prediction of rainfall in West Africa, its use extends to a range of different oceanic, atmospheric and health-related parameters influenced by the temperature of the sea surface as a defining factor of variability.
Fitting Additive Binomial Regression Models with the R Package blm
Stephanie Kovalchik
2013-09-01
The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit) model, which allows some covariates to have additive effects and other covariates to have logistic effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm package's methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.
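The contrast between risk differences and odds ratios can be illustrated with a small sketch. It does not use the blm package; it assumes the simplest case of a single binary exposure and no other covariates, where an additive (identity-link) binomial model's coefficient equals the crude risk difference and a logistic model's exponentiated coefficient equals the crude odds ratio. The counts are hypothetical.

```python
def risk_difference_and_odds_ratio(events_exposed, n_exposed,
                                   events_unexposed, n_unexposed):
    """Crude risk difference and odds ratio from a 2x2 table."""
    risk1 = events_exposed / n_exposed      # event probability if exposed
    risk0 = events_unexposed / n_unexposed  # event probability if unexposed
    rd = risk1 - risk0                      # additive (absolute) effect
    odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))  # multiplicative effect on odds
    return rd, odds_ratio

# Hypothetical cohort: 30/100 events among exposed, 10/100 among unexposed.
rd, odds_ratio = risk_difference_and_odds_ratio(30, 100, 10, 100)
```

Here the risk difference is 0.2 (20 additional events per 100 people), while the odds ratio is 27/7, about 3.86. The two summaries answer different questions, which is the motivation the abstract gives for additive models.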
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and demonstrate its effectiveness using publicly available benchmark data sets.
Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model
Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.
2017-03-01
This paper discusses the classification of sugarcane plantation areas from Landsat-8 satellite imagery. The classification process uses the binary logistic regression method with time series data of the normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using the gradient descent algorithm. The best-fit model can then be used to classify sugarcane and non-sugarcane areas. The experiment shows high accuracy and successfully maps the sugarcane plantation area, with a best result of Cohen’s Kappa of 0.7833 (strong) and 89.167% accuracy.
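The training and classification steps can be sketched as below. This is a generic illustration of binary logistic regression fitted by batch gradient descent, followed by accuracy and Cohen's kappa. The learning rate, iteration count and synthetic "NDVI" features are assumptions, not the settings or data of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iter=5000):
    """Fit binary logistic regression by batch gradient descent on the mean log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def predict(X, w):
    """Classify as 1 when the fitted probability is at least 0.5."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (sigmoid(Xb @ w) >= 0.5).astype(int)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two binary label vectors."""
    po = np.mean(y_true == y_pred)  # observed agreement
    pe = (np.mean(y_true) * np.mean(y_pred)
          + (1 - np.mean(y_true)) * (1 - np.mean(y_pred)))  # chance agreement
    return (po - pe) / (1 - pe)

# Synthetic stand-in for a six-date NDVI time series per pixel (illustration only).
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)  # 1 = sugarcane, 0 = other (hypothetical)
ndvi = rng.normal(loc=0.3 + 0.3 * labels[:, None], scale=0.1, size=(n, 6))

weights = fit_logistic_gd(ndvi, labels)
pred = predict(ndvi, weights)
accuracy = np.mean(pred == labels)
kappa = cohen_kappa(labels, pred)
```

In practice the accuracy and kappa would be computed on pixels held out from training; the sketch evaluates on the training data only to stay short.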
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
A regression-kriging model for estimation of rainfall in the Laohahe basin
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors of latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges from the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will be able to: (a) summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) follow the steps in performing a logistic regression analysis; (c) describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
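The idea of using third moments of regression residuals can be illustrated with a small simulation. This is a hedged sketch, not the authors' procedure: it assumes a skewed true predictor, a normal covariate and normal errors, and it compares raw residual skewness without any of the formal tests named in the abstract. All names and parameter values are hypothetical.

```python
import numpy as np

def residual_skewness(outcome, predictors):
    """Sample skewness of OLS residuals of `outcome` on `predictors` plus an intercept."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    centered = resid - resid.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(1)
n = 5000
x = rng.exponential(size=n)  # skewed variable that truly sits on the predictor side
z = rng.normal(size=n)       # additional covariate
y = 1.0 * x + 0.5 * z + rng.normal(size=n)

# Target model (y is the outcome) versus the competing model (x is the outcome).
skew_target = residual_skewness(y, np.column_stack([x, z]))
skew_reverse = residual_skewness(x, np.column_stack([y, z]))
```

Under these assumptions the residuals of the correctly specified model are close to symmetric, whereas the residuals of the reversed model inherit skewness from x. A difference of this kind is what the inference procedures in the abstract are designed to test.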
Modelling multimodal photometric redshift regression with noisy observations
Kügler, S D
2016-01-01
In this work, we try to extend the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach therefore makes it possible to address the multimodal regression problem adequately. In addition, input uncertainty on the magnitudes can be included quite naturally and, lastly, the proposed algorithm allows, in principle, predictions outside the training values, which makes it a fascinating opportunity for the detection of high-redshift quasars.
Robust Bayesian Regularized Estimation Based on t Regression Model
Zean Li
2015-01-01
The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters on different regression coefficients. We also consider the Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and a real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.
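The mixture representation mentioned in this abstract is the standard result that makes Gibbs sampling convenient. The sketch below illustrates only that representation, not the authors' sampler. It uses the usual parameterization, in which a gamma-distributed precision with shape ν/2 and rate ν/2 scales a normal draw; the degrees of freedom and sample size are arbitrary choices.

```python
import numpy as np

def sample_t_scale_mixture(nu, size, rng):
    """Draw Student-t variates as a gamma scale mixture of normals."""
    # lambda ~ Gamma(shape=nu/2, rate=nu/2); NumPy uses scale = 1/rate.
    lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)
    # Conditional on lambda, the draw is normal with variance 1/lambda.
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(lam), size=size)

rng = np.random.default_rng(0)
nu = 5.0  # degrees of freedom (arbitrary choice for illustration)
draws = sample_t_scale_mixture(nu, 200_000, rng)

sample_var = draws.var()
theoretical_var = nu / (nu - 2.0)  # variance of a t distribution with nu > 2
```

Because each observation is normal once its latent precision is known, the conditional posterior of the regression coefficients takes a tractable form, which is why this representation is widely used in Bayesian t regression.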
A Multi-objective Procedure for Efficient Regression Modeling
Sinha, Ankur; Kuosmanen, Timo
2012-01-01
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge of the relative importance of the variables. To provide a robust method for model selection, this paper introduces a technique called the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) which provides the user with an efficient set of regression models for a given data-set. The algorithm treats the regression problem as a two-objective task, where the purpose is to prefer models that have fewer regression coefficients and better goodness of fit. In MOGA-VS, the model selection procedure is implemented in two steps. First, we generate the frontier of all efficient or non-dominated regression m...
Analyzing industrial energy use through ordinary least squares regression models
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and
Prediction of soil temperature using regression and artificial neural network models
Bilgili, Mehmet
2010-12-01
In this study, monthly soil temperature was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. The soil temperature and other meteorological parameters, which were taken from the Adana meteorological station, were observed between 2000 and 2007 by the Turkish State Meteorological Service (TSMS). The soil temperatures were measured at depths of 5, 10, 20, 50 and 100 cm below ground level. A three-layer feed-forward ANN structure was constructed and a back-propagation algorithm was used for the training of the ANNs. In order to get a successful simulation, the correlation coefficients between all of the meteorological variables (soil temperature, atmospheric temperature, atmospheric pressure, relative humidity, wind speed, rainfall, global solar radiation and sunshine duration) were calculated pairwise. First, all independent variables were split into two time periods, namely the cold and warm seasons. They were added to the enter regression model. Then, the method of stepwise multiple regression was applied for the selection of the "best" regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and they were also used in the input layer of the ANN method. The results of these methods were compared with each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
Applications of some discrete regression models for count data
B. M. Golam Kibria
2006-01-01
In this paper we consider several regression models for fitting count data encountered in the biometrical, environmental and social sciences and in transportation engineering. We fitted Poisson (PO), Negative Binomial (NB), Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models to run-off-road (ROR) crash data collected on arterial roads in the south region (rural) of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentages of zero counts. Because the variances were almost three times greater than the means, it appeared that both the NB and ZINB models performed better than the PO and ZIP models for the zero-inflated and overdispersed count data.
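The reasoning from the variance-to-mean ratio can be sketched with a simple diagnostic. This is an illustration on simulated counts, not the crash data of the paper. A Poisson model implies a ratio near 1 and a zero share near exp(-mean), so larger values point toward NB or zero-inflated alternatives. The simulation parameters are chosen only so that the variance is about three times the mean.

```python
import numpy as np

def dispersion_summary(counts):
    """Simple overdispersion and excess-zero diagnostics for count data."""
    counts = np.asarray(counts)
    mean = counts.mean()
    var = counts.var(ddof=1)
    return {
        "mean": mean,
        "variance": var,
        "variance_to_mean": var / mean,              # about 1 under a Poisson model
        "observed_zero_share": np.mean(counts == 0),
        "poisson_zero_share": np.exp(-mean),         # zero share a Poisson model implies
    }

# Simulated overdispersed counts: negative binomial with mean 2 and variance 6.
# NumPy's parameterization gives mean = n(1-p)/p and variance = n(1-p)/p^2.
rng = np.random.default_rng(0)
counts = rng.negative_binomial(1, 1.0 / 3.0, size=20_000)
summary = dispersion_summary(counts)
```

A ratio well above 1, together with more zeros than the Poisson model implies, is the pattern the abstract reports as favoring the NB and ZINB models.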
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.
Harrell, Jr., Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Modeling energy expenditure in children and adolescents using quantile regression
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Trimmed Likelihood-based Estimation in Binary Regression Models
Cizek, P.
2005-01-01
Binary-choice regression models such as probit and logit are typically estimated by the maximum likelihood method. To improve its robustness, various M-estimation based procedures were proposed, which however require bias corrections to achieve consistency and their resistance to outliers is rela
PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA
Qian Weimin; Li Yumei
2005-01-01
Parameter estimation and the coefficient of contamination for regression models with repeated measures are studied when the response variables are contaminated by another random variable sequence. Under suitable conditions, it is proved that the estimators established in the paper are strongly consistent.
Change-point estimation for censored regression model
Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO
2007-01-01
In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.
Establishment of Statistical Model for Precipitation Prediction in the Flood Season in China
(no author listed)
2011-01-01
[Objective] The research aimed to establish the regression model which was used to predict the precipitation in the flood season in China. [Method] Based on a statistical model, the North Atlantic Oscillation index and the sea surface temperature index in the development and declining stages of ENSO were used to predict the East Asian summer monsoon index. After the stations were divided into 16 zones, the same factors were used to establish the regression model predicting the station precipitation in the flood season in Chi...
Improved Methodology for Parameter Inference in Nonlinear, Hydrologic Regression Models
Bates, Bryson C.
1992-01-01
A new method is developed for the construction of reliable marginal confidence intervals and joint confidence regions for the parameters of nonlinear, hydrologic regression models. A parameter power transformation is combined with measures of the asymptotic bias and asymptotic skewness of maximum likelihood estimators to determine the transformation constants which cause the bias or skewness to vanish. These optimized constants are used to construct confidence intervals and regions for the transformed model parameters using linear regression theory. The resulting confidence intervals and regions can be easily mapped into the original parameter space to give close approximations to likelihood method confidence intervals and regions for the model parameters. Unlike many other approaches to parameter transformation, the procedure does not use a grid search to find the optimal transformation constants. An example involving the fitting of the Michaelis-Menten model to velocity-discharge data from an Australian gauging station is used to illustrate the usefulness of the methodology.
On modified skew logistic regression model and its applications
C. Satheesh Kumar
2015-12-01
Here we consider a modified form of the logistic regression model useful for situations where the dependent variable is dichotomous in nature and the explanatory variables exhibit asymmetric and multimodal behaviour. The proposed model has been fitted to real-life data using the method of maximum likelihood estimation, and its usefulness is illustrated in certain medical applications.
Improved Testing and Specifications of Smooth Transition Regression Models
Escribano, Álvaro; Jordá, Óscar
1997-01-01
This paper extends previous work in Escribano and Jordá (1997) and introduces new LM specification procedures to choose between Logistic and Exponential Smooth Transition Regression (STR) Models. These procedures are simpler, consistent and more powerful than those previously available in the literature. An analysis of the properties of Taylor approximations around the transition function of STR models permits one to understand why these procedures work better and it suggests ways to improve te...
Support vector regression-based internal model control
HUANG Yan-wei; PENG Tie-gen
2007-01-01
This paper proposes a design of internal model control systems for processes with delay by using support vector regression (SVR). The proposed system fully uses the excellent nonlinear estimation performance of SVR with the structural risk minimization principle. Closed-loop system stability and steady-state error are analyzed in the presence of modeling errors. The simulations show that the proposed control systems achieve better control performance than those based on neural networks when the training samples are small and noisy.
CONSERVATIVE ESTIMATING FUNCTION IN THE NONLINEAR REGRESSION MODEL WITH AGGREGATED DATA
None listed
2000-01-01
The purpose of this paper is to study the theory of conservative estimating functions in the nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is the projection of the true score function onto a class of estimating functions. By construction, the potential function for the projected score with aggregated data is obtained, which has some properties of a log-likelihood function.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
On concurvity in nonlinear and nonparametric regression models
Sonia Amodio
2014-12-01
When data are affected by multicollinearity in the linear regression framework, concurvity will be present when fitting a generalized additive model (GAM). The term concurvity describes nonlinear dependencies among the predictor variables. Just as collinearity inflates the variance of the estimated regression coefficients in the linear regression model, concurvity leads to instability of the estimated coefficients in GAMs. Even though the backfitting algorithm will always converge to a solution, in the case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing a type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper provides a general criterion to detect concurvity in nonlinear and nonparametric regression models.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robustly estimating a periodic function in a continuous-time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process subordinator, which is used to model financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least squares estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Production plants of a company are located in several areas spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting productivity. Thus, the production data may have a spatial and hierarchical structure. Fitting a linear regression using ordinary techniques requires assumptions about the nature of the residuals, i.e. that they are independent, identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the construction of a model of productivity and several characteristics of the production line, taking location as a random effect. A simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R, version 2.6.1.
Illustrating Bayesian evaluation of informative hypotheses for regression models
Anouck Kluytmans
2012-01-01
In the present paper we illustrate the Bayesian evaluation of informative hypotheses for regression models. This approach allows psychologists to test their theories more directly than they would using conventional statistical analyses. Throughout this paper, both real-world data and simulated datasets will be introduced and evaluated to investigate the pragmatic as well as the theoretical qualities of the approach. We will pave the way from forming informative hypotheses in the context of regression models to interpreting the Bayes factors that express the support for the hypotheses being evaluated. In doing so, the present approach goes beyond p-values and uninformative null hypothesis testing, moving on to informative testing and quantification of model support in a way that is accessible to everyday psychologists.
Batch Mode Active Learning for Regression With Expected Model Change.
Cai, Wenbin; Zhang, Muhan; Zhang, Ya
2016-04-20
While active learning (AL) has been widely studied for classification problems, limited effort has been devoted to AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch mode AL to simultaneously choose a set of the k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.
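As a rough illustration of the selection rule described in this abstract, the sketch below scores each unlabeled candidate by the norm of the squared-loss gradient of a linear model. Because the true label is unknown before querying, it is replaced here by the predictions of a small ensemble of models. That ensemble, and the names `predict`, `expected_model_change` and `select_instance`, are assumptions made for the example, not details given in the abstract.

```python
import math


def predict(weights, x):
    """Linear model prediction f(x) = w . x (x includes a bias feature if desired)."""
    return sum(w * xi for w, xi in zip(weights, x))


def expected_model_change(weights, ensemble, x):
    """Average gradient norm ||(f(x) - y) x|| with y estimated by each ensemble member.

    For squared loss the gradient with respect to the weights is (f(x) - y) x,
    whose norm is |f(x) - y| * ||x||. The ensemble is a hypothetical stand-in
    for the unknown label y.
    """
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    fx = predict(weights, x)
    changes = [abs(fx - predict(member, x)) * norm_x for member in ensemble]
    return sum(changes) / len(changes)


def select_instance(weights, ensemble, pool):
    """Return the index of the pool instance with the largest estimated change."""
    return max(
        range(len(pool)),
        key=lambda i: expected_model_change(weights, ensemble, pool[i]),
    )
```

Candidates on which the ensemble disagrees with the current model, and which have large feature norm, receive the highest score under this approximation.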
Hierarchical Neural Regression Models for Customer Churn Prediction
Golshan Mohammadi
2013-01-01
As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain competitive. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models that combine four different data mining techniques for churn prediction: backpropagation artificial neural networks (ANN), self-organizing maps (SOM), alpha-cut fuzzy c-means (α-FCM), and the Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data into churner and nonchurner groups and also to filter out unrepresentative data or outliers. Then, the clustered data are used as the outputs to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create the Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model performs significantly better than the two other hierarchical models.
Regression Model to Predict Global Solar Irradiance in Malaysia
Hairuniza Ahmed Kutty
2015-01-01
A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitation, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination (R2) with other models available from literature studies. Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM7 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.
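The three comparison metrics named in this abstract can be sketched as follows. This is a minimal illustration using the usual textbook definitions in absolute units; the abstract reports RMSE and MBE as percentages, and the normalisation used there is not stated, so no percent conversion is attempted here. The function names are chosen for the example.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mbe(observed, predicted):
    """Mean bias error: positive when the model overestimates on average."""
    n = len(observed)
    return sum(p - o for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

RMSE penalises large individual errors, MBE reveals systematic over- or underestimation, and R2 measures the share of variance explained, which is why the three are often reported together.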
Phone Duration Modeling of Affective Speech Using Support Vector Regression
Alexandros Lazaridis
2012-07-01
In speech synthesis, accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing natural-sounding emotional speech. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR) in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness) plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE) across all emotional categories.
Data correction for seven activity trackers based on regression models.
Andalibi, Vafa; Honko, Harri; Christophe, Francois; Viik, Jari
2015-08-01
Using an activity tracker for measuring activity-related parameters, e.g. steps and energy expenditure (EE), can be very helpful in assisting a person's fitness improvement. Unlike measuring the number of steps, an accurate EE estimation requires additional personal information as well as accurate velocity of movement, which is hard to achieve due to the inaccuracy of sensors. In this paper, we have evaluated regression-based models to improve the precision of both step and EE estimation. For this purpose, data from seven activity trackers and two reference devices were collected from 20 young adult volunteers wearing all devices at once in three different tests, namely 60-minute office work, 6-hour overall activity and 60-minute walking. Reference data are used to create regression models for each device, and the relative percentage errors of the adjusted values are then statistically compared to those of the original values. The effectiveness of the regression models is determined based on the result of a statistical test. During a walking period, EE measurement was improved in all devices. The step measurement was also improved in five of them. The results show that improvement of EE estimation is possible with only a low-cost implementation of a model fitted over the collected data, e.g. in the app or in the corresponding service back-end.
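The correction idea in this abstract can be sketched with a simple per-device linear fit of reference values on tracker readings, followed by a comparison of relative percentage errors before and after adjustment. The abstract does not specify the regression form, so the straight-line model and the function names below are assumptions made for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_pct_error(reference, measured):
    """Mean absolute relative error in percent (reference values must be nonzero)."""
    errors = [abs(m - r) / r for r, m in zip(reference, measured)]
    return 100.0 * sum(errors) / len(errors)


def correct_device(device, reference):
    """Map raw tracker readings onto the reference scale with a fitted line."""
    intercept, slope = fit_line(device, reference)
    return [intercept + slope * d for d in device]
```

In practice the line would be fitted on one set of recordings and the errors evaluated on another; fitting and evaluating on the same data, as a quick check might do, overstates the improvement.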
Forecasting relativistic electron flux using dynamic multiple regression models
H.-L. Wei
2011-02-01
The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.
Resampling procedures to validate dendro-auxometric regression models
2009-03-01
Regression analysis is widely used in several sectors of forest research. The validation of a dendro-auxometric model is a basic step in the building of the model itself. The more a model resists attempts to demonstrate its groundlessness, the more its reliability increases. In recent decades many new theories that exploit the calculation speed of computers have been formulated. Here we show the results obtained by applying a bootstrap resampling procedure as a validation tool.
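One common way to use bootstrap resampling as a validation tool is sketched below: the regression is refitted on resampled data many times, the spread of the fitted slope is recorded, and prediction error is measured on the observations left out of each resample. The abstract does not describe its exact procedure, so the simple linear model, the out-of-bag error and all names here are assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_validate(xs, ys, n_boot=200, seed=0):
    """Return bootstrap slopes and out-of-bag RMSE values, one per usable resample."""
    rng = random.Random(seed)
    n = len(xs)
    slopes, oob_rmse = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        # Skip resamples that cannot be fitted or leave nothing to test on.
        if not oob or len({xs[i] for i in idx}) < 2:
            continue
        intercept, slope = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(slope)
        mse = sum((ys[i] - (intercept + slope * xs[i])) ** 2 for i in oob) / len(oob)
        oob_rmse.append(math.sqrt(mse))
    return slopes, oob_rmse
```

A narrow distribution of slopes and an out-of-bag error close to the in-sample error suggest that the fitted relationship is stable; wide spread or a much larger out-of-bag error points to an unreliable model.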
Two-step variable selection in quantile regression models
FAN Yali
2015-06-01
We propose a two-step variable selection procedure for high dimensional quantile regressions, in which the dimension of the covariates, p_n, is much larger than the sample size n. In the first step, we apply an ℓ1 penalty, and we demonstrate that the first-step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional one to a model whose size has the same order as that of the true model, and that the selected model can cover the true model. The second step excludes the remaining irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
Fuzzy and Regression Modelling of Hard Milling Process
A. Tamilarasan
2014-04-01
The present study highlights the application of a Box-Behnken design coupled with a fuzzy and regression modeling approach for building an expert system for the hard milling process, to improve process performance with a systematic reduction of production cost. The important input factors of workpiece hardness, nose radius, feed per tooth, radial depth of cut and axial depth of cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key indices of machining output. The results indicate that the fuzzy logic and regression modeling techniques can be effectively used for the prediction of the desired responses with little average error variation. Predicted results were verified by experiments and showed the good potential of the developed system for an automated machining environment.
Regression Cloud Models and Their Applications in Energy Consumption of Data Center
Yanshuang Zhou
2015-01-01
As cloud data centers consume more and more energy, both researchers and engineers aim to minimize energy consumption while keeping services available. A good energy model can reflect the relationships between running tasks and the energy consumed by hardware and can be further used to schedule tasks for saving energy. In this paper, we analyzed linear and nonlinear regression energy models based on performance counters and system utilization and proposed a support vector regression energy model. For performance counters, we gave a general linear regression framework and compared three linear regression models. For system utilization, we compared our support vector regression model with linear regression and three nonlinear regression models. The experiments show that a linear regression model is good enough to model performance counters, that nonlinear regression is better than linear regression for modeling system utilization, and that the support vector regression model is better than polynomial and exponential regression models.
Central limit theorem of linear regression model under right censorship
HE Shuyuan (何书元); HUANG Xiang (Heung Wong) (黄香)
2003-01-01
In this paper, the estimation of the joint distribution F(y,z) of (Y, Z) and the estimation in the linear regression model Y = b′Z + ε for complete data are extended to right censored data. The regression parameter estimates of b and the variance of ε are weighted least squares estimates with random weights. The central limit theorems of the estimators are obtained under very weak conditions and the derived asymptotic variance has a very simple form.
APPLYING LOGISTIC REGRESSION MODEL TO THE EXAMINATION RESULTS DATA
Goutam Saha
2011-01-01
The binary logistic regression model is used to analyze the school examination results (scores) of 1002 students. The analysis is performed on the basis of the independent variables, viz. gender, medium of instruction, type of schools, category of schools, board of examinations and location of schools, where scores or marks are assumed to be dependent variables. The odds ratio analysis compares the scores obtained in two examinations, viz. matriculation and higher secondary.
Predicting and Modelling of Survival Data when Cox's Regression Model does not hold
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects
GAUSSIAN COPULA MARGINAL REGRESSION FOR MODELING EXTREME DATA WITH APPLICATION
Sutikno
2014-01-01
Regression is commonly used to determine the relationship between the response variable and the predictor variable, where the parameters are estimated by Ordinary Least Squares (OLS). This method assumes that the residuals are normally distributed, N(0, σ^2). However, the assumption of normality is often violated due to extreme observations, which are often found in climate data. Modeling rice harvested area with rainfall predictor variables involves extreme observations. Therefore, another approach is needed to handle the presence of extreme observations. The method used to solve this problem is Gaussian Copula Marginal Regression (GCMR), a copula-based regression. As a case study, the method is applied to model the rice harvested area of rice production centers in East Java, Indonesia, covering the districts of Banyuwangi, Lamongan, Bojonegoro, Ngawi and Jember. The copula is chosen because this method does not impose strict distributional assumptions, especially normality. Moreover, this method can clearly describe dependency at extreme points. The GCMR performance is compared with OLS and Generalized Linear Models (GLM). Identification of the dependence structure between the rice harvest per period (RH) and monthly rainfall showed dependency in all study areas. The copula type identified in the tests mostly follows the Gumbel distribution. The comparison of model goodness of fit for rice harvested area showed that GCMR was the appropriate method for RH1 in the five districts and for RH2 in Jember district, since it had the lowest AICc. Looking at the distribution pattern of the response variables, it can be concluded that GCMR is good for modeling response variables that are not normally distributed and tend to have a large skew.
Aulenbach, Brent T.
2013-10-01
A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals caused the observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration-discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models than increasing calibration period length.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of data and complex relationships. Rich varieties of advanced and recent statistical modelling are mostly available in open source software (one of them is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and shiny), so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We have previously made an interface in the form of an e-tutorial for several modern and advanced statistical modelling approaches in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes models easier to compare in order to find the most appropriate model for the data.
Mukesh Gautam
BACKGROUND: Reptiles are a phylogenetically important group of organisms, as mammals have evolved from them. The wall lizard testis exhibits clearly distinct morphology during the various phases of a reproductive cycle, making it an interesting model to study the regulation of spermatogenesis. Studies on reptile spermatogenesis are negligible, hence this study will prove to be an important resource. METHODOLOGY/PRINCIPAL FINDINGS: Histological analyses show complete regression of seminiferous tubules during the regressed phase, with retracted Sertoli cells and spermatogonia. In the recrudescent phase, the regressed testis regains cellular activity, showing the presence of normal Sertoli cells and developing germ cells. In the active phase, the testis reaches its maximum size, with enlarged seminiferous tubules and the presence of sperm in the seminiferous lumen. Total RNA extracted from whole testis of the regressed, recrudescent and active phases of the wall lizard was hybridized on a Mouse Whole Genome 8×60 K format gene chip. Microarray data from the regressed phase were deemed the control group. Microarray data were validated by assessing the expression of some selected genes using Quantitative Real-Time PCR. The genes prominently expressed in recrudescent and active phase testis belong to cytoskeleton organization GO: 0005856, cell growth GO: 0045927, GTPase regulator activity GO: 0030695, transcription GO: 0006352, apoptosis GO: 0006915 and many other biological processes. The genes showing higher expression in the regressed phase belonged to functional categories such as negative regulation of macromolecule metabolic process GO: 0010605, negative regulation of gene expression GO: 0010629 and maintenance of stem cell niche GO: 0045165. CONCLUSION/SIGNIFICANCE: This is the first exploratory study profiling the transcriptome of three drastically different conditions of any reptilian testis. The genes expressed in the testis during regressed, recrudescent and active phase of reproductive cycle are in concordance
Klein, John P.; Andersen, Per Kragh
2005-01-01
Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models
K factor estimation in distribution transformers using linear regression models
Juan Miguel Astorga Gómez
2016-06-01
Background: Due to the massive incorporation of electronic equipment into distribution systems, distribution transformers are subject to operating conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor from the total current harmonic content obtained with low-cost equipment. Method: Two distribution transformers that feed different loads are studied; the variables current total harmonic distortion and K-factor are recorded, and the regression model that best fits the field data is determined. To select the regression model, the coefficient of determination R2 and the Akaike Information Criterion (AIC) are used. With the selected model, the K-factor is estimated for actual operating conditions. Results: Once the model was determined, it was found that for both the agricultural and the industrial mining loads, the present harmonic content (THDi) exceeds the values that these transformers can handle (average of 12.54% and minimum of 8.90% in the agricultural case, and average of 18.53% and minimum of 6.80% in the industrial mining case). Conclusions: When estimating the K-factor using polynomial models, it was determined that the studied transformers cannot withstand the current total harmonic distortion of their current loads. The appropriate K-factor for the studied transformers should be 4; this allows the transformers to support the current total harmonic distortion of their respective loads.
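The model selection step described here (choosing among candidate regression models by R2 and AIC) can be sketched as follows. This is an illustrative outline only: the candidate polynomial degrees, the least squares form of AIC, n ln(RSS/n) + 2k, and all function names are assumptions for the example rather than details reported in the abstract.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least squares polynomial coefficients (lowest order first) via normal equations."""
    p = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    return solve(ata, aty)


def score(xs, ys, coeffs):
    """Return (R2, AIC) for a fitted polynomial; AIC uses the least squares form."""
    n = len(xs)
    pred = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    rss = sum((y - p) ** 2 for y, p in zip(ys, pred))
    mean_y = sum(ys) / n
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - rss / tss, n * math.log(rss / n) + 2 * len(coeffs)


def select_model(xs, ys, degrees=(1, 2, 3)):
    """Fit each candidate degree and return the degree with the lowest AIC."""
    results = {d: score(xs, ys, fit_poly(xs, ys, d)) for d in degrees}
    best = min(results, key=lambda d: results[d][1])
    return best, results
```

R2 never decreases when a higher-order term is added, so on its own it always favours the most complex candidate; AIC adds a penalty of 2 per parameter, which is why the two criteria are typically read together.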
Extended Cox regression model: The choice of time function
Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu
2017-07-01
The Cox regression model (CRM), which takes into account the effect of censored observations, is one of the most widely applied models in survival analysis for evaluating the effects of covariates. The CRM assumes proportional hazards (PH), which requires a constant hazard ratio over time. The extended CRM allows a time-dependent covariate to be included, either to test the PH assumption or as an alternative model when hazards are nonproportional. In this study, different types of real data sets are used to choose the time function, and the differences between time functions are analyzed and discussed.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Dana D. Marković
2014-01-01
Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on the median percentage error and the mean absolute relative error in the parameter estimates. The study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. When experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Risa Kato
2009-01-01
Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
Modeling the number of car theft using Poisson regression
Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura
2016-10-01
Regression analysis is one of the most popular statistical methods for expressing the relationship between a response variable and covariates. The aim of this paper is to evaluate the factors that influence the number of car thefts using a Poisson regression model. The paper focuses on the number of car thefts that occurred in districts in Peninsular Malaysia. Two groups of factors were considered, namely district descriptive factors and socio-demographic factors. The results showed that Bumiputera composition, Chinese composition, other ethnic composition, foreign migration, the number of residents aged 25 to 64, the number of employed persons and the number of unemployed persons are the most influential factors affecting car theft cases. This information is very useful for law enforcement departments, insurance companies and car owners in reducing and limiting car theft cases in Peninsular Malaysia.
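To make the modelling step concrete, the following is a minimal sketch of a Poisson regression with a log link, fitted by Newton-Raphson for an intercept and a single covariate. The data are made up for illustration and have no connection to the Malaysian district data; the function name `fit_poisson` and the single-covariate restriction are assumptions of this sketch, not the authors' procedure.

```python
import math

def fit_poisson(x, y, max_iter=50, tol=1e-12):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson (maximum likelihood)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: X^T diag(mu) X (2 x 2, symmetric)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical covariate values and theft counts for five districts.
x_demo = [0.0, 1.0, 2.0, 3.0, 4.0]
y_demo = [1, 3, 4, 8, 13]
b0_hat, b1_hat = fit_poisson(x_demo, y_demo)
fitted = [math.exp(b0_hat + b1_hat * xi) for xi in x_demo]
```

In this parameterisation exp(b1) is the multiplicative change in the expected count per unit increase of the covariate, which is how factor effects in such a model are usually read.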
Interpreting parameters in the logistic regression model with random effects
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Seasonal Predictability in a Model Atmosphere.
Lin, Hai
2001-07-01
The predictability of atmospheric mean-seasonal conditions in the absence of externally varying forcing is examined. A perfect-model approach is adopted, in which a global T21 three-level quasigeostrophic atmospheric model is integrated over 21 000 days to obtain a reference atmospheric orbit. The model is driven by a time-independent forcing, so that the only source of time variability is the internal dynamics. The forcing is set to perpetual winter conditions in the Northern Hemisphere (NH) and perpetual summer in the Southern Hemisphere. A significant temporal variability in the NH 90-day mean states is observed. The component of that variability associated with the higher-frequency motions, or climate noise, is estimated using a method developed by Madden. In the polar region, and to a lesser extent in the midlatitudes, the temporal variance of the winter means is significantly greater than the climate noise, suggesting some potential predictability in those regions. Forecast experiments are performed to see whether the presence of variance in the 90-day mean states that is in excess of the climate noise leads to some skill in the prediction of these states. Ensemble forecast experiments with nine members starting from slightly different initial conditions are performed for 200 different 90-day means along the reference atmospheric orbit. The serial correlation between the ensemble means and the reference orbit shows that there is skill in the 90-day mean predictions. The skill is concentrated in those regions of the NH that have the largest variance in excess of the climate noise. An EOF analysis shows that nearly all the predictive skill in the seasonal means is associated with one mode of variability with a strong axisymmetric component.
Modeling of the Monthly Rainfall-Runoff Process Through Regressions
Campos-Aranda Daniel Francisco
2014-10-01
To solve the problems associated with the assessment of the water resources of a river, modeling of the rainfall-runoff process (RRP) allows missing runoff data to be deduced and the runoff record to be extended, since the available precipitation record is generally longer. It also enables the estimation of inflows to reservoirs when their construction led to the removal of the gauging station. The simplest mathematical model that can be set up for the RRP is a linear or curvilinear regression on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff at the Ballesmi hydrometric station, which covers 35 years. Since the runoff at this station has an important contribution from spring discharge, the record is first corrected by removing that contribution. To do this, a procedure was developed based either on the monthly average regional runoff coefficients or on a nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to Partial Hydrologic Region No. 26 (Lower Rio Panuco) and are located within the state of San Luis Potosi, México. The study indicates that the monthly regression model, owing to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation of the dispersion, as shown by the calculated means and standard deviations.
Mixed-model Regression for Variable-star Photometry
Dose, Eric
2016-05-01
Mixed-model regression, a recent advance from social-science statistics, applies directly to reducing one night's photometric raw data, especially for variable stars in fields with multiple comparison stars. One regression model per filter/passband yields any or all of: transform values, extinction values, nightly zero-points, rapid zero-point fluctuations ("cirrus effect"), ensemble comparisons, vignette and gradient removal arising from incomplete flat-correction, check-star and target-star magnitudes, and specific indications of unusually large catalog magnitude errors. When images from several different fields of view are included, the models improve without complicating the calculations. The mixed-model approach is generally robust to outliers and missing data points, and it directly yields 14 diagnostic plots, used to monitor data set quality and/or residual systematic errors - these diagnostic plots may in fact turn out to be the prime advantage of this approach. Also presented is initial work on a split-annulus approach to sky background estimation, intended to address the sensitivity of photometric observations to noise within the sky-background annulus.
Genetic evaluation of European quails by random regression models
Flaviana Miranda Gonçalves
2012-09-01
The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders, for the estimation of (co)variances in quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and at 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6) to determine which model best described the (co)variance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT), the model with six classes of residual variance and a sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve, from 0.51 (1 day) to 0.16 (42 days). Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth-order Legendre polynomial, along with the residual variance divided into six classes, was the best fit for the growth curve of meat quails; therefore, it should be considered for breeding evaluation by random regression models.
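As an illustration of the covariates used in a random regression model of this kind, the sketch below standardizes the weighing ages to the interval [-1, 1] and evaluates Legendre polynomials on them. It uses unnormalized Legendre polynomials and takes "degree" to mean the highest polynomial power; both choices, along with the function names, are assumptions of this sketch, since the abstract does not say which convention the authors used.

```python
def standardize_age(age, age_min, age_max):
    """Map an age in [age_min, age_max] linearly onto [-1, 1]."""
    return 2.0 * (age - age_min) / (age_max - age_min) - 1.0

def legendre_basis(x, degree):
    """Return [P_0(x), ..., P_degree(x)] via Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values

# Weighing ages from the abstract (birth taken as day 0).
ages = [0, 1, 14, 21, 28, 35, 42]
design = [legendre_basis(standardize_age(a, 0, 42), 2) for a in ages]
```

Each row of `design` holds the covariates for one weighing age; in a random regression model the same basis would typically be used for the fixed curve and for the animal-specific random coefficients.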
Dadhich, Rajesh K; Barrionuevo, Francisco J; Real, Francisca M; Lupiañez, Darío G; Ortega, Esperanza; Burgos, Miguel; Jiménez, Rafael
2013-04-01
In males of seasonally breeding species, testes undergo a severe involution at the end of the breeding season, with a major volume decrease due to massive germ-cell depletion associated with photoperiod-dependent reduced levels of testosterone and gonadotropins. Although it has been repeatedly suggested that apoptosis is the principal effector of testicular regression in vertebrates, recent studies do not support this hypothesis in some mammals. The purpose of our work is to discover alternative mechanisms of testis regression in these species. In this paper, we have performed a morphological, hormonal, ultrastructural, molecular, and functional study of the mechanism of testicular regression and the role that cell junctions play in the cell-content dynamics of the testis of the Iberian mole, Talpa occidentalis, throughout the seasonal breeding cycle. Desquamation of live, nonapoptotic germ cells has been identified here as a new mechanism for seasonal testis involution in mammals, indicating that testis regression is regulated by modulating the expression and distribution of the cell-adhesion molecules in the seminiferous epithelium. During this process, which is mediated by low intratesticular testosterone levels, Sertoli cells lose their nursing and supporting function, as well as the impermeability of the blood-testis barrier. Our results contradict the current paradigm that apoptosis is the major testis regression effector in vertebrates, as it is clearly not true in all mammals. The new testis regression mechanism described here for the mole could then be generalized to other mammalian species. Available data from some previously studied mammals should be reevaluated.
Zhao, Tongtiegang; Schepen, Andrew; Wang, Q. J.
2016-10-01
The Bayesian joint probability (BJP) modelling approach is used operationally to produce seasonal (three-month-total) ensemble streamflow forecasts in Australia. However, water resource managers are calling for more informative sub-seasonal forecasts. Taking advantage of BJP's capability of handling multiple predictands, ensemble forecasting of sub-seasonal to seasonal streamflows is investigated for 23 catchments around Australia. Using antecedent streamflow and climate indices as predictors, monthly forecasts are developed for the three-month period ahead. Forecast reliability and skill are evaluated for the period 1982-2011 using a rigorous leave-five-years-out cross validation strategy. BJP ensemble forecasts of monthly streamflow volumes are generally reliable in ensemble spread. Forecast skill, relative to climatology, is positive in 74% of cases in the first month, decreasing to 57% and 46% respectively for streamflow forecasts for the final two months of the season. As forecast skill diminishes with increasing lead time, the monthly forecasts approach climatology. Seasonal forecasts accumulated from monthly forecasts are found to be similarly skilful to forecasts from BJP models based on seasonal totals directly. The BJP modelling approach is demonstrated to be a viable option for producing ensemble time-series sub-seasonal to seasonal streamflow forecasts.
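The leave-five-years-out cross-validation mentioned above can be sketched as follows. The abstract does not define the scheme precisely; this sketch assumes that, for each target year, the target year and the four years that follow it are withheld from model fitting. The function name and the fold layout are hypothetical.

```python
def leave_k_years_out(years, k=5):
    """Yield (target_year, training_years) pairs.

    Assumption: the target year and the k - 1 following years are withheld,
    so that the fitted model cannot draw on data adjacent to the target.
    """
    folds = []
    for target in years:
        held_out = set(range(target, target + k))
        training = [y for y in years if y not in held_out]
        folds.append((target, training))
    return folds

# Evaluation period stated in the abstract: 1982-2011.
years = list(range(1982, 2012))
folds = leave_k_years_out(years)
```

Withholding a block of years rather than a single year is a common way to limit the effect of streamflow persistence on apparent forecast skill.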
A Lotka-Volterra competition model with seasonal succession.
Hsu, Sze-Bi; Zhao, Xiao-Qiang
2012-01-01
A complete classification for the global dynamics of a Lotka-Volterra two species competition model with seasonal succession is obtained via the stability analysis of equilibria and the theory of monotone dynamical systems. The effects of two death rates in the bad season and the proportion of the good season on the competition outcomes are also discussed. © Springer-Verlag 2011
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
A hybrid neural network model for noisy data regression.
Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M
2004-04-01
A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
Mukesh Gautam; Amitabh Mathur; Meraj Alam Khan; Majumdar, Subeer S.; Umesh Rai
2013-01-01
BACKGROUND: Reptiles are a phylogenetically important group of organisms, as mammals evolved from them. The wall lizard testis exhibits clearly distinct morphology during the various phases of the reproductive cycle, making it an interesting model for studying the regulation of spermatogenesis. Studies on reptile spermatogenesis are scarce; hence this study should prove to be an important resource. METHODOLOGY/PRINCIPAL FINDINGS: Histological analyses show complete regression of seminiferous tubules during r...
Multivariate parametric random effect regression models for fecundability studies.
Ecochard, R; Clayton, D G
2000-12-01
Delay until conception is generally described by a mixture of geometric distributions. Weinberg and Gladen (1986, Biometrics 42, 547-560) proposed a regression generalization of the beta-geometric mixture model where covariates effects were expressed in terms of contrasts of marginal hazards. Scheike and Jensen (1997, Biometrics 53, 318-329) developed a frailty model for discrete event times data based on discrete-time analogues of Hougaard's results (1984, Biometrika 71, 75-83). This paper is on a generalization to a three-parameter family distribution and an extension to multivariate cases. The model allows the introduction of explanatory variables, including time-dependent variables at the subject-specific level, together with a choice from a flexible family of random effect distributions. This makes it possible, in the context of medically assisted conception, to include data sources with multiple pregnancies (or attempts at pregnancy) per couple.
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2017-07-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a close agreement with the calculated values for FAO-PM (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Regression Models for Predicting Force Coefficients of Aerofoils
Mohammed ABDUL AKBAR
2015-09-01
Renewable sources of energy are attractive and advantageous in many ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, vertical axis wind turbines (VAWTs) have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of the lift and drag coefficients, which are functions of the shape of the aerofoil, the angle of attack of the wind and the Reynolds number of the flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for Reynolds numbers in the range experienced by VAWTs. The volume of experimental data thus obtained is huge. This paper discusses three regression analysis models with which lift and drag coefficients can be found using simple formulas, without having to deal with the bulk of the data. Drag and lift coefficients were successfully estimated by the regression models, with R2 values as high as 0.98.
Empirical likelihood ratio tests for multivariate regression models
WU Jianhong; ZHU Lixing
2007-01-01
This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method is well known to be a powerful tool for constructing tests and confidence regions. For model checking, however, the naive empirical likelihood (EL) based tests do not exhibit Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by combining the advantages of the EL and score test methods, the EL-based score tests share many desirable features: they are self-scale invariant and can detect alternatives that converge to the null at rate n^(-1/2), the fastest possible rate for lack-of-fit testing; they involve weight functions, which provide the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application to a real dataset is analyzed.
Approximation by randomly weighting method in censored regression model
WANG ZhanFeng; WU YaoHua; ZHAO LinCheng
2009-01-01
Censored regression ("Tobit") models have been in common use, and their linear hypothesis tests have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to the null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, the conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by the randomly weighting method without estimating the nuisance parameters. At the same time, we also establish the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in the censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of the central chi-square distribution under the null hypothesis.
Information for seasonal models of carbon fluxes in agroecosystems
King, A.W.; DeAngelis, D.L.
1987-04-01
This report is a compilation of information useful for constructing regionally differentiated models of seasonal carbon fluxes in the terrestrial biosphere. Two classes of information are presented. First, extant agroecosystem models that simulate the flux of carbon in a stand or whole field are reviewed. Second, empirical data on seasonal carbon fluxes are compiled. These reviews and compilations are extensive, but not exhaustive. No attempt is made to evaluate the usefulness of seasonal models and data.
Remodeling and Estimation for Sparse Partially Linear Regression Models
Yunhui Zeng
2013-01-01
When the dimension of covariates in the regression model is high, one usually uses a submodel as a working model that contains significant variables. But it may be highly biased and the resulting estimator of the parameter of interest may be very poor when the coefficients of removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to get the consistent estimator for the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and get a new estimator of the parameter of interest simultaneously. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained by the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.
Information Criteria for Deciding between Normal Regression Models
Maier, Robert S
2013-01-01
Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through being used by astrophysicists to compare cosmological models; e.g., predictions of the distance-redshift relation. The AICc is shown to have been misapplied, being applicable only if error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of 'Akaike weights' for deciding between such models. Additionally, the...
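For reference, the two criteria discussed above can be written as a short sketch using the standard definitions AIC = 2k − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1), where k is the number of fitted parameters, n the number of observations and ln L the maximized log-likelihood. The function names and the numbers in the example are illustrative only.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)

# Illustrative comparison of two hypothetical models fitted to n = 10 points.
n_obs = 10
model_a = aic(-10.0, 2)   # simpler model
model_b = aic(-9.5, 4)    # better fit, but two extra parameters
preferred = "A" if model_a < model_b else "B"
```

The model with the lower criterion value is preferred; here the small gain in log-likelihood does not offset the penalty for the two extra parameters.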
Genomic breeding value estimation using nonparametric additive regression models
Solberg Trygve
2009-01-01
Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-10-01
The purpose of this study was to draw up a regression model of the organizational climate of the central libraries of Iran's universities. This is an applied study. The statistical population consisted of 96 employees of the central libraries of Iran's public universities, selected by stratified sampling from the 117 universities affiliated to the Ministry of Health (510 people). The localized ClimateQUAL questionnaire was used as the research tool. Multivariate linear regression and a path diagram were used to predict the organizational climate pattern of the libraries. Of the 9 variables affecting organizational climate, 5 (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran's libraries. The results also indicate that each of these variables, with a different coefficient, can predict organizational climate, but the climate score of psychological safety (0.94) plays a crucial role. The path diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect organizational climate, with teamwork contributing more than any other variable. Among the ClimateQUAL indicators of organizational climate, teamwork contributes the most, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran’s universities. Methods: This study is applied research. The statistical population consisted of 96 employees of the central libraries of Iran’s public universities, selected from among the 117 universities affiliated to the Ministry of Health by the stratified sampling method (510 people). The localized ClimateQUAL questionnaire was used as the research tool. Multivariate linear regression and a track diagram were used to predict the organizational climate pattern of the libraries. Results: Of the 9 variables affecting organizational climate, 5 (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran’s libraries. The results also indicate that each of these variables, with a different coefficient, has the power to predict organizational climate, but the climate score of psychological safety (0.94) plays a crucial role in predicting the organizational climate. The track diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, and that teamwork contributes more to this influence than any other variable. Conclusions: Among the ClimateQUAL indicators of organizational climate, teamwork contributes the most, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library. PMID:26622203
A Gompertz regression model for fern spores germination
Gabriel y Galán, Jose María
2015-06-01
Germination is one of the most important biological processes for seed plants, spore plants and fungi. To date, mathematical models of germination have been developed for fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for the species Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei the Gompertz growth model describes cumulative germination satisfactorily. An important result is that the regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns. The paper also discusses the physiological and ecological meaning of the model.
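As an illustration of the kind of fit described above, the following is a minimal sketch of a Gompertz curve for cumulative germination, G(t) = a * exp(-b * exp(-c * t)). The abstract does not give the parameterization or the fitting method, so the function names, the fixed asymptote a, the log-log linearization and the synthetic data are all assumptions made for this example.

```python
import math

def gompertz(t, a, b, c):
    """Cumulative germination (%) at time t: a * exp(-b * exp(-c * t))."""
    return a * math.exp(-b * math.exp(-c * t))

def fit_gompertz(times, germination, a):
    """Estimate b and c for a fixed asymptote a (assumed known here).

    Uses the linearization ln(-ln(G / a)) = ln(b) - c * t and ordinary
    least squares on the transformed values. Requires 0 < G < a.
    """
    ys = [math.log(-math.log(g / a)) for g in germination]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    sxx = sum((t - mean_t) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope

# Synthetic (not measured) germination percentages generated from known parameters.
days = [2, 4, 6, 8, 10, 12]
observed = [gompertz(d, 90.0, 8.0, 0.4) for d in days]
b_hat, c_hat = fit_gompertz(days, observed, a=90.0)
```

With real counts the asymptote would normally be estimated too, for example by repeating the fit over a grid of candidate values of a and keeping the one with the smallest residual error.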
Zhou, Lim Yi; Shan, Fam Pei; Shimizu, Kunio; Imoto, Tomoaki; Lateh, Habibah; Peng, Koay Swee
2017-08-01
A comparative study of logistic regression, support vector machine (SVM) and least square support vector machine (LSSVM) models has been done to predict the slope failure (landslide) along East-West Highway (Gerik-Jeli). The effects of two monsoon seasons (southwest and northeast) that occur in Malaysia are considered in this study. Two related factors of occurrence of slope failure are included in this study: rainfall and underground water. For each method, two predictive models are constructed, namely SOUTHWEST and NORTHEAST models. Based on the results obtained from logistic regression models, two factors (rainfall and underground water level) contribute to the occurrence of slope failure. The accuracies of the three statistical models for two monsoon seasons are verified by using Relative Operating Characteristics curves. The validation results showed that all models produced prediction of high accuracy. For the results of SVM and LSSVM, the models using RBF kernel showed better prediction compared to the models using linear kernel. The comparative results showed that, for SOUTHWEST models, three statistical models have relatively similar performance. For NORTHEAST models, logistic regression has the best predictive efficiency whereas the SVM model has the second best predictive efficiency.
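The ROC-based comparison mentioned above can be summarized by the area under the curve (AUC). The sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen failure case receives a higher score than a randomly chosen non-failure case). The function name and the toy labels and scores are hypothetical; the abstract does not state how its curves were summarized.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for slope failure, 0 for no failure.
    scores: model outputs, higher meaning more likely failure.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example with made-up scores from a hypothetical classifier.
example_auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.7, 0.4, 0.2, 0.3, 0.1])
```

The same function can be applied to the outputs of each model (logistic regression, SVM, LSSVM) on a held-out set, which gives one comparable number per model and season.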
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval arithm
Examining secular trend and seasonality in count data using dynamic generalized linear modelling
Lundbye-Christensen, Søren; Dethlefsen, Claus; Gorst-Rasmussen, Anders;
series regression model for Poisson counts. It differs in allowing the regression coefficients to vary gradually over time in a random fashion. Data: In the period January 1980 to 1999, 17,989 incidents of acute myocardial infarction were recorded in the county of Northern Jutland, Denmark. Records were... updated daily. Results: The model with a seasonal pattern and an approximately linear trend was fitted to the data, and diagnostic plots indicate a good model fit. The analysis with the dynamic model revealed peaks coinciding with influenza epidemics. On average the peak-to-trough ratio is estimated...
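To make the peak-to-trough ratio concrete, here is a small sketch under a strong simplifying assumption: a static log-linear Poisson model with a linear trend and a single annual sinusoid. The abstract only says the model has a seasonal pattern and time-varying coefficients, so the single-harmonic form, the function names and the coefficient values are assumptions for illustration.

```python
import math

def expected_count(day, intercept, trend, beta_cos, beta_sin, period=365.25):
    """Expected daily count: exp(intercept + trend*day + seasonal term)."""
    angle = 2.0 * math.pi * day / period
    seasonal = beta_cos * math.cos(angle) + beta_sin * math.sin(angle)
    return math.exp(intercept + trend * day + seasonal)

def peak_to_trough(beta_cos, beta_sin):
    """Ratio of seasonal maximum to minimum for a single sinusoid.

    The seasonal term has amplitude A = sqrt(beta_cos^2 + beta_sin^2),
    so on the count scale the ratio is exp(A) / exp(-A) = exp(2A).
    """
    amplitude = math.hypot(beta_cos, beta_sin)
    return math.exp(2.0 * amplitude)

# Hypothetical coefficients, not estimates from the study.
ratio = peak_to_trough(0.1, 0.0)
```

In a dynamic model the seasonal coefficients change over time, so the ratio would be evaluated per time point and then averaged, which is presumably what "on average" refers to.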
Land-use regression panel models of NO2 concentrations in Seoul, Korea
Kim, Youngkook; Guldmann, Jean-Michel
2015-04-01
Transportation and land-use activities are major air pollution contributors. Since their shares of emissions vary across space and time, so do air pollution concentrations. Despite these variations, panel data have rarely been used in land-use regression (LUR) modeling of air pollution. In addition, the complex interactions between traffic flows, land uses, and meteorological variables, have not been satisfactorily investigated in LUR models. The purpose of this research is to develop and estimate nitrogen dioxide (NO2) panel models based on the LUR framework with data for Seoul, Korea, accounting for the impacts of these variables, and their interactions with spatial and temporal dummy variables. The panel data vary over several scales: daily (24 h), seasonally (4), and spatially (34 intra-urban measurement locations). To enhance model explanatory power, wind direction and distance decay effects are accounted for. The results show that vehicle-kilometers-traveled (VKT) and solar radiation have statistically strong positive and negative impacts on NO2 concentrations across the four seasonal models. In addition, there are significant interactions with the dummy variables, pointing to VKT and solar radiation effects on NO2 concentrations that vary with time and intra-urban location. The results also show that residential, commercial, and industrial land uses, and wind speed, temperature, and humidity, all impact NO2 concentrations. The R2 vary between 0.95 and 0.98.
H. Tonhati
2010-02-01
The objectives of this study were to estimate (co)variance functions for additive genetic and permanent environmental effects, as well as the genetic parameters for milk yield over multiple parities, using random regression models (RRM). Records of 4,757 complete lactations of Murrah breed buffaloes from 12 herds were analyzed. Ages at calving were between 2 and 11 years. The model included the additive genetic and permanent environmental random effects and the fixed effects of contemporary groups (herd, year and calving season) and milking frequency (1 or 2). A cubic regression on Legendre orthogonal polynomials of ages was used to model the mean trend. The additive genetic and permanent environmental effects were modeled by Legendre orthogonal polynomials. Residual variances were considered homogeneous or heterogeneous, modeled through variance functions or step functions with 5, 7 or 10 classes. Results from Akaike’s information criterion and Schwarz’s Bayesian information criterion indicated that an RRM with a third-order polynomial for the additive genetic and permanent environmental effects and a step function with 5 classes for residual variances fitted best. Heritability estimates obtained by this model varied from 0.10 to 0.28. Genetic correlations were high between consecutive ages, but decreased when intervals between ages increased.
Modelling Seasonal Carbon Dynamics on Fen Peatlands
Giebels, Michael; Beyer, Madlen; Augustin, Jürgen; Roppel, Mario; Juszczak, Radoszlav; Serba, Tomasz
2010-05-01
In Germany more than 99 % of fens have lost their carbon and nutrient sink function due to heavy drainage and agricultural land use, especially during the last decades, which has resulted in compression and heavy peat loss (CHARMAN 2002; JOOSTEN & CLARKE 2002; SUCCOW & JOOSTEN 2001; AUGUSTIN et al. 1996; KUNTZE 1993). Therefore fen peatlands play an important part (4-5 %) in the national anthropogenic trace gas budget. But only a small part of the drained and agriculturally used fens in NE Germany can be restored. Knowledge of the influence of land use on trace gas exchange is important for mitigating the climate impact of anthropogenic peatland use. We study carbon exchange between soil and atmosphere on several fen peatland use areas at different sites in NE Germany. Our research covers peatlands under land use that is assumed to be strongly climate forcing (cornfield and intensive pasture) and under alternative, probably less forcing types (meadow and extensive pasture), as well as rewetted (formerly drained) areas and near-natural sites such as a low-degraded fen and a wetted alder woodland. We have measured trace gas fluxes with manual and automatic chambers in periodic routines since spring 2007. The chamber technique used is based on DROESLER (2005). In total we now conduct research at 22 sites situated in 5 different locations covering agricultural, variously rewetted and near-natural treatments. We present results of at least 2 years of measurements at our site with varying types of agricultural land use. There we found significant differences in the annual carbon balances depending on the genesis of the observed sites and the seasonal dynamics. Annual balances were constructed by applying single respiration and photosynthesis CO2 models for each measurement campaign. These models were based on LLOYD-TAYLOR (1994) and Michaelis-Menten kinetics, respectively. Crosswise comparison of different site treatments combined with the seasonal environmental observations give good hints for the
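The two model families named above can be sketched as follows: a Lloyd-Taylor temperature response for ecosystem respiration and a Michaelis-Menten (rectangular hyperbola) light response for gross photosynthesis. The constants E0 = 308.56 K, reference temperature 10 °C and T0 = -46.02 °C are the commonly cited Lloyd and Taylor (1994) values; the function names, the sign convention and all parameter values in the example are assumptions, not the settings used in the study.

```python
import math

def ecosystem_respiration(temp_c, r_ref, e0=308.56, t_ref_c=10.0, t0_c=-46.02):
    """Lloyd-Taylor respiration: r_ref is the rate at the reference temperature."""
    return r_ref * math.exp(e0 * (1.0 / (t_ref_c - t0_c) - 1.0 / (temp_c - t0_c)))

def gross_photosynthesis(par, alpha, gp_max):
    """Michaelis-Menten light response.

    par: photosynthetically active radiation, alpha: initial slope,
    gp_max: uptake at light saturation.
    """
    if par <= 0.0:
        return 0.0
    return (gp_max * alpha * par) / (gp_max + alpha * par)

def net_exchange(temp_c, par, r_ref, alpha, gp_max):
    """Net CO2 exchange, with release to the atmosphere counted as positive (assumed convention)."""
    return ecosystem_respiration(temp_c, r_ref) - gross_photosynthesis(par, alpha, gp_max)

# Hypothetical half-day of hourly (temperature in deg C, PAR) observations.
hours = [(8.0, 0.0), (9.0, 50.0), (12.0, 400.0), (16.0, 900.0), (18.0, 1200.0), (15.0, 600.0)]
balance = sum(net_exchange(t, par, r_ref=2.0, alpha=0.03, gp_max=20.0) for t, par in hours)
```

Fitting r_ref, alpha and gp_max separately for each measurement campaign and then driving the fitted models with continuous temperature and radiation records is one way annual balances of this kind are typically assembled.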
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-02-16
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
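For reference, the quantity being modelled can be computed with the simple plug-in estimator below, which replaces the unknown symbol probabilities with observed frequencies. This is only the basic entropy calculation, not the Dirichlet-multinomial regression proposed in the abstract; the function name and example sequences are made up for illustration.

```python
import math
from collections import Counter

def shannon_entropy(sequence, base=2.0):
    """Plug-in Shannon entropy of a sequence of symbols.

    Uses H = -sum(p_i * log(p_i)) with p_i estimated as relative
    frequencies. Returns bits when base is 2.
    """
    counts = Counter(sequence)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

# Hypothetical symbol sequences (e.g. nucleotides or call types).
uniform_bits = shannon_entropy("ACGT")
skewed_bits = shannon_entropy("AAAAAAAT")
```

The plug-in estimate is biased downward for short sequences, which is one reason a model-based treatment of entropy across repeated measurements can be preferable to applying linear models to these raw values.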
A nonlinear regression model-based predictive control algorithm.
Dubay, R; Abu-Ayyad, M; Hernandez, J M
2009-04-01
This paper presents a unique approach for designing a nonlinear regression model-based predictive controller (NRPC) for single-input-single-output (SISO) and multi-input-multi-output (MIMO) processes that are common in industrial applications. The innovation of this strategy is that the controller structure allows nonlinear open-loop modeling to be conducted while closed-loop control is executed every sampling instant. Consequently, the system matrix is regenerated every sampling instant using a continuous function providing a more accurate prediction of the plant. Computer simulations are carried out on nonlinear plants, demonstrating that the new approach is easily implemented and provides tight control. Also, the proposed algorithm is implemented on two real time SISO applications; a DC motor, a plastic injection molding machine and a nonlinear MIMO thermal system comprising three temperature zones to be controlled with interacting effects. The experimental closed-loop responses of the proposed algorithm were compared to a multi-model dynamic matrix controller (MPC) with improved results for various set point trajectories. Good disturbance rejection was attained, resulting in improved tracking of multi-set point profiles in comparison to multi-model MPC.
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. As in "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
Projection-type estimation for varying coefficient regression models
Lee, Young K; Park, Byeong U; 10.3150/10-BEJ331
2012-01-01
In this paper we introduce new estimators of the coefficient functions in the varying coefficient regression model. The proposed estimators are obtained by projecting the vector of the full-dimensional kernel-weighted local polynomial estimators of the coefficient functions onto a Hilbert space with a suitable norm. We provide a backfitting algorithm to compute the estimators. We show that the algorithm converges at a geometric rate under weak conditions. We derive the asymptotic distributions of the estimators and show that the estimators have the oracle properties. This is done for the general order of local polynomial fitting and for the estimation of the derivatives of the coefficient functions, as well as the coefficient functions themselves. The estimators turn out to have several theoretical and numerical advantages over the marginal integration estimators studied by Yang, Park, Xue and Härdle [J. Amer. Statist. Assoc. 101 (2006) 1212-1227].
The R Package threg to Implement Threshold Regression Models
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates). The predict method for threg objects is used for prediction. The plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10^-10) in the ESP, and 11 were replicated in the CHARGE-S study.
Daniel J Spade
BACKGROUND: Queen conch (Strombus gigas) reproduction is inhibited in nearshore areas of the Florida Keys, relative to the offshore environment where conchs reproduce successfully. Nearshore reproductive failure is possibly a result of exposure to environmental factors, including heavy metals, which are likely to accumulate close to shore. Metals such as Cu and Zn are detrimental to reproduction in many mollusks. METHODOLOGY/PRINCIPAL FINDINGS: Histology shows gonadal atrophy in nearshore conchs as compared to reproductively healthy offshore conchs. In order to determine molecular mechanisms leading to tissue changes and reproductive failure, a microarray was developed. A normalized cDNA library for queen conch was constructed and sequenced using the 454 Life Sciences GS-FLX pyrosequencer, producing 27,723 assembled contigs and 7,740 annotated transcript sequences. The resulting sequences were used to design the microarray. Microarray analysis of conch testis indicated differential regulation of 255 genes (p<0.01) in nearshore conch, relative to offshore. Changes in expression for three of four transcripts of interest were confirmed using real-time reverse transcription polymerase chain reaction. Gene Ontology enrichment analysis indicated changes in biological processes: respiratory chain (GO:0015992), spermatogenesis (GO:0007283), small GTPase-mediated signal transduction (GO:0007264), and others. Inductively coupled plasma-mass spectrometry analysis indicated that Zn and possibly Cu were elevated in some nearshore conch tissues. CONCLUSIONS/SIGNIFICANCE: Congruence between testis histology and microarray data suggests that nearshore conch testes regress during the reproductive season, while offshore conch testes develop normally. Possible mechanisms underlying the testis regression observed in queen conch in the nearshore Florida Keys include a disruption of small GTPase (Ras)-mediated signaling in testis development. Additionally, elevated tissue
Rajab, Jasim Mohammed; Jafri, Mohd. Zubir Mat; Lim, Hwee San; Abdullah, Khiruddin
2012-10-01
This study encompasses air surface temperature (AST) modeling in the lower atmosphere. Data on four atmospheric pollutant gases (CO, O3, CH4, and H2O), retrieved from the National Aeronautics and Space Administration Atmospheric Infrared Sounder (AIRS) for 2003 to 2008, were employed to develop a model to predict AST values in the Malaysian peninsula using the multiple regression method. For the entire period, the pollutants were highly correlated (R=0.821) with predicted AST. Comparisons among five stations in 2009 showed close agreement between the predicted AST and the observed AST from AIRS, especially in the southwest monsoon (SWM) season, within 1.3 K, and for in situ data, within 1 to 2 K. Validation of the predicted AST against AST from AIRS showed high correlation coefficients (R=0.845 to 0.918), indicating the model's efficiency and accuracy. Statistical analysis in terms of β showed that H2O (0.565 to 1.746) tended to contribute significantly to high AST values during the northeast monsoon season. Generally, these results clearly indicate the advantage of using the satellite AIRS data and a correlation analysis study to investigate the impact of atmospheric greenhouse gases on AST over the Malaysian peninsula. A model was developed that is capable of retrieving the Malaysian peninsula AST in all weather conditions, with total uncertainties ranging between 1 and 2 K.
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Adam J. Branscum
2013-01-01
The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions) rather than scalar parameters (e.g., means or proportions). For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straightforward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Jaber Almedeij
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements with substantially continuous coverage over a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in reasonable agreement with observed values.
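The approach of transforming predictors and then fitting a multiple linear regression can be sketched as below. The abstract does not report the fitted exponents or coefficients, so the power 1.5 on temperature, the rate 0.02 on relative humidity, the function names and the synthetic data are all assumptions chosen only to show the mechanics.

```python
import math

def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(x[i] * x[j] for x in design) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(design, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def features(temp_c, rel_hum, wind):
    """Hypothetical linearizing transforms: power for temperature, exponential for humidity."""
    return [temp_c ** 1.5, math.exp(-0.02 * rel_hum), wind]

# Synthetic observations generated from known coefficients (not Kuwait data).
temps = [15.0, 22.0, 30.0, 38.0, 44.0, 27.0]
hums = [70.0, 30.0, 55.0, 20.0, 60.0, 25.0]
winds = [3.0, 4.5, 2.5, 5.0, 6.0, 3.5]
true_coefs = [0.5, 0.04, 6.0, 0.3]
rows = [features(t, h, w) for t, h, w in zip(temps, hums, winds)]
evap = [true_coefs[0] + sum(c * f for c, f in zip(true_coefs[1:], r)) for r in rows]
coefs = ols(rows, evap)
```

A variable selection step, as mentioned in the abstract, would repeat this fit over candidate subsets of the transformed predictors and compare the resulting errors.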
A multivariate approach to modeling univariate seasonal time series
Ph.H.B.F. Franses (Philip Hans)
1994-01-01
A seasonal time series can be represented by a vector autoregressive model for the annual series containing the seasonal observations. This model allows for periodically varying coefficients. When the vector elements are integrated, the maximum likelihood cointegration method can be used
Hybrid grey model to forecast monitoring series with seasonality
WANG Qi-jie; LIAO Xin-hao; ZHOU Yong-hong; ZOU Zheng-rong; ZHU Jian-jun; PENG Yue
2005-01-01
The grey forecasting model has been successfully applied to many fields. However, the precision of the GM(1,1) model is not high. To address this, the seasonal fluctuations were removed from the monitoring series before the GM(1,1) model was built, the GM(1,1) forecasting series was then generated, and an inverse process was used to restore the seasonal fluctuations. Two deseasonalization methods were presented, i.e., seasonal index-based deseasonalization and standard normal distribution-based deseasonalization. They were combined with the GM(1,1) model to form hybrid grey models. A simple but practical method to further improve the forecasting results was also suggested. For comparison, a conventional periodic function model was investigated. The concept and algorithms were tested with four years of monthly monitoring data. The results show that, on the whole, the seasonal index-GM(1,1) model outperforms the conventional periodic function model, and the conventional periodic function model outperforms the SND-GM(1,1) model. The mean absolute error and mean square error of the seasonal index-GM(1,1) model are 30.69% and 54.53% smaller than those of the conventional periodic function model, respectively. The high accuracy and the straightforward, easy implementation of the proposed hybrid seasonal index-grey model make it a powerful analysis technique for seasonal monitoring series.
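The seasonal index variant described above can be sketched in three steps: divide the series by seasonal indices, fit GM(1,1) to the deseasonalized series, and multiply the forecasts by the indices again. The abstract gives no formulas, so the index definition (seasonal mean over overall mean), the standard GM(1,1) equations used here, the function names and the synthetic quarterly data are assumptions.

```python
import math

def seasonal_indices(series, period):
    """Mean of each season divided by the overall mean."""
    overall = sum(series) / len(series)
    return [(sum(series[m::period]) / len(series[m::period])) / overall
            for m in range(period)]

def gm11_fit(x0):
    """Least-squares estimate of (a, b) in x0(k) + a * z1(k) = b."""
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)  # accumulated generating series
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]  # background values
    y = x0[1:]
    mean_z = sum(z) / len(z)
    mean_y = sum(y) / len(y)
    slope = (sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
             / sum((zi - mean_z) ** 2 for zi in z))
    return -slope, mean_y - slope * mean_z

def gm11_value(first, a, b, k):
    """Fitted or forecast value of the original series at 0-based index k."""
    if k == 0:
        return first
    if abs(a) < 1e-12:  # no growth or decay: the model reduces to a constant
        return b
    scale = first - b / a
    return scale * (math.exp(-a * k) - math.exp(-a * (k - 1)))

def hybrid_forecast(series, period, steps):
    """Deseasonalize, fit GM(1,1), forecast, then restore seasonality."""
    idx = seasonal_indices(series, period)
    flat = [v / idx[t % period] for t, v in enumerate(series)]
    a, b = gm11_fit(flat)
    n = len(series)
    return [gm11_value(flat[0], a, b, t) * idx[t % period]
            for t in range(n, n + steps)]

# Synthetic quarterly series: 1 % growth per step times a fixed seasonal pattern.
pattern = [1.2, 0.8, 1.1, 0.9]
history = [100.0 * 1.01 ** t * pattern[t % 4] for t in range(12)]
forecast = hybrid_forecast(history, 4, 4)
```

GM(1,1) assumes a positive series with roughly exponential behaviour, which is why the seasonal swings are removed before fitting and put back afterwards.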
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Spengler John D
2010-11-01
Background: There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Methods: Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Results: Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p …). Conclusion: Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal.
Modeling seasonal migration of fall armyworm moths
Westbrook, J. K.; Nagoshi, R. N.; Meagher, R. L.; Fleischer, S. J.; Jairam, S.
2016-02-01
Fall armyworm, Spodoptera frugiperda (J.E. Smith), is a highly mobile insect pest of a wide range of host crops. However, this pest of tropical origin cannot survive extended periods of freezing temperature but must migrate northward each spring if it is to re-infest cropping areas in temperate regions. The northward limit of the winter-breeding region for North America extends to southern regions of Texas and Florida, but infestations are regularly reported as far north as Québec and Ontario provinces in Canada by the end of summer. Recent genetic analyses have characterized migratory pathways from these winter-breeding regions, but knowledge is lacking on the atmosphere's role in influencing the timing, distance, and direction of migratory flights. The Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model was used to simulate migratory flight of fall armyworm moths from distinct winter-breeding source areas. Model simulations identified regions of dominant immigration from the Florida and Texas source areas and overlapping immigrant populations in the Alabama-Georgia and Pennsylvania-Mid-Atlantic regions. This simulated migratory pattern corroborates a previous migratory map based on the distribution of fall armyworm haplotype profiles. We found a significant regression between the simulated first week of moth immigration and first week of moth capture (for locations which captured ≥10 moths), which on average indicated that the model simulated first immigration 2 weeks before first captures in pheromone traps. The results contribute to knowledge of fall armyworm population ecology on a continental scale and will aid in the prediction and interpretation of inter-annual variability of insect migration patterns including those in response to climatic change and adoption rates of transgenic cultivars.
Air Pollution Analysis using Ontologies and Regression Models
Parul Choudhary
2016-07-01
Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative engineering, built on co-operation and multi-disciplinary cooperation between people, is a good example. The Semantic Web, a new form of Web content that is meaningful to computers, is another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression models for ontologies and air pollution.
Domagalski, J. L.; Schlegel, B.; Hutchins, J.
2014-12-01
Long-term data sets on stream-water quality and discharge can be used to assess whether best management practices (BMPs) are restoring beneficial uses of impaired water as required under the Clean Water Act. In this study, we evaluated a greater than 20-year record of water quality from selected streams in the Central Valley (CV) of California and Lake Tahoe (California and Nevada, USA). The CV contains a mix of agricultural and urbanized land, while the Lake Tahoe area is mostly forested, with seasonal residents and tourism. Because nutrients and fine sediments cause a reduction in water clarity that impair Lake Tahoe, BMPs were implemented in the early 1990's, to reduce nitrogen and phosphorus loads. The CV does not have a current nutrient management plan, but numerous BMPs exist to reduce pesticide loads, and it was hypothesized that these programs could also reduce nutrient levels. In the CV and Lake Tahoe areas, nutrient concentrations, loads, and trends were estimated by using the recently developed Weighted Regressions on Time, Discharge, and Season (WRTDS) model. Sufficient data were available to compare trends during a voluntary and enforcement period for seven CV sites within the lower Sacramento and San Joaquin Basins. For six of the seven sites, flow-normalized mean annual concentrations of total phosphorus and nitrate decreased at a faster rate during the enforcement period than during the earlier voluntary period. Concentration changes during similar years and ranges of flow conditions suggest that BMPs designed for pesticides also reduced nutrient loads in the CV. A trend analysis using WRTDS was completed for six streams that enter Lake Tahoe during the late 1980's through 2008. The results of the model confirm that nutrient loading is influenced strongly by season, such as by spring runoff from snowmelt. The highest nutrient concentrations in the late 1980's and early 1990's correlate with high flows, followed by statistically significant decreases
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, namely geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects, with global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicates that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Random regression models for daily feed intake in Danish Duroc pigs
Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just
The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 to 160 days of age. The data originated from the National test station...... and were collected using ACEMO electronic feeders in the period of 2008 to 2011. The pedigree was traced back to 1995 and included 17222 animals. The phenotypic feed intake curve was decomposed into a fixed curve, being specific to the barn-year-season effect and curves associated with the random pen....... Eigenvalues of the genetic covariance function showed that 33% of genetic variability was explained by the individual genetic curve of the pigs. This proportion was covered by linear (27%) and quadratic (6%) coefficients. Genetic eigenfunctions revealed that altering the shape of the feed intake curve...
A Malthusian Model for All Seasons
Weisdorf, Jacob Louis; Sharp, Paul Richard
2009-01-01
with agricultural intensification, depending on whether technological progress emerges in relation to cultivation or harvesting activities. Our result rests on evidence reported by Boserup (1965) and others, which suggests that harvest seasons in traditional agriculture are characterized by severe labour shortage....
Bilinear modulation models for seasonal tables of counts
B.D. Marx (Brian); P.H.C. Eilers (Paul); J. Gampe (Jutta); R. Rau (Roland)
2010-01-01
We propose generalized linear models for time or age-time tables of seasonal counts, with the goal of better understanding seasonal patterns in the data. The linear predictor contains a smooth component for the trend and the product of a smooth component (the modulation) and a periodic t
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Correlation-regression model for physico-chemical quality of ...
abusaad
Key words: Groundwater, water quality, bore well, water supply, correlation, regression. INTRODUCTION ..... interpreting groundwater quality data and relating them to specific hydro ..... Regional trends in nitrate content of Texas groundwater.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...
Analysis of rainfall seasonality from observations and climate models
Pascale, Salvatore; Feng, Xue; Porporato, Amilcare; Hasson, Shabeh-ul
2014-01-01
Precipitation seasonality of observational datasets and CMIP5 historical simulations are analyzed using novel quantitative measures based on information theory. Two new indicators, the relative entropy (RE) and the dimensionless seasonality index (DSI), together with the mean annual rainfall, are evaluated on a global scale for recently updated precipitation gridded datasets and for historical simulations from coupled atmosphere-ocean general circulation models. The RE provides a measure of how peaked the shape of the annual rainfall curve is whereas the DSI quantifies the intensity of the rainfall during the wet season. The global monsoon regions feature the largest values of the DSI. For precipitation regimes featuring one maximum in the monthly rain distribution the RE is related to the duration of the wet season. We show that the RE and the DSI are measures of rainfall seasonality fairly independent of the time resolution of the precipitation data, thereby allowing objective metrics for model intercompari...
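The abstract above names the two indicators but does not give their formulas. A minimal sketch follows, assuming that RE is the Kullback-Leibler divergence (in bits) of the monthly rainfall fractions from a uniform year, and that the DSI is RE scaled by annual rainfall divided by a reference maximum. The function names and the `r_max` parameter are hypothetical and are not taken from the paper.

```python
import math


def relative_entropy(monthly_rain):
    """Relative entropy (bits) of monthly rainfall fractions vs. a uniform year.

    Assumed definition: sum over months of p * log2(p / (1/n)).
    """
    total = sum(monthly_rain)
    if total <= 0:
        raise ValueError("annual rainfall must be positive")
    n = len(monthly_rain)
    re = 0.0
    for r in monthly_rain:
        p = r / total
        if p > 0:  # the term 0 * log(0) is taken as 0
            re += p * math.log2(p * n)
    return re


def seasonality_index(monthly_rain, r_max):
    """Assumed DSI: relative entropy times annual rainfall normalized by r_max."""
    return relative_entropy(monthly_rain) * sum(monthly_rain) / r_max
```

Under these assumptions, RE is 0 for perfectly uniform rainfall and reaches log2(12) when all rain falls in a single month, which matches the abstract's description of RE as a measure of how peaked the annual curve is.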
Regression model for tuning the PID controller with fractional order time delay system
S.P. Agnihotri; Laxman Madhavrao Waghmare
2014-01-01
In this paper, a regression-model-based method for tuning a proportional integral derivative (PID) controller with a fractional order time delay system is proposed. The novelty of this paper is that the tuning parameters of the fractional order time delay system are optimally predicted using the regression model. In the proposed method, the output parameters of the fractional order system are used to derive the regression function. Here, the regression model depends on the weights of the exponential function...
A generalized additive regression model for survival times
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models
Gu, Fei; Preacher, Kristopher J; Wu, Wei; Yung, Yiu-Fai
2014-01-01
Although the state space approach for estimating multilevel regression models has been well established for decades in the time series literature, it does not receive much attention from educational and psychological researchers. In this article, we (a) introduce the state space approach for estimating multilevel regression models and (b) extend the state space approach for estimating multilevel factor models. A brief outline of the state space formulation is provided and then state space forms for univariate and multivariate multilevel regression models, and a multilevel confirmatory factor model, are illustrated. The utility of the state space approach is demonstrated with either a simulated or real example for each multilevel model. It is concluded that the results from the state space approach are essentially identical to those from specialized multilevel regression modeling and structural equation modeling software. More importantly, the state space approach offers researchers a computationally more efficient alternative to fit multilevel regression models with a large number of Level 1 units within each Level 2 unit or a large number of observations on each subject in a longitudinal study.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
An innovative land use regression model incorporating meteorology for exposure analysis.
Su, Jason G; Brauer, Michael; Ainslie, Bruce; Steyn, Douw; Larson, Timothy; Buzzelli, Michael
2008-02-15
The advent of spatial analysis and geographic information systems (GIS) has led to studies of chronic exposure and health effects based on the rationale that intra-urban variations in ambient air pollution concentrations are as great as inter-urban differences. Such studies typically rely on local spatial covariates (e.g., traffic, land use type) derived from circular areas (buffers) to predict concentrations/exposures at receptor sites, as a means of averaging the annual net effect of meteorological influences (i.e., wind speed, wind direction and insolation). This is the approach taken in the now popular land use regression (LUR) method. However spatial studies of chronic exposures and temporal studies of acute exposures have not been adequately integrated. This paper presents an innovative LUR method implemented in a GIS environment that reflects both temporal and spatial variability and considers the role of meteorology. The new source area LUR integrates wind speed, wind direction and cloud cover/insolation to estimate hourly nitric oxide (NO) and nitrogen dioxide (NO(2)) concentrations from land use types (i.e., road network, commercial land use) and these concentrations are then used as covariates to regress against NO and NO(2) measurements at various receptor sites across the Vancouver region and compared directly with estimates from a regular LUR. The results show that, when variability in seasonal concentration measurements is present, the source area LUR or SA-LUR model is a better option for concentration estimation.
A Method to Model Season of Birth as a Surrogate Environmental Risk Factor for Disease
Susan Searles Nielsen
2008-03-01
Environmental exposures, including some that vary seasonally, may play a role in the development of many types of childhood diseases such as cancer. Those observed in children are unique in that the relevant period of exposure is inherently limited or perhaps even specific to a very short window during prenatal development or early infancy. As such, researchers have investigated whether specific childhood cancers are associated with season of birth. Typically a basic method for analysis has been used, for example categorization of births into one of four seasons, followed by simple comparisons between categories such as via logistic regression, to obtain odds ratios (ORs), confidence intervals (CIs) and p-values. In this paper we present an alternative method, based upon an iterative trigonometric logistic regression model used to analyze the cyclic nature of birth dates related to disease occurrence. Disease birth-date results are presented using a sinusoidal graph with a peak date of relative risk and a single p-value that tests whether an overall seasonal association is present. An OR and CI comparing children born in the 3-month period around the peak to the symmetrically opposite 3-month period also can be obtained. Advantages of this derivative-free method include ease of use, increased statistical power to detect associations, and the ability to avoid potentially arbitrary, subjective demarcation of seasons.
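To illustrate how a sinusoidal term yields a single peak date, here is a minimal sketch for a one-harmonic logistic model, logit(p) = b0 + b_sin·sin(2πt/T) + b_cos·cos(2πt/T). It is not the authors' iterative fitting procedure: the coefficients are assumed to be already estimated, and all function and parameter names are hypothetical.

```python
import math


def seasonal_log_odds(day, b0, b_sin, b_cos, period=365.25):
    """Log-odds of disease for a birth on a given day of the year."""
    theta = 2 * math.pi * day / period
    return b0 + b_sin * math.sin(theta) + b_cos * math.cos(theta)


def peak_day(b_sin, b_cos, period=365.25):
    """Day of year at which the fitted sinusoid reaches its maximum.

    b_sin*sin(t) + b_cos*cos(t) = A*cos(t - phase), phase = atan2(b_sin, b_cos).
    """
    phase = math.atan2(b_sin, b_cos)
    return (phase % (2 * math.pi)) * period / (2 * math.pi)


def peak_to_trough_odds_ratio(b_sin, b_cos):
    """Odds ratio between the exact peak day and the exact trough day."""
    amplitude = math.hypot(b_sin, b_cos)
    return math.exp(2 * amplitude)
```

The odds ratio here compares the single peak day with the single trough day, so it will generally be larger than an odds ratio averaged over the 3-month windows described in the abstract.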
Linear regression model selection using p-values when the model dimension grows
Pokarowski, Piotr; Teisseyre, Paweł
2012-01-01
We consider a new criterion-based approach to model selection in linear regression. Properties of selection criteria based on p-values of a likelihood ratio statistic are studied for families of linear regression models. We prove that such procedures are consistent, i.e., the minimal true model is chosen with probability tending to 1 even when the number of models under consideration slowly increases with the sample size. The simulation study indicates that the introduced methods perform promisingly when compared with the Akaike and Bayesian Information Criteria.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
Cattani, Giorgio; Gaeta, Alessandra; Di Menno di Bucchianico, Alessandro; De Santis, Antonella; Gaddi, Raffaela; Cusano, Mariacarmela; Ancona, Carla; Badaloni, Chiara; Forastiere, Francesco; Gariazzo, Claudio; Sozzi, Roberto; Inglessis, Marco; Silibello, Camillo; Salvatori, Elisabetta; Manes, Fausto; Cesaroni, Giulia
2017-05-01
The health effects of long-term exposure to ultrafine particles (UFPs) are poorly understood. Fine-resolution data on spatial contrasts in ambient UFP concentrations are needed. This study aimed to assess the spatial variability of total particle number concentrations (PNC, a proxy for UFPs) in the city of Rome, Italy, using land use regression (LUR) models, and the corresponding exposure of the resident population. PNC were measured using condensation particle counters at the building facade of 28 homes throughout the city. Three 7-day monitoring periods were carried out during cold, warm and intermediate seasons. Geographic Information System predictor variables, with buffers of varying size, were evaluated to model spatial variations of PNC. A stepwise forward selection procedure was used to develop a "base" linear regression model according to the European Study of Cohorts for Air Pollution Effects project methodology. Other variables were then included in more enhanced models and their capability of improving model performance was evaluated. Four LUR models were developed. Local variation in UFPs in the study area can be largely explained by the ratio of traffic intensity and distance to the nearest major road. The best model (adjusted R2 = 0.71; root mean square error = ±1,572 particles/cm³; leave-one-out cross-validated R2 = 0.68) was achieved by regressing building and street configuration variables against the residuals from the "base" model, which added 3% more to the total variance explained. Urban green and population density in a 5,000 m buffer around each home were also relevant predictors. The spatial contrast in ambient PNC across the large conurbation of Rome was successfully assessed. The average exposure of subjects living in the study area was 16,006 particles/cm³ (SD 2,165 particles/cm³, range: 11,075-28,632 particles/cm³). A total of 203,886 subjects (16%) live in Rome within 50 m from a high traffic road and they
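The leave-one-out cross-validated R2 quoted above can be sketched as follows. This is a minimal single-predictor illustration, not the study's multi-variable LUR pipeline. It assumes the statistic is computed as 1 minus the ratio of held-out squared error to total variance, although some studies instead report the squared correlation between held-out predictions and observations. All names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_r2(xs, ys):
    """Leave-one-out R2: refit without each site, then predict that site."""
    xs, ys = list(xs), list(ys)
    preds = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(intercept + slope * xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Because each site is predicted by a model that never saw it, this value is normally lower than the in-sample R2, consistent with the small drop from 0.71 to 0.68 reported in the abstract.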
A nonparametric dynamic additive regression model for longitudinal data
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
(No author listed)
2006-01-01
A simple but efficient method is proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those of the nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
N. Diodato
2010-12-01
To reconstruct sub-regional European climate over the past centuries, several efforts have been made using historical datasets. However, only scattered information at low spatial and temporal resolution has been produced to date for the Mediterranean area. This paper has exploited, for Southern and Central Italy (Mediterranean Sub-Regional Area), an unprecedented historical dataset as an attempt to model seasonal (winter and summer) air temperatures in pre-instrumental time (back to 1500). Combining information derived from proxy documentary data and large-scale simulation, a statistical methodology in the form of a multiscale-temperature regression (MTR) model was developed to adapt larger-scale estimations to the sub-regional temperature pattern. The modelled response essentially lacks autocorrelation among the residuals (marginal or no significance in the Durbin-Watson statistic), and agrees well with the independent data from the validation sample (Nash-Sutcliffe efficiency coefficient >0.60). The advantage of the approach is not merely increased accuracy in estimation. Rather, it relies on the ability to extract (and exploit) the right information to replicate coherent temperature series in historical times.
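The two diagnostics cited in this abstract have standard textbook definitions, sketched below in plain Python. The function names are illustrative and the code is not drawn from the paper.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.

    1.0 is a perfect fit; 0.0 means no better than the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order residual autocorrelation.

    Values near 2 suggest no autocorrelation; values toward 0 or 4 suggest
    positive or negative autocorrelation, respectively.
    """
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den
```

In these terms, the abstract's validation threshold corresponds to `nash_sutcliffe(...) > 0.60` on the held-out sample, and the residual check corresponds to a Durbin-Watson value not significantly different from 2.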
Modeling Haze Problems in the North of Thailand using Logistic Regression
Busayamas Pimpunchat
2014-07-01
At present, air pollution is a major problem in the upper northern region of Thailand. Air pollutants have an effect on human health, the economy and the travel industry. The severity of this problem clearly appears every year during the dry season, from February to April. In particular it becomes very serious in March, especially in Chiang Mai province where smoke haze is a major issue. This study looked into related data from 2005-2010 covering eight principal parameters: PM10 (particulate matter with a diameter smaller than 10 micrometers), CO (carbon monoxide), NO2 (nitrogen dioxide), SO2 (sulphur dioxide), RH (relative humidity), NO (nitrogen oxide), pressure, and rainfall. Overall haze problem occurrence was calculated from a logistic regression model. Its dependence on the eight parameters stated above was determined for design conditions using the correlation coefficients with PM10. The proposed overall haze problem modeling can be used as a quantitative assessment criterion for supporting decision making to protect human health. This study proposed to predict haze problem occurrence in 2011. The agreement of the results from the mathematical model with actual measured PM10 concentration data from the Pollution Control Department was quite satisfactory.
Simone Becker Lopes
2014-04-01
Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trip generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included). This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Hybrid model for forecasting time series with trend, seasonal and calendar variation patterns
Suhartono; Rahayu, S. P.; Prastyo, D. D.; Wijayanti, D. G. P.; Juliyanto
2017-09-01
Most monthly time series in economics and business in Indonesia and other Moslem countries not only contain trend and seasonal patterns but are also affected by two types of calendar variation effects, i.e. the effect of the number of working or trading days and holiday effects. The purpose of this research is to develop a hybrid model, a combination of several forecasting models, to predict time series that contain trend, seasonal and calendar variation patterns. This hybrid model is a combination of classical models (namely time series regression and the ARIMA model) and/or modern methods (an artificial intelligence method, i.e. Artificial Neural Networks). A simulation study was used to show that the proposed procedure for building the hybrid model could work well for forecasting time series with trend, seasonal and calendar variation patterns. Furthermore, the proposed hybrid model is applied to forecasting real data, i.e. monthly data on the inflow and outflow of currency at Bank Indonesia. The results show that the hybrid model tends to provide more accurate forecasts than individual forecasting models. Moreover, this result is also in line with the third result of the M3 competition, i.e. the hybrid model on average provides a more accurate forecast than the individual model.
First Look at Photometric Reduction via Mixed-Model Regression (Poster abstract)
Dose, E.
2016-12-01
(Abstract only) Mixed-model regression is proposed as a new approach to photometric reduction, especially for variable-star photometry in several filters. Mixed-model regression adds to normal multivariate regression certain "random effects": categorical-variable terms that model and extract specific systematic errors such as image-to-image zero-point fluctuations (cirrus effect) or even errors in comp-star catalog magnitudes.
Genetic parameters for Tunisian Holsteins using a test-day random regression model.
Hammami, H; Rekik, B; Soyeurt, H; Ben Gara, A; Gengler, N
2008-05-01
Genetic parameters of milk, fat, and protein yields were estimated in the first 3 lactations for registered Tunisian Holsteins. Data included 140,187; 97,404; and 62,221 test-day production records collected on 22,538; 15,257; and 9,722 first-, second-, and third-parity cows, respectively. Records were of cows calving from 1992 to 2004 in 96 herds. (Co)variance components were estimated by Bayesian methods and a 3-trait-3-lactation random regression model. Gibbs sampling was used to obtain posterior distributions. The model included herd x test date, age x season of calving x stage of lactation [classes of 25 days in milk (DIM)], production sector x stage of lactation (classes of 5 DIM) as fixed effects, and random regression coefficients for additive genetic, permanent environmental, and herd-year of calving effects, which were defined as modified constant, linear, and quadratic Legendre coefficients. Heritability estimates for 305-d milk, fat and protein yields were moderate (0.12 to 0.18) and in the same range of parameters estimated in management systems with low to medium production levels. Heritabilities of test-day milk and protein yields for selected DIM were higher in the middle than at the beginning or the end of lactation. Inversely, heritabilities of fat yield were high at the peripheries of lactation. Genetic correlations among 305-d yield traits ranged from 0.50 to 0.86. The largest genetic correlation was observed between the first and second lactation, potentially due to the limited expression of genetic potential of superior cows in later lactations. Results suggested a lack of adaptation under the local management and climatic conditions. Results should be useful to implement a BLUP evaluation for the Tunisian cow population; however, results also indicated that further research focused on data quality might be needed.
Introduction to mixed modelling beyond regression and analysis of variance
Galwey, N W
2007-01-01
Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employs MARS models which have the advantage of being a powerful method...
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Kleibergen, F.
2003-01-01
We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower dimensional set that represents the nested model. The invariant expression of the
Kleibergen, F.R.
2004-01-01
We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower-dimensional set that represents the nested model. The Hausdorff-integral is invariant and
A note on the maximum likelihood estimator in the gamma regression model
Jerzy P. Rydlewski
2009-01-01
This paper considers a nonlinear regression model, in which the dependent variable has the gamma distribution. A model is considered in which the shape parameter of the random variable is the sum of continuous and algebraically independent functions. The paper proves that there is exactly one maximum likelihood estimator for the gamma regression model.
Genetic parameters for various random regression models to describe the weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2002-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Genetic parameters for different random regression models to describe weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2001-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Dynamically downscaled multi-model ensemble seasonal forecasts over Ethiopia
Asharaf, Shakeel; Fröhlich, Kristina; Fernandez, Jesus; Cardoso, Rita; Nikulin, Grigory; Früh, Barbara
2016-04-01
Accurate and reliable seasonal rainfall predictions have important social and economic value for east African countries, whose economies are highly dependent on rain-fed agriculture and pastoral systems. June to September (JJAS) seasonal rainfall alone accounts for more than 80% of crop production in Ethiopia. Hence, seasonal forecasting is a crucial concern for the region. The European Provision of Regional Impact Assessment on a seasonal to decadal timescale (EUPORIAS) project offers a common framework to understand hindcast uncertainties through the use of multi-model and multi-member simulations over east Africa. Under this program, the participating regional climate models (RCMs) were driven by the atmosphere-only version of the EC-EARTH global climate model, which provides hindcasts of a five-month period (May to September) from 1991 to 2012. In this study, the RCM-downscaled rainfall is evaluated with respect to the observed JJAS rainfall over Ethiopia. Both deterministic and probabilistic forecast skills are assessed. Our preliminary results show the potential usefulness of multi-model ensemble simulations in forecasting the seasonal rainfall over the region.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
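The split-half design described above (estimate on one random half, assess on the held-out half) can be sketched for the linear regression side as follows. This is a minimal illustration with a single predictor and invented function names; it does not reproduce the authors' variables or their fuzzy logic model.

```python
import math
import random

def fit_simple_ols(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(xs, ys, intercept, slope):
    """Root mean square error of the fitted line on the given cases."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(errors) / len(errors))

def split_half_evaluation(xs, ys, seed=0):
    """Fit on a random half of the cases and evaluate on both halves."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    half = len(indices) // 2
    train, test = indices[:half], indices[half:]
    x_train = [xs[i] for i in train]
    y_train = [ys[i] for i in train]
    x_test = [xs[i] for i in test]
    y_test = [ys[i] for i in test]
    intercept, slope = fit_simple_ols(x_train, y_train)
    return {
        "intercept": intercept,
        "slope": slope,
        "rmse_fit": rmse(x_train, y_train, intercept, slope),
        "rmse_holdout": rmse(x_test, y_test, intercept, slope),
    }
```

Comparing `rmse_fit` with `rmse_holdout` separates in-sample fit from predictive accuracy on "new" cases, which is the distinction the two comparisons in the abstract draw.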
Callegari, Mattia; Mazzoli, Paolo; Gregorio, Ludovica de; Notarnicola, Claudia; Petitta, Marcello; Pasolli, Luca; Seppi, Roberto; Pistocchi, Alberto
2014-01-01
The prediction of monthly mean discharge is critical for water resources management. Statistical methods applied on discharge time series are traditionally used for predicting this kind of slow response hydrological events. With this paper we present a Support Vector Regression (SVR) system able to predict monthly mean discharge considering discharge and snow cover extent (250 meters resolution obtained by MODIS images) time series as input. Additional meteorological and climatic variables ar...
Modeling by regression for laser cutting of quartz crystal
(no author listed)
2000-01-01
Presents theoretical models for the laser cutting of quartz crystal, built from an analysis of the cutting mechanism and regression of test results, together with a comparative analysis of the calculation errors of these models. The test results lead to the conclusion that the models comprehensively reflect the physical features of laser cutting of quartz crystal and satisfy industrial production requirements, and that they can be used to select the right parameters to improve productivity and quality and to save energy.
Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City
Amiruddin Ismail
2011-01-01
Transport modes are very important to the residents of Tripoli, Libya, for their daily trips. However, the total number of private cars and private transport vehicles, namely taxis and microbuses, on the road is increasing and causes many problems such as traffic congestion, accidents, and air and noise pollution. These problems in turn cause other travel-related phenomena such as trip delays, stress and frustration among motorists, which may affect the productivity and efficiency of both workers and students. Delay may also increase travel cost and make trips less efficient compared with those of public transport users in some Arab cities. Switching to public transport (PT) alternatives such as buses, light rail transit and underground trains could improve travel time and travel costs. A transport study was carried out in the Tripoli City Authority area among private car users who live in areas with inadequate private transport and poor public transportation services. The relations between factors such as travel time, travel cost, trip purpose and parking cost were analysed to answer the research questions. Logistic regression was used to analyse the factors that influence users to switch their trip mode to public transport alternatives.
Teacher training through the Regression Model in foreign language education
Jesús García Laborda
2011-01-01
In the last few years, Spain has seen dramatic changes in its educational system. Many of them have been rejected by most teachers after their implementation (LOGSE), while others were found to have potential drawbacks even before coming into operation (LOCE, LOE). To face these changes, schools need well-qualified instructors. All schools want the best teachers but, as teachers' salaries are regulated by the state, few schools can actually offer incentives to their teachers, and consequently schools never have the instructors they wish for. Apart from this, state schools have a fixed salary for their teachers and private institutions offer no additional bonuses for things like additional training or diplomas (for example, master's or postgraduate courses); therefore, teachers are rarely interested in pursuing further studies in methodology or related fields such as education or applied linguistics. Although many teachers acknowledge their love of teaching, the current situation in schools (school violence, low salaries, depression, loss of social prestige, legal changes and so on) has made teaching one of the most complicated jobs in Spain, and one to which few feel devoted. It is not unusual for a school to have a couple of instructors off sick with depression or other psychological illnesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one's experience both as a teacher and as a learner.
Misspecified poisson regression models for large-scale registry data
Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS
Liu Jixue; Chen Xiru
2005-01-01
Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
A Negative Binomial Regression Model for Accuracy Tests
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
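A quick way to see the overdispersion described above is to compare the sample variance of a set of counts with their mean; under a Poisson model the ratio should be close to 1. The sketch below is illustrative only, and the error counts in it are invented rather than taken from the study.

```python
import statistics

def dispersion_index(counts):
    """Ratio of sample variance to sample mean.

    Values well above 1 suggest overdispersion relative to a Poisson model,
    which is the situation a negative binomial model is meant to handle.
    """
    return statistics.variance(counts) / statistics.mean(counts)

if __name__ == "__main__":
    # Hypothetical reading-error counts for ten test takers.
    error_counts = [0, 1, 0, 2, 9, 0, 1, 12, 0, 3]
    print(f"variance / mean = {dispersion_index(error_counts):.2f}")
```

This ratio is only a descriptive check; a formal comparison would fit both models and compare them.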
Additive Intensity Regression Models in Corporate Default Analysis
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
Monthly to seasonal low flow prediction: statistical versus dynamical models
Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke
2016-04-01
the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO) the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated showing regions where streamflow and climate variable from previous months are significantly correlated in a 21 (31) years moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with focus on the River Rhine (being one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of a valuable predictability of the low flows at monthly and seasonal time scales, a result that may be useful to water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings for upcoming low flows.
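The idea of a stable predictor (one whose correlation with streamflow holds in every moving window) can be sketched for a single grid point as follows. The function names and the correlation threshold of 0.4 are assumptions for illustration; the abstract states only that correlations must be significant within a 21 (31) year moving window.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def moving_window_correlations(predictor, streamflow, window=21):
    """Correlation between the two series in every window of `window` years."""
    last_start = len(predictor) - window
    return [
        pearson(predictor[i:i + window], streamflow[i:i + window])
        for i in range(last_start + 1)
    ]

def is_stable(predictor, streamflow, window=21, threshold=0.4):
    """True if the correlation keeps its sign and exceeds the threshold in
    magnitude in all windows (threshold is a hypothetical stand-in for a
    significance level)."""
    rs = moving_window_correlations(predictor, streamflow, window)
    return all(r >= threshold for r in rs) or all(r <= -threshold for r in rs)
```

Repeating this check at every grid point of a climate field would give a map of candidate predictor regions, which could then feed a multiple regression model.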
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Miriam Andrejiová
2013-12-01
The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Siti Choirun Nisak
2016-06-01
Time series forecasting models can be used to predict phenomena that occur in nature. Generalized Space Time Autoregressive (GSTAR) is a time series model used to forecast data that have both time and space components. This model is limited to stationary and non-seasonal data. Generalized Space Time Autoregressive Integrated Moving Average (GSTARIMA) is an extension of GSTAR that accommodates non-stationary and seasonal data. Ordinary Least Squares (OLS) is a method used to estimate the parameters of the GSTARIMA model. Estimating the GSTARIMA parameters with OLS will not produce an efficient estimator if the errors are correlated between locations. OLS assumes that the variance-covariance matrix of the errors is constant, but in fact the observation locations are correlated, so the variance-covariance matrix of the errors is not constant. Therefore, the Seemingly Unrelated Regression (SUR) approach, which allows for correlated errors across locations, is used to address this weakness of OLS when estimating the parameters of the GSTARIMA model. The parameters of SUR are estimated by Generalized Least Squares (GLS). Applying the GSTARIMA-SUR model to rainfall data in the Malang region yielded the model GSTARIMA ((1(1,12,36,(0,(1-SUR, with an average coefficient of determination of 57.726%.
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
Moment-based estimation of smooth transition regression models with endogenous variables
W.D. Areosa (Waldyr Dutra); M.J. McAleer (Michael); M.C. Medeiros (Marcelo)
2008-01-01
Nonlinear regression models have been widely used in practice for a variety of time series and cross-section datasets. For purposes of analyzing univariate and multivariate time series data, in particular, Smooth Transition Regression (STR) models have been shown to be very useful for re
Myron P. Zalucki; Michael J. Furlong
2005-01-01
Long-term forecasts of pest pressure are central to the effective management of many agricultural insect pests. In the eastern cropping regions of Australia, serious infestations of Helicoverpa punctigera (Wallengren) and H. armigera (Hübner) (Lepidoptera: Noctuidae) are experienced annually. Regression analyses of a long series of light-trap catches of adult moths were used to describe the seasonal dynamics of both species. The size of the spring generation in eastern cropping zones could be related to rainfall in putative source areas in inland Australia. Subsequent generations could be related to the abundance of various crops in agricultural areas, rainfall and the magnitude of the spring population peak. As rainfall figured prominently as a predictor variable, and can itself be predicted using the Southern Oscillation Index (SOI), trap catches were also related to this variable. The geographic distribution of each species was modelled in relation to climate and CLIMEX was used to predict temporal variation in abundance at given putative source sites in inland Australia using historical meteorological data. These predictions were then correlated with subsequent pest abundance data in a major cropping region. The regression-based and bioclimatic-based approaches to predicting pest abundance are compared and their utility in predicting and interpreting pest dynamics is discussed.
Covariance Functions and Random Regression Models in the ...
ARC-IRENE
modelled to account for heterogeneity of variance by AY. ... Results suggest that selection for CW could be effective and that RRM could be .... permanent environmental effects; and εij is the temporary environmental effect or measurement error. .... (1999), however, obtained correlations that were variable as low as 0.23 ...
[Analysis of seasonal fluctuations in the Lotka-Volterra model].
Lobanov, A I; Sarancha, D A; Starozhilova, T K
2002-01-01
A modification of the Lotka-Volterra model is proposed. The modification takes into account seasonal fluctuations in a "predator-prey" model. In this modification, interactions between species in summer are described by the Lotka-Volterra equations; in winter, individuals of both species die off. This generalization makes the classic model unrough, which substantially extends the field of its application. The results of numerical simulation illustrate the statement formulated above.
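One possible reading of this seasonal scheme is sketched below: the classical Lotka-Volterra equations are integrated over each summer, and at the end of each summer both populations are multiplied by a winter survival fraction. All parameter values, the fourth-order Runge-Kutta integrator, and the treatment of winter as an instantaneous die-off are assumptions for illustration; the abstract does not give the authors' formulation.

```python
def lv_rates(prey, pred, a, b, c, d):
    """Classical Lotka-Volterra right-hand side."""
    return a * prey - b * prey * pred, c * prey * pred - d * pred

def rk4_step(prey, pred, dt, params):
    """One fourth-order Runge-Kutta step of the summer dynamics."""
    k1x, k1y = lv_rates(prey, pred, *params)
    k2x, k2y = lv_rates(prey + 0.5 * dt * k1x, pred + 0.5 * dt * k1y, *params)
    k3x, k3y = lv_rates(prey + 0.5 * dt * k2x, pred + 0.5 * dt * k2y, *params)
    k4x, k4y = lv_rates(prey + dt * k3x, pred + dt * k3y, *params)
    prey += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
    pred += dt * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
    return prey, pred

def simulate_seasonal_lv(years=5, summer_length=6.0, dt=0.01,
                         winter_survival=(0.5, 0.5),
                         prey0=2.0, pred0=1.0,
                         params=(1.0, 0.5, 0.5, 1.0)):
    """Return the (prey, predator) state at the start of each year.

    winter_survival holds the hypothetical fractions of prey and predators
    that survive each winter.
    """
    prey, pred = prey0, pred0
    history = [(prey, pred)]
    steps = int(round(summer_length / dt))
    for _ in range(years):
        for _ in range(steps):          # summer: Lotka-Volterra dynamics
            prey, pred = rk4_step(prey, pred, dt, params)
        prey *= winter_survival[0]      # winter: die-off of both species
        pred *= winter_survival[1]
        history.append((prey, pred))
    return history

if __name__ == "__main__":
    for year, (prey, pred) in enumerate(simulate_seasonal_lv()):
        print(year, round(prey, 3), round(pred, 3))
```

Because each winter moves the system to a different summer trajectory, the year-to-year sequence no longer follows a single closed orbit of the classical model.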
Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.
Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan
2016-11-01
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat (Triticum aestivum L.) and maize (Zea mays L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects.
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru......, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers......
Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model
Jixue LIU
2006-01-01
Though the EV model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of the LS estimate in the EV model. In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.
Is equine colic seasonal? Novel application of a model based approach
Proudman Christopher J
2006-08-01
Background: Colic is an important cause of mortality and morbidity in domesticated horses yet many questions about this condition remain to be answered. One such question is: does season have an effect on the occurrence of colic? Time-series analysis provides a rigorous statistical approach to this question but until now, to our knowledge, it has not been used in this context. Traditional time-series modelling approaches have limited applicability in the case of relatively rare diseases, such as specific types of equine colic. In this paper we present a modelling approach that respects the discrete nature of the count data and, using a regression model with a correlated latent variable and one with a linear trend, we explored the seasonality of specific types of colic occurring at a UK referral hospital between January 1995 and December 2004. Results: Six- and twelve-month cyclical patterns were identified for all colics, all medical colics, epiploic foramen entrapment (EFE), equine grass sickness (EGS), surgically treated and large colon displacement/torsion colic groups. A twelve-month cyclical pattern only was seen in the large colon impaction colic group. There was no evidence of any cyclical pattern in the pedunculated lipoma group. These results were consistent irrespective of whether we were using a model including latent correlation or trend. Problems were encountered in attempting to include both trend and latent serial dependence in models simultaneously; this is likely to be a consequence of a lack of power to separate these two effects in the presence of small counts, yet in reality the underlying physical effect is likely to be a combination of both. Conclusion: The use of a regression model with either an autocorrelated latent variable or a linear trend has allowed us to establish formally a seasonal component to certain types of colic presented to a UK referral hospital over a 10 year period. These patterns appeared to coincide
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Ivanka Jerić
2011-11-01
Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, to some extent, the overfitting problems caused by small training data, we propose to use a consensus of six regression models to predict the biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train the regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, and the linear and nonlinear (with polynomial and Gaussian kernel) support vector machine. The regression models were applied to a virtual library of 429 compounds, which resulted in six lists of candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity than the native OGF; however, they were less active than the highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of the trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample.
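One simple way to form a consensus from several ranked lists is to average each compound's rank across models. The sketch below uses that mean-rank rule as an assumption; the abstract does not state which consensus rule the authors applied, and the compound names and scores are invented.

```python
def consensus_ranking(predictions):
    """Combine per-model predictions into one list ordered by mean rank.

    predictions maps a model name to a dict of compound -> predicted
    activity, where a higher value means more active. Returns a list of
    (mean_rank, compound) tuples, best first.
    """
    compounds = sorted(next(iter(predictions.values())))
    rank_sums = {compound: 0 for compound in compounds}
    for scores in predictions.values():
        ordered = sorted(compounds, key=lambda c: scores[c], reverse=True)
        for rank, compound in enumerate(ordered, start=1):
            rank_sums[compound] += rank
    n_models = len(predictions)
    return sorted((rank_sums[c] / n_models, c) for c in compounds)

if __name__ == "__main__":
    # Hypothetical predicted activities from three models.
    example = {
        "model_1": {"A": 0.9, "B": 0.5, "C": 0.1},
        "model_2": {"A": 0.8, "B": 0.2, "C": 0.4},
        "model_3": {"A": 0.3, "B": 0.7, "C": 0.1},
    }
    for mean_rank, compound in consensus_ranking(example):
        print(compound, round(mean_rank, 2))
```

Averaging ranks rather than raw predictions avoids having to put the outputs of different model types on a common scale.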
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Liu, Fengyun; Matsuno, Shuji; Malekian, Reza; Yu, Jin; Li, Zhixiong
2016-01-01
.... The above theoretical model is empirically evidenced with VAR (Vector Auto Regression) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue...
Reduction of the curvature of a class of nonlinear regression models
吴翊; 易东云
2000-01-01
It is proved that, for a class of nonlinear regression models, the curvature of the model can be reduced to zero by increasing the amount of measured data. The result is important for practical problems and has given satisfactory results in data fusion.
Zhao, Na; Yue, Tianxiang; Zhou, Xun; Zhao, Mingwei; Liu, Yu; Du, Zhengping; Zhang, Lili
2017-07-01
Downscaling precipitation is required in local scale climate impact studies. In this paper, a statistical downscaling scheme was presented with a combination of geographically weighted regression (GWR) model and a recently developed method, high accuracy surface modeling method (HASM). This proposed method was compared with another downscaling method using the Coupled Model Intercomparison Project Phase 5 (CMIP5) database and ground-based data from 732 stations across China for the period 1976-2005. The residual which was produced by GWR was modified by comparing different interpolators including HASM, Kriging, inverse distance weighted method (IDW), and Spline. The spatial downscaling from 1° to 1-km grids for period 1976-2005 and future scenarios was achieved by using the proposed downscaling method. The prediction accuracy was assessed at two separate validation sites throughout China and Jiangxi Province on both annual and seasonal scales, with the root mean square error (RMSE), mean relative error (MRE), and mean absolute error (MAE). The results indicate that the developed model in this study outperforms the method that builds transfer function using the gauge values. There is a large improvement in the results when using a residual correction with meteorological station observations. In comparison with other three classical interpolators, HASM shows better performance in modifying the residual produced by local regression method. The success of the developed technique lies in the effective use of the datasets and the modification process of the residual by using HASM. The results from the future climate scenarios show that precipitation exhibits overall increasing trend from T1 (2011-2040) to T2 (2041-2070) and T2 to T3 (2071-2100) in RCP2.6, RCP4.5, and RCP8.5 emission scenarios. The most significant increase occurs in RCP8.5 from T2 to T3, while the lowest increase is found in RCP2.6 from T2 to T3, increased by 47.11 and 2.12 mm, respectively.
Kim, Yoojin; Kim, Ha-Rim; Choi, Yong-Sang; Kim, WonMoo; Kim, Hye-Sil
2016-11-01
Statistical seasonal prediction models for the Arctic sea ice concentration (SIC) were developed for the late summer (August-October), when the downward trend is dramatic. The absorbed solar radiation (ASR) at the top of the atmosphere in June has a significant seasonal leading role on the SIC. Based on the lagged ASR-SIC relationship, two simple statistical models were established: the Markovian stochastic and the linear regression models. Cross-validated hindcasts of SIC from 1979 to 2014 by the two models were compared with each other and with observations. The hindcasts showed general agreement between the models, as they share a common predictor (ASR in June), and the observed SIC was well reproduced, especially over the relatively thin-ice regions (of one- or multi-year sea ice). The robust predictability confirms the functional role of ASR in the prediction of SIC. In particular, the SIC prediction in October was quite promising, probably due to the pronounced ice-albedo feedback. The temporal correlation coefficients between the predicted SIC and the observed SIC were 0.79 and 0.82 for the Markovian and regression models, respectively. Small differences were observed between the two models; the regression model performed slightly better in August and September in terms of temporal correlation coefficients. Meanwhile, the prediction skill of the Markovian model in October was higher north of the Chukchi, East Siberian, and Laptev Seas. A strong non-linear relationship between ASR in June and SIC in October in these areas would have increased the predictability of the Markovian model.
Multivariable Linear Regression Model for Promotional Forecasting: The Coca Cola - Morrisons Case
Zheng, Yiwei/Y
2009-01-01
This paper describes a promotional forecasting model, built by linear regression module in Microsoft Excel. It intends to provide quick and reliable forecasts with a moderate credit and to assist the CPFR between the Coca Cola Enterprises (CCE) and the Morrisons. The model is derived from previous researches and literature review on CPFR, promotion, forecasting and modelling. It is designed as a multivariable linear regression model, which involves several promotional mix as variables includi...
Hayes, Mark A; Cryan, Paul M; Wunder, Michael B
2015-01-01
Understanding seasonal distribution and movement patterns of animals that migrate long distances is an essential part of monitoring and conserving their populations. Compared to migratory birds and other more conspicuous migrants, we know very little about the movement patterns of many migratory bats. Hoary bats (Lasiurus cinereus), a cryptic, wide-ranging, long-distance migrant, comprise a substantial proportion of the tens to hundreds of thousands of bat fatalities estimated to occur each year at wind turbines in North America. We created seasonally-dynamic species distribution models (SDMs) from 2,753 museum occurrence records collected over five decades in North America to better understand the seasonal geographic distributions of hoary bats. We used 5 SDM approaches (logistic regression, multivariate adaptive regression splines, boosted regression trees, random forest, and maximum entropy) and consolidated the outputs to generate ensemble maps. These maps represent the first formal hypotheses for sex- and season-specific hoary bat distributions. Our results suggest that North American hoary bats winter in regions with relatively long growing seasons where temperatures are moderated by proximity to oceans, and then move to the continental interior for the summer. SDMs suggested that hoary bats are most broadly distributed in autumn, the season when they are most susceptible to mortality from wind turbines; this season contains the greatest overlap between potentially suitable habitat and wind energy facilities. Comparing wind-turbine fatality data to model outputs could test many predictions, such as 'risk from turbines is highest in habitats between hoary bat summering and wintering grounds'. Although future field studies are needed to validate the SDMs, this study generated well-justified and testable hypotheses of hoary bat migration patterns and seasonal distribution.
Comparative analysis of regression and artificial neural network models for wind speed prediction
Bilgili, Mehmet; Sahin, Besir
2010-11-01
In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
Forecasting seasonal influenza with a state-space SIR model.
Osthus, Dave; Hickmann, Kyle S; Caragea, Petruţa C; Higdon, Dave; Del Valle, Sara Y
2017-03-01
Seasonal influenza is a serious public health and societal problem due to its consequences resulting from absenteeism, hospitalizations, and deaths. The overall burden of influenza is captured by the Centers for Disease Control and Prevention's influenza-like illness network, which provides invaluable information about the current incidence. This information is used to provide decision support regarding prevention and response efforts. Despite the relatively rich surveillance data and the recurrent nature of seasonal influenza, forecasting the timing and intensity of seasonal influenza in the U.S. remains challenging because the form of the disease transmission process is uncertain, the disease dynamics are only partially observed, and the public health observations are noisy. Fitting a probabilistic state-space model motivated by a deterministic mathematical model [a susceptible-infectious-recovered (SIR) model] is a promising approach for forecasting seasonal influenza while simultaneously accounting for multiple sources of uncertainty. A significant finding of this work is the importance of thoughtfully specifying the prior, as results critically depend on its specification. Our conditionally specified prior allows us to exploit known relationships between latent SIR initial conditions and parameters and functions of surveillance data. We demonstrate advantages of our approach relative to alternatives via a forecasting comparison using several forecast accuracy metrics.
Prediction of the result in race walking using regularized regression models
Krzysztof Przednowek
2013-04-01
The following paper presents the use of regularized linear models as tools to optimize the training process. The models were calculated using data collected from race-walkers' training events. The models predict the outcome of a 3 km race following a prescribed training plan. The material included a total of 122 training patterns made by 21 players. The methods of analysis include the classical OLS regression model, ridge regression, LASSO regression and elastic net regression. To compare the methods and choose the best one, leave-one-out cross-validation was used. All models were calculated using the R language with additional packages. The best model was obtained by the LASSO method, which generates an error of about 26 seconds. The method simplified the structure of the model by eliminating 5 out of 18 predictors.
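As a rough illustration of how leave-one-out cross-validation can compare an unpenalized and a penalized fit, the sketch below uses a single predictor and a ridge penalty on the slope only. This is a deliberately reduced stand-in: the paper used R with 18 predictors and also LASSO and elastic net, none of which is reproduced here. The data values and the penalty `lam` are hypothetical.

```python
def ridge_fit(xs, ys, lam):
    """Fit y = a + b*x, with an L2 penalty `lam` on the slope b only."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / (sxx + lam)   # lam = 0 gives ordinary least squares
    a = my - b * mx         # intercept is left unpenalized
    return a, b


def loo_rmse(xs, ys, lam):
    """Leave-one-out RMSE: refit without point i, then predict point i."""
    squared_errors = []
    for i in range(len(xs)):
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        a, b = ridge_fit(x_train, y_train, lam)
        squared_errors.append((ys[i] - (a + b * xs[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Hypothetical data: one training-load predictor against a race result.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
for lam in (0.0, 1.0, 10.0):
    print(lam, loo_rmse(xs, ys, lam))
```

The penalty value with the lowest leave-one-out error would be preferred; with many correlated predictors the penalized fits tend to be the more competitive ones.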
Stratton, Margaret D.; Ehrlich, Hanna Y.; Mor, Siobhan M.; Naumova, Elena N.
2017-01-01
Ross River virus (RRV), Barmah Forest virus (BFV), and dengue are three common mosquito-borne diseases in Australia that display notable seasonal patterns. Although all three diseases have been modeled on localized scales, no previous study has used harmonic models to compare seasonality of mosquito-borne diseases on a continent-wide scale. We fit Poisson harmonic regression models to surveillance data on RRV, BFV, and dengue (from 1993, 1995 and 1991, respectively, through 2015) incorporating seasonal, trend, and climate (temperature and rainfall) parameters. The models captured an average of 50-65% variability of the data. Disease incidence for all three diseases generally peaked in January or February, but peak timing was most variable for dengue. The most significant predictor parameters were trend and inter-annual periodicity for BFV, intra-annual periodicity for RRV, and trend for dengue. We found that a Temperature Suitability Index (TSI), designed to reclassify climate data relative to optimal conditions for vector establishment, could be applied to this context. Finally, we extrapolated our models to estimate the impact of a false-positive BFV epidemic in 2013. Creating these models and comparing variations in periodicities may provide insight into historical outbreaks as well as future patterns of mosquito-borne diseases.
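To make the link between harmonic terms and peak timing concrete, the sketch below assumes a log-linear mean with a single annual sine and cosine pair and converts the two coefficients into the time of the seasonal peak. The coefficient values are hypothetical, and the models in the study also include trend and climate terms that are omitted here.

```python
import math


def expected_count(t, b0, b_sin, b_cos, period=12.0):
    """Poisson mean with a log link and one harmonic pair of given period."""
    w = 2.0 * math.pi / period
    return math.exp(b0 + b_sin * math.sin(w * t) + b_cos * math.cos(w * t))


def peak_time(b_sin, b_cos, period=12.0):
    """Time within one period at which the harmonic term is largest.

    Uses b_sin*sin(wt) + b_cos*cos(wt) = A*cos(wt - phase),
    where phase = atan2(b_sin, b_cos).
    """
    phase = math.atan2(b_sin, b_cos)
    return (phase * period / (2.0 * math.pi)) % period


# Hypothetical coefficients; t is measured in months from an arbitrary origin.
b0, b_sin, b_cos = 3.0, 0.5, 0.5
print(peak_time(b_sin, b_cos))               # months after the time origin
print(expected_count(1.5, b0, b_sin, b_cos)) # expected count at the peak
```

The amplitude `math.hypot(b_sin, b_cos)` gives the size of the seasonal swing on the log scale, which is one way to compare seasonality across diseases.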
Annual and seasonal spatial models for nitrogen oxides in Tehran, Iran
Amini, Heresh; Taghavi-Shahri, Seyed-Mahmood; Henderson, Sarah B.; Hosseini, Vahid; Hassankhany, Hossein; Naderi, Maryam; Ahadi, Solmaz; Schindler, Christian; Künzli, Nino; Yunesian, Masud
2016-09-01
Very few land use regression (LUR) models have been developed for megacities in low- and middle-income countries, but such models are needed to facilitate epidemiologic research on air pollution. We developed annual and seasonal LUR models for ambient oxides of nitrogen (NO, NO2, and NOX) in the Middle Eastern city of Tehran, Iran, using 2010 data from 23 fixed monitoring stations. A novel systematic algorithm was developed for spatial modeling. The R2 values for the LUR models ranged from 0.69 to 0.78 for NO, 0.64 to 0.75 for NO2, and 0.61 to 0.79 for NOx. The most predictive variables were: distance to the traffic access control zone; distance to primary schools; green space; official areas; bridges; and slope. The annual average concentrations of all pollutants were high, approaching those reported for megacities in Asia. At 1000 randomly-selected locations the correlations between cooler and warmer season estimates were 0.64 for NO, 0.58 for NOX, and 0.30 for NO2. Seasonal differences in spatial patterns of pollution are likely driven by differences in source contributions and meteorology. These models provide a basis for understanding long-term exposures and chronic health effects of air pollution in Tehran, where such research has been limited.
Stratton, Margaret D.; Ehrlich, Hanna Y.; Mor, Siobhan M.; Naumova, Elena N.
2017-01-01
Ross River virus (RRV), Barmah Forest virus (BFV), and dengue are three common mosquito-borne diseases in Australia that display notable seasonal patterns. Although all three diseases have been modeled on localized scales, no previous study has used harmonic models to compare seasonality of mosquito-borne diseases on a continent-wide scale. We fit Poisson harmonic regression models to surveillance data on RRV, BFV, and dengue (from 1993, 1995 and 1991, respectively, through 2015) incorporating seasonal, trend, and climate (temperature and rainfall) parameters. The models captured an average of 50–65% variability of the data. Disease incidence for all three diseases generally peaked in January or February, but peak timing was most variable for dengue. The most significant predictor parameters were trend and inter-annual periodicity for BFV, intra-annual periodicity for RRV, and trend for dengue. We found that a Temperature Suitability Index (TSI), designed to reclassify climate data relative to optimal conditions for vector establishment, could be applied to this context. Finally, we extrapolated our models to estimate the impact of a false-positive BFV epidemic in 2013. Creating these models and comparing variations in periodicities may provide insight into historical outbreaks as well as future patterns of mosquito-borne diseases. PMID:28071683
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
STATISTICAL INFERENCES FOR VARYING-COEFFICIENT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fitted by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic is constructed for testing variation of model parameters over the locations where the observations are collected, and a formal testing approach, which is essential to exploring spatial non-stationarity in geography science, is suggested.
Integrating Seasonal Oscillations into Basel II Behavioural Scoring Models
Goran Klepac
2007-09-01
The article introduces a new methodology of temporal influence measurement (seasonal oscillations, temporal patterns) for behavioural scoring development purposes. The paper shows how significant temporal variables can be recognised and then integrated into the behavioural scoring models in order to improve model performance. Behavioural scoring models are integral parts of the Basel II standard on Internal Ratings-Based Approaches (IRB). The IRB approach reflects the individual risk profile of a bank much more precisely. The paper shows how macroeconomic and microeconomic factors represented in time series can be analysed and integrated into behavioural scorecard models by using the REF II model.
Seasonal variance in P system models for metapopulations
Daniela Besozzi; Paolo Cazzaniga; Dario Pescini; Giancarlo Mauri
2007-01-01
Metapopulations are ecological models describing the interactions and the behavior of populations living in fragmented habitats. In this paper, metapopulations are modelled by means of dynamical probabilistic P systems, where additional structural features have been defined (e. g., a weighted graph associated with the membrane structure and the reduction of maximal parallelism). In particular, we investigate the influence of stochastic and periodic resource feeding processes, owing to seasonal variance, on emergent metapopulation dynamics.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Aboveground biomass and carbon stocks modelling using non-linear regression model
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests, owing to the varied biodiversity of species and the complex structure of the tropical rainforest. Nevertheless, tropical rainforests are the most extensive forests in the world, with a vast diversity of trees and layered canopies. Integrating optical sensor data with empirical models is a common way to assess AGB. Regression can be used to link remote sensing data to a biophysical parameter of the forest. This paper therefore examines the accuracy of a non-linear regression equation with a quadratic function for estimating AGB and carbon stocks in the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and remotely sensed data using a non-linear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS), with a Pearson correlation coefficient (r) of 0.671 (p < 0.01). The study concluded that integrating Worldview-3 imagery with the LiDAR-based canopy height model (CHM) raster was useful for quantifying AGB and carbon stocks over a larger sample area of the lowland Dipterocarp forest.
McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield the best performance and avoid model discontinuity over day/night data boundaries.
MJO prediction using the sub-seasonal to seasonal forecast model of Beijing Climate Center
Liu, Xiangwen; Wu, Tongwen; Yang, Song; Li, Tim; Jie, Weihua; Zhang, Li; Wang, Zaizhi; Liang, Xiaoyun; Li, Qiaoping; Cheng, Yanjie; Ren, Hongli; Fang, Yongjie; Nie, Suping
2017-05-01
By conducting several sets of hindcast experiments using the Beijing Climate Center Climate System Model, which participates in the Sub-seasonal to Seasonal (S2S) Prediction Project, we systematically evaluate the model's capability in forecasting MJO and its main deficiencies. In the original S2S hindcast set, MJO forecast skill is about 16 days. Such a skill shows significant seasonal-to-interannual variations. It is found that the model-dependent MJO forecast skill is more correlated with the Indian Ocean Dipole (IOD) than with the El Niño-Southern Oscillation. The highest skill is achieved in autumn when the IOD attains its maturity. Extended skill is found when the IOD is in its positive phase. MJO forecast skill's close association with the IOD is partially due to the quickly strengthening relationship between MJO amplitude and IOD intensity as lead time increases to about 15 days, beyond which a rapid weakening of the relationship is shown. This relationship transition may cause the forecast skill to decrease quickly with lead time, and is related to the unrealistic amplitude and phase evolutions of predicted MJO over or near the equatorial Indian Ocean during anomalous IOD phases, suggesting a possible influence of exaggerated IOD variability in the model. The results imply that the upper limit of intraseasonal predictability is modulated by large-scale external forcing background state in the tropical Indian Ocean. Two additional sets of hindcast experiments with improved atmosphere and ocean initial conditions (referred to as S2S_IEXP1 and S2S_IEXP2, respectively) are carried out, and the results show that the overall MJO forecast skill is increased to 21-22 days. It is found that the optimization of initial sea surface temperature condition largely accounts for the increase of the overall MJO forecast skill, even though the improved initial atmosphere conditions also play a role. For the DYNAMO/CINDY field campaign period, the forecast skill increases
Combining an additive and tree-based regression model simultaneously: STIMA
Dusseldorp, E.; Conversano, C.; Os, B.J. van
2010-01-01
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
A seasonal model of the Mediterranean Sea general circulation
Roussenov, Vassil; Stanev, Emil; Artale, Vincenzo; Pinardi, Nadia
1995-07-01
This paper describes the seasonal characteristics of the Mediterranean Sea general circulation as simulated by a primitive equation general circulation model. The forcing is composed of climatological monthly mean atmospheric parameters, which are used to compute the heat and momentum budgets at the air-sea interface of the model. This allows heat fluxes to be determined by realistic air-sea interaction physics. The Strait of Gibraltar is open, and the model resolution is ? in the horizontal and 19 levels in the vertical. The results show the large seasonal cycle of the circulation and its transient characteristics. The heat budget at the surface is characterized by lateral boundary intensifications occurring in downwelling and upwelling areas of the basin. The general circulation is composed of subbasin gyres; cyclonic motion dominates the northern part of the basin and anticyclonic motion the southern part. The Atlantic stream, which enters from Gibraltar and assumes the form of different boundary current subsystems, is a coherent structure at the surface. At depth it appears as current segments and jets around a vigorous gyre system. The seasonal variability is manifested not only by a change in amplitude and location of the gyres but also by the appearance of seasonally recurrent gyres in different parts of the basin. Distinct westward propagation of these gyres occurs, together with amplitude changes. For the first time a Mersa-Matruh Gyre is successfully simulated due to the introduction of our heat fluxes at the air-sea interface. The seasonal thermocline is formed each summer, and a deep winter mixed layer is produced in the region of Levantine intermediate water formation. Deep water renewal does not occur, probably due to the climatological forcing used.
CONFIDENCE REGIONS IN TERMS OF STATISTICAL CURVATURE FOR AR(q) NONLINEAR REGRESSION MODELS
刘应安; 韦博成
2004-01-01
This paper constructs a set of confidence regions of parameters in terms of statistical curvatures for AR(q) nonlinear regression models. The geometric frameworks are proposed for the model. Then several confidence regions for parameters and parameter subsets in terms of statistical curvatures are given based on the likelihood ratio statistics and score statistics. Several previous results, such as [1] and [2], are extended to AR(q) nonlinear regression models.
Stochastic effects in a seasonally forced epidemic model
Rozhnova, G.; Nunes, A.
2010-10-01
The interplay of seasonality, the system’s nonlinearities and intrinsic stochasticity, is studied for a seasonally forced susceptible-exposed-infective-recovered stochastic model. The model is explored in the parameter region that corresponds to childhood infectious diseases such as measles. The power spectrum of the stochastic fluctuations around the attractors of the deterministic system that describes the model in the thermodynamic limit is computed analytically and validated by stochastic simulations for large system sizes. Size effects are studied through additional simulations. Other effects such as switching between coexisting attractors induced by stochasticity often mentioned in the literature as playing an important role in the dynamics of childhood infectious diseases are also investigated. The main conclusion is that stochastic amplification, rather than these effects, is the key ingredient to understand the observed incidence patterns.
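For orientation, the deterministic limit of a seasonally forced SEIR model can be sketched with a simple forward-Euler integration. Everything specific here is an assumption made for illustration: the sinusoidal form of the forcing, the measles-like parameter values, the initial conditions, and the step size are not taken from the paper, and the stochastic fluctuations that are the paper's actual subject are not simulated.

```python
import math


def simulate_seir(years=1.0, dt=1.0 / 3650.0, beta0=1000.0, eps=0.1,
                  sigma=365.0 / 8.0, gamma=365.0 / 5.0, mu=1.0 / 50.0):
    """Forward-Euler path of a deterministic SEIR model with seasonal forcing.

    Compartments are population fractions; rates are per year.
    The contact rate is beta(t) = beta0 * (1 + eps * cos(2*pi*t)).
    """
    s, e, i, r = 0.06, 0.001, 0.001, 0.938  # hypothetical initial fractions
    t = 0.0
    path = []
    for _ in range(int(round(years / dt))):
        beta = beta0 * (1.0 + eps * math.cos(2.0 * math.pi * t))
        ds = mu - beta * s * i - mu * s          # births, infection, deaths
        de = beta * s * i - (sigma + mu) * e     # exposure, end of latency
        di = sigma * e - (gamma + mu) * i        # onset, recovery
        dr = gamma * i - mu * r                  # recovery, deaths
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
        t += dt
        path.append((t, s, e, i, r))
    return path


path = simulate_seir()
t_end, s, e, i, r = path[-1]
print(t_end, s + e + i + r)
```

Because births balance deaths, the four fractions keep summing to one, which is a convenient sanity check before any stochastic version is layered on top.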
Stochastic effects in a seasonally forced epidemic model
Rozhnova, Ganna
2010-01-01
The interplay of seasonality, the system's nonlinearities and intrinsic stochasticity is studied for a seasonally forced susceptible-exposed-infective-recovered stochastic model. The model is explored in the parameter region that corresponds to childhood infectious diseases such as measles. The power spectrum of the stochastic fluctuations around the attractors of the deterministic system that describes the model in the thermodynamic limit is computed analytically and validated by stochastic simulations for large system sizes. Size effects are studied through additional simulations. Other effects such as switching between coexisting attractors induced by stochasticity often mentioned in the literature as playing an important role in the dynamics of childhood infectious diseases are also investigated. The main conclusion is that stochastic amplification, rather than these effects, is the key ingredient to understand the observed incidence patterns.
Seasonal forecasting and health impact models: challenges and opportunities.
Ballester, Joan; Lowe, Rachel; Diggle, Peter J; Rodó, Xavier
2016-10-01
After several decades of intensive research, steady improvements in understanding and modeling the climate system have led to the development of the first generation of operational health early warning systems in the era of climate services. These schemes are based on collaborations across scientific disciplines, bringing together real-time climate and health data collection, state-of-the-art seasonal climate predictions, epidemiological impact models based on historical data, and an understanding of end user and stakeholder needs. In this review, we discuss the challenges and opportunities of this complex, multidisciplinary collaboration, with a focus on the factors limiting seasonal forecasting as a source of predictability for climate impact models. © 2016 New York Academy of Sciences.
Soldić-Aleksić Jasna
2009-01-01
Market segmentation is one of the key concepts of modern marketing. The main goal of market segmentation is to create groups (segments) of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of a concrete product or service. Companies can create a specific marketing plan for each of these segments and thereby gain a short- or long-term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of a logistic regression model and CHAID analysis. The logistic regression model was used to select the variables (from the initial pool of eleven variables) that are statistically significant for explaining the dependent variable. The selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented for a concrete empirical example in the following form: summary model results, CHAID tree, gain chart, index chart, and risk and classification tables.
Martino, K G; Marks, B P
2007-12-01
Two different microbial modeling procedures were compared and validated against independent data for Listeria monocytogenes growth. The most generally used method is two consecutive regressions: growth parameters are estimated from a primary regression of microbial counts, and a secondary regression relates the growth parameters to experimental conditions. A global regression is an alternative method in which the primary and secondary models are combined, giving a direct relationship between experimental factors and microbial counts. The Gompertz equation was the primary model, and a response surface model was the secondary model. Independent data from meat and poultry products were used to validate the modeling procedures. The global regression yielded the lower standard errors of calibration: 0.95 log CFU/ml for aerobic and 1.21 log CFU/ml for anaerobic conditions. The two-step procedure yielded errors of 1.35 log CFU/ml for aerobic and 1.62 log CFU/ml for anaerobic conditions. For food products, the global regression was more robust than the two-step procedure for 65% of the cases studied. The robustness index for the global regression ranged from 0.27 (performed better than expected) to 2.60. For the two-step method, the robustness index ranged from 0.42 to 3.88. The predictions were overestimated (fail safe) in more than 50% of the cases using the global regression and in more than 70% of the cases using the two-step regression. Overall, the global regression performed better than the two-step procedure for this specific application.
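The difference between the two procedures can be written schematically as follows. The symbols ($y$ for log counts, $\theta$ for growth parameters, $x$ for experimental conditions, $\beta$ for secondary-model coefficients, $g$ for the response surface) and the particular Gompertz parameterization are assumptions for illustration; the abstract does not state the exact forms used.

```latex
% Primary model: one common Gompertz parameterization (assumed)
y(t;\theta) = A + C \exp\{-\exp[-B\,(t - M)]\}, \qquad \theta = (A, C, B, M)

% Two-step procedure: estimate \hat\theta_j separately for each condition x_j,
% then regress the estimates on the conditions
\hat\theta_j = g(x_j;\beta) + \varepsilon_j

% Global regression: substitute the secondary model into the primary model
% and estimate \beta in a single fit to all counts
y_{jk} = y\bigl(t_{jk};\, g(x_j;\beta)\bigr) + \varepsilon_{jk}
```

In the global form the fitting criterion is defined directly on the microbial counts, which is one plausible reason for the lower calibration errors reported.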
Antretter, Elfi; Dunkel, Dirk; Osvath, Peter; Voros, Viktor; Fekete, Sandor; Haring, Christian
2006-06-01
The prospective investigation of repetitive nonfatal suicidal behavior is associated with two methodological problems. First, due to the commonly used definitions of nonfatal suicidal behavior, clinical samples usually consist of patients with considerable between-person variability. Second, repeated nonfatal suicidal episodes of the same subjects are likely to be correlated. We examined three regression techniques to comparatively evaluate their efficiency in addressing these methodological problems. Repeated episodes of nonfatal suicidal behavior were assessed in two independent patient samples during a 2-year follow-up period. The first regression design modeled repetitive nonfatal suicidal behavior as a summary measure. The second regression model treated repeated episodes of the same subject as independent events. The third regression model was a hierarchical linear model. The estimated mean effects of the first model were likely to be nonrepresentative for a considerable part of the study subjects. The second regression design overemphasized the impact of the predictor variables. The hierarchical linear model most appropriately accounted for the heterogeneity of the samples and the correlated data structure. The nonhierarchical regression designs did not provide appropriate statistical models for the prospective investigation of repetitive nonfatal suicidal behavior. Multilevel modeling provides a convenient alternative.
Jaime Araújo Cobuci
2011-03-01
Records of test-day milk yields of the first three lactations of 25,500 Holstein cows were used to estimate genetic parameters for milk yield by using two alternative definitions of the fixed regression in random regression models (RRM). Legendre polynomials of fourth and fifth orders were used to model the regression of the fixed curve (defined based on averages of the population or of multiple sub-populations formed by grouping animals which calved at the same age and in the same season of the year) or the random lactation curves (additive genetic and permanent environment). The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) indicated that the models which used multiple regressions of fixed lactation curves had the best fit for the first-lactation test-day milk yields, and the models which used a single regression of the fixed curve had the best fit for the second and third lactations. Heritability estimates for milk yield during lactation did not vary among models but ranged from 0.22 to 0.34, from 0.11 to 0.21, and from 0.10 to 0.20, respectively, in the first three lactations. Like the heritability estimates, the estimates of genetic correlations did not vary among models. The use of single or multiple fixed regressions for fixed lactation curves in RRM does not influence the estimates of genetic parameters for test-day milk yield across lactations.
Mathematical Model of Seasonal Influenza with Treatment in Constant Population
Kharis, M.; Arifudin, R.
2017-04-01
Seasonal influenza is a disease that breaks out periodically, at least once every year, and causes many people to be hospitalized. When many workers are hospitalized, production quantities, distribution times, and other economic aspects are affected, which in turn affects economic growth. Infected people need treatment to shorten the infection period and cure the infection. In this paper, we discuss a mathematical model of seasonal influenza with treatment. In practice, an outbreak lasts for a short period, less than one year. Hence, we can assume that the population is constant during the outbreak. We analyze the existence of the equilibrium points of the model and their stability. We also give some simulations to provide a geometric picture of the results of the analysis.
An Inventory Model for Special Display Goods with Seasonal Demand
Kawakatsu, Hidefumi
2010-10-01
The present study discusses the retailer's optimal replenishment policy for seasonal products. The demand rate of seasonal merchandise such as clothes, sporting goods, children's toys and electrical home appliances tends to decrease with time after reaching its maximum value. In this study, we focus on "Special Display Goods", which are heaped up in end displays or special areas at retail stores. They sell quickly when the quantity displayed is large, but sell slowly when the quantity becomes small. We develop a model with a finite time horizon (selling period) to determine the optimal replenishment policy, which maximizes the retailer's total profit. Numerical examples are presented to illustrate the theoretical underpinnings of the proposed model.
Regression modeling of streamflow, baseflow, and runoff using geographic information systems.
Zhu, Yuanhong; Day, Rick L
2009-02-01
Regression models for predicting total streamflow (TSF), baseflow (TBF), and storm runoff (TRO) are needed for water resource planning and management. This study used 54 streams with >20 years of streamflow gaging station records during the period October 1971 to September 2001 in Pennsylvania and partitioned TSF into TBF and TRO. TBF was considered a surrogate of groundwater recharge for basins. Regression models for predicting basin-wide TSF, TBF, and TRO were developed under three scenarios that varied in the regression variables used for model development. Regression variables representing basin geomorphological, geological, soil, and climatic characteristics were estimated using geographic information systems. All regression models for TSF, TBF, and TRO had R² values >0.94 and reasonable prediction errors. The two best TSF models developed under scenarios 1 and 2 had similar absolute prediction errors. The same was true for the two best TBF models. Therefore, either of the two best TSF and TBF models could be used for the respective flow prediction, depending on variable availability. The TRO model developed under scenario 1 had smaller absolute prediction errors than that developed under scenario 2. Simplified area-alone models developed under scenario 3 might be used when the variables required by the best models are not available, but they had lower R² values and higher or more variable prediction errors than the best models.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P), regression against P (termed MAP-R-P), regression against P and additional local variables (termed MAP-R-P+nV), and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for
Comparison of land-use regression models between Great Britain and the Netherlands.
Vienneau, D.; de Hoogh, K.; Beelen, R.M.J.; Fischer, P.; Hoek, G.; Briggs, D.
2010-01-01
Land-use regression models have increasingly been applied for air pollution mapping at typically the city level. Though models generally predict spatial variability well, the structure of models differs widely between studies. The observed differences in the models may be due to artefacts of data an
U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...
Rank Set Sampling in Improving the Estimates of Simple Regression Model
M Iqbal Jeelani
2015-04-01
In this paper, rank set sampling (RSS) is introduced with a view to increasing the efficiency of the estimates of a simple regression model. The regression model is fitted to samples drawn by three sampling techniques: simple random sampling (SRS), systematic sampling (SYS) and rank set sampling (RSS). The R² and adjusted R² obtained from the regression model based on the rank set sample are higher than those for the other two sampling schemes. Similarly, the root mean square error, p-values and coefficient of variation are much lower for the rank set based regression model. Under validation by jackknifing, the measures of R², adjusted R² and RMSE are also more consistent for RSS than for SRS and SYS. The results are supported by an empirical study involving a real data set on Pinus wallichiana collected from the Langate block of Kupwara district.
Tao Hu; Heng-jian Cui; Xing-wei Tong
2009-01-01
This article considers a semiparametric varying-coefficient partially linear regression model with current status data. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained, and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates, and a real dataset is used to illustrate our approach.
Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics remains controversial among studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was run for 55 storm events, and the resulting total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. The R² and p-values of the regression on ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not capture the true effect of site-specific characteristics, due to uncertainty in the data.
Huiliang, Wang; Zening, Wu; Caihong, Hu; Xinzhong, Du
2015-09-01
Nonpoint source (NPS) pollution is considered the main reason for water quality deterioration; thus, quantifying NPS loads reliably is the key to implementing watershed management practices. In this study, water quality and NPS loads from a watershed with limited data availability were studied in a mountainous area in China. Instantaneous water discharge was measured through the velocity-area method, and samples were taken for water quality analysis on both flood and nonflood days in 2010. The streamflow simulated by the Hydrological Simulation Program-Fortran (HSPF) from 1995 to 2013 and a regression model were used to estimate total annual loads of various water quality parameters. The concentrations of total phosphorus (TP) and total nitrogen (TN) were much higher during the flood seasons, but the concentrations of ammonia nitrogen (NH3-N) and nitrate nitrogen (NO3-N) were lower during the flood seasons. Nevertheless, only TP concentration was positively correlated with the flow rate. The annual load from this watershed fluctuated significantly. Statistical results indicated that pollutant fluxes during flood seasons contributed significantly to annual fluxes. The loads of TP, TN, NH3-N, and NO3-N in the flood seasons accounted for 58-85%, 60-82%, 63-88%, and 64-81% of the total annual loads, respectively. This study presented a new method for estimating the water and NPS loads in a watershed with limited data availability, which simplified data collection for the watershed model and overcame the scale problem of the field experiment method.
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using single-trait random regression models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and the linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability of milk flow estimated by the most parsimonious model was of moderate to high magnitude.
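The Legendre covariates used in random regression models of this kind can be sketched as follows. Days in milk are first rescaled to the interval [-1, 1] and the polynomials are then evaluated by the standard three-term recursion. The function names, the 5 to 305 day range in the usage note, and the choice of the normalized form are assumptions for illustration. The abstract does not say whether "order" refers to the polynomial degree or to the number of coefficients, so the `degree` argument here is simply the highest polynomial degree.

```python
import math

def standardize_dim(dim, dim_min, dim_max):
    """Map days in milk linearly onto [-1, 1]."""
    return 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0

def legendre_covariates(x, degree, normalized=True):
    """Return [P_0(x), ..., P_degree(x)] for x in [-1, 1].

    Uses Bonnet's recursion (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}.
    With normalized=True each P_k is scaled by sqrt((2k + 1) / 2), a scaling
    often used in random regression work (assumed here, not from the abstract).
    """
    values = [1.0]
    if degree >= 1:
        values.append(x)
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    if normalized:
        values = [math.sqrt((2 * k + 1) / 2.0) * p for k, p in enumerate(values)]
    return values
```

For example, `legendre_covariates(standardize_dim(155, 5, 305), 3)` gives the four covariates for a record at mid-lactation under the assumed 5 to 305 day range.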
Effects of seasonal growth on delayed prey-predator model
Gakkhar, Sunita [Department of Mathematics, IIT Roorkee, Roorkee 247667 (India)], E-mail: sungkfma@iitr.ernet.in; Sahani, Saroj Kumar [Department of Mathematics, IIT Roorkee, Roorkee 247667 (India)], E-mail: sarojdma@iitr.ernet.in; Negi, Kuldeep [Department of Mathematics, IIT Roorkee, Roorkee 247667 (India)], E-mail: negikdma@iitr.ernet.in
2009-01-15
The dynamic behavior of a delayed predator-prey system with Holling II functional response is investigated. The stability analysis has been carried out and existence of Hopf bifurcation has been established. The complex dynamic behavior due to time delay has been explored. The effects of seasonal growth on the complex dynamics have been simulated. The model shows a rich variety of behavior, including period doubling, quasi-periodicity, chaos, transient chaos, and windows of periodicity.
Two Strain Dengue Model with Temporary Cross Immunity and Seasonality
Aguiar, Maíra; Ballesteros, Sebastien; Stollenwerk, Nico
2010-09-01
Models of dengue fever epidemiology have previously shown critical fluctuations with power law distributions and also deterministic chaos in some parameter regions due to the multi-strain structure of the disease pathogen. In our first model, which includes well-known biological features, we found a rich dynamical structure including limit cycles, symmetry breaking bifurcations, torus bifurcations, coexisting attractors including isola solutions, and deterministic chaos (as indicated by positive Lyapunov exponents) in a much larger parameter region, which is also biologically more plausible than the previous results of other researchers. Based on these findings, we will investigate the model structures further, including seasonality.
Modelling QTL effect on BTA06 using random regression test day models.
Suchocki, T; Szyda, J; Zhang, Q
2013-02-01
In statistical models, a quantitative trait locus (QTL) effect has been incorporated either as a fixed or as a random term, but, up to now, it has been mainly considered as a time-independent variable. However, for traits recorded repeatedly, it is of interest to investigate the variation of the QTL effect over time. The major goal of this study was to estimate the position and effect of QTL for milk, fat and protein yields and for somatic cell score based on test day records, while testing whether the effects are constant or variable throughout lactation. The analysed data consisted of 23 paternal half-sib families (716 daughters of 23 sires) of Chinese Holstein-Friesian cattle genotyped at 14 microsatellites located in the area of the casein loci on BTA6. A sequence of three models was used: (i) a lactation model, (ii) a random regression model with a QTL constant in time and (iii) a random regression model with a QTL variable in time. The results showed that, for each production trait, at least one significant QTL exists. For milk and protein yields, the QTL effect was variable in time, while for fat yield, each of the three models resulted in a significant QTL effect. When a QTL is incorporated into a model as constant over time, its effect is averaged over lactation stages and may therefore be difficult or even impossible to detect. Our results showed that, in such a situation, only a longitudinal model is able to identify loci significantly influencing trait variation.
(no author listed)
2007-01-01
To study the sensitivity of inter-subspecific hybrid rice to climatic conditions, the spikelet fertilization rate (SFR) of four types of rice (indica-japonica hybrid, intermediate hybrid, indica and japonica) was analyzed during 2000-2004. The inter-subspecific hybrids showed a lower SFR and much larger fluctuations under various climatic conditions than indica and japonica rice, indicating that the inter-subspecific hybrids were sensitive to ecological conditions. Among 12 climatic factors, the key factor affecting rice SFR was temperature, the most significant being the average temperature of the seven days around panicle flowering (T7). A regression equation relating SFR to T7 and a comprehensive model based on four important temperature indices were put forward. For panicle flowering of inter-subspecific hybrids, the optimum temperature was estimated to be 26.1-26.6 ℃ and the lower limit of safe temperature 22.5-23.3 ℃, on average 0.5 ℃ and 1.7 ℃ higher, respectively, than for indica and japonica rice. This suggests that inter-subspecific hybrids require suitable climatic conditions. During panicle flowering, the suitable daily average temperature was 23.3-29.0 ℃, with the optimum at 26.1-26.6 ℃. As an application example, the optimum heading season for inter-subspecific hybrids in key rice growing areas in China was the same as for common pure lines, while the limit of the safe heading date was about ten days earlier than that of common pure lines.
The empirical likelihood goodness-of-fit test for regression model
Li-xing ZHU; Yong-song QIN; Wang-li XU
2007-01-01
Goodness-of-fit testing for regression models has received much attention in the literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models, including classical parametric and autoregressive (AR) time series models, are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.
On asymptotics of t-type regression estimation in multiple linear model
(no author listed)
2004-01-01
We consider a robust estimator (t-type regression estimator) of the multiple linear regression model obtained by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
Climate changes and their effects in the public health: use of Poisson regression models
Jonas Bodini Alonso
2010-08-01
In this paper, we analyze the daily number of hospitalizations in São Paulo City, Brazil, in the period from January 1, 2002 to December 31, 2005. This data set relates to pneumonia, coronary ischemic diseases, diabetes and chronic diseases in different age categories. In order to verify the effect of climate changes, the following covariates are considered: atmospheric pressure, air humidity, temperature, season of the year, and a covariate related to the day of the week on which the hospitalization occurred. The possible effects of the assumed covariates on the number of hospitalizations are studied using a Poisson regression model with or without a random effect that captures the possible correlation among the hospitalization counts for the different age categories on the same day and the extra-Poisson variability of the longitudinal data. The inferences of interest are obtained using the Bayesian paradigm and MCMC (Markov chain Monte Carlo) methods.
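One way to write down a model of the kind described is given below. The indexing (day $i$, age category $j$), the log link, and the normal distribution for the day-level random effect are assumptions for illustration; the abstract does not give the exact specification or the priors used.

```latex
% y_{ij}: number of hospitalizations on day i in age category j
% x_i: climate, season and day-of-week covariates for day i
y_{ij} \mid \lambda_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \beta_{0j} + \mathbf{x}_i^{\top} \boldsymbol{\beta}_j + w_i, \qquad
w_i \sim \mathcal{N}(0, \sigma_w^2)
```

Setting $w_i = 0$ gives the model without a random effect. A shared $w_i$ induces correlation among the counts of different age categories on the same day and lets the marginal variance exceed the mean, which is the extra-Poisson variability mentioned in the abstract.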
Williams-Sether, Tara; Gross, Tara A.
2016-02-09
Seasonal mean daily flow data from 119 U.S. Geological Survey streamflow-gaging stations in North Dakota; the surrounding states of Montana, Minnesota, and South Dakota; and the Canadian provinces of Manitoba and Saskatchewan with 10 or more years of unregulated flow record were used to develop regression equations for flow duration, n-day high flow and n-day low flow using ordinary least-squares and Tobit regression techniques. Regression equations were developed for seasonal flow durations at the 10th, 25th, 50th, 75th, and 90th percent exceedances; the 1-, 7-, and 30-day seasonal mean high flows for the 10-, 25-, and 50-year recurrence intervals; and the 1-, 7-, and 30-day seasonal mean low flows for the 2-, 5-, and 10-year recurrence intervals. Basin and climatic characteristics determined to be significant explanatory variables in one or more regression equations included drainage area, percentage of basin drainage area that drains to isolated lakes and ponds, ruggedness number, stream length, basin compactness ratio, minimum basin elevation, precipitation, slope ratio, stream slope, and soil permeability. The adjusted coefficient of determination for the n-day high-flow regression equations ranged from 55.87 to 94.53 percent. The Chi2 values for the duration regression equations ranged from 13.49 to 117.94, whereas the Chi2 values for the n-day low-flow regression equations ranged from 4.20 to 49.68.
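For readers unfamiliar with flow-duration statistics, the sketch below shows one simple way to compute empirical percent exceedances from a series of flows. The Weibull plotting position rank/(n + 1) and the function name are assumptions for illustration. The abstract does not state how the station statistics were computed, and the regression step that transfers them to ungaged sites is not shown.

```python
def flow_duration_curve(flows):
    """Return (percent_exceedance, flow) pairs, largest flow first.

    Each flow is assigned the Weibull plotting position 100 * rank / (n + 1),
    where rank 1 is the largest flow (an assumed convention).
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]
```

High flows sit at low percent exceedances, so a 10th-percent-exceedance flow is larger than a 90th-percent-exceedance flow.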
Developing and testing a global-scale regression model to quantify mean annual streamflow
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 10^6 km^2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
A hybrid model using logistic regression and wavelet transformation to detect traffic incidents
Shaurya Agarwal
2016-07-01
This paper investigates a hybrid model that combines logistic regression with wavelet-based feature extraction for detecting traffic incidents. A logistic regression model is suitable when the outcome can take only a limited number of values. For traffic incident detection, the outcome is limited to two values: the presence or absence of an incident. The logistic regression model used in this study is a generalized linear model (GLM) with a binomial response and a logit link function. The paper presents a framework for using logistic regression and wavelet-based feature extraction for traffic incident detection and investigates the effect of data preprocessing on the performance of incident detection models. The results indicate that logistic regression with wavelet-based feature extraction can be used effectively for incident detection by balancing the detection rate against the false alarm rate according to need. Logistic regression on raw data reached a maximum detection rate of 95.4% at the cost of a 14.5% false alarm rate, whereas the hybrid model achieved a maximum detection rate of 98.78% at the expense of a 6.5% false alarm rate. The results indicate that the proposed approach is practical and efficient; with future improvements, it could become an effective tool for traffic incident detection.
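The trade-off between detection rate and false alarm rate described in this abstract can be illustrated with a minimal sketch. It assumes a model that already outputs an incident probability for each observation; the function name `rates_at_threshold` and the toy scores and labels are hypothetical and are not taken from the paper. The false alarm rate is taken here as the share of non-incident observations flagged as incidents, which is one common definition and may differ from the one used in the study.

```python
def rates_at_threshold(probs, labels, threshold):
    """Return (detection_rate, false_alarm_rate) for a probability cutoff.

    probs  : predicted incident probabilities (e.g. from a logistic model)
    labels : 1 for a real incident, 0 for normal traffic
    """
    flagged = [p >= threshold for p in probs]
    hits = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_alarms = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    n_incidents = sum(labels)
    n_normal = len(labels) - n_incidents
    return hits / n_incidents, false_alarms / n_normal


# Toy data (hypothetical): illustrative scores only, not results from the study.
probs = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for t in (0.35, 0.65, 0.75):
    dr, far = rates_at_threshold(probs, labels, t)
    print(f"threshold={t:.2f}  detection={dr:.2f}  false_alarm={far:.2f}")
```

Lowering the cutoff flags more observations, which raises the detection rate and, typically, the false alarm rate as well; the operating point is then chosen according to need.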
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparative applicability of the orthogonal projections to latent structures (OPLS) statistical model versus traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used for clinical evaluation once in the first week of admission and again six months later. All data were first analyzed using simple linear regression and then considered for multivariate analysis using PLS/OPLS models in the SIMCA P+12 statistical software package. The linear regression results used to identify TCD predictors of stroke prognosis were confirmed by the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on a covariate, it may also depend on a function of that covariate. If one has no knowledge of this functional form but expects it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), in which the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work because the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations the efficiency improvement is substantial. We apply this method to a dementia study.
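For readers unfamiliar with the pool-adjacent-violators algorithm mentioned in this abstract, the following is a minimal sketch of plain PAVA for complete data with a nondecreasing constraint. It does not implement the empirical likelihood method of the paper; the function name `pava` and the optional weights argument are illustrative choices.

```python
def pava(y, weights=None):
    """Least-squares nondecreasing fit to y via pool-adjacent-violators."""
    n = len(y)
    w = [1.0] * n if weights is None else list(weights)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), wi, 1])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pava([1, 3, 2, 4, 3, 5]))  # -> [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Adjacent values that violate the ordering are pooled into their weighted mean, so the monotonicity constraint is satisfied by construction.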
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Cristinel CONSTANTIN
2015-12-01
This paper presents instrumental research on the use of the Logistic Regression model for data analysis in marketing research. Decision makers in organisations need relevant information to support their decisions on marketing strategies. The data provided by marketing research can be processed in various ways, but multivariate data analysis models can enhance the utility of the information. Among these models is the Logistic Regression model, which is used for dichotomous variables. Our research explains the utility of this model and the interpretation of the resulting information, in order to help practitioners and researchers use it in their future investigations.
Regression-based air temperature spatial prediction models: an example from Poland
Mariusz Szymanowski
2013-10-01
A Geographically Weighted Regression-Kriging (GWRK) algorithm, based on the local Geographically Weighted Regression (GWR), is applied for spatial prediction of air temperature in Poland. Hengl's decision tree for selecting a suitable prediction model is extended for varying spatial relationships between the air temperature and environmental predictors, with an assumption of existing environmental dependence of the analyzed temperature variables. The procedure includes the potential choice of a local GWR instead of the global Multiple Linear Regression (MLR) method for modeling the deterministic part of spatial variation, which is usual in the standard regression (residual) kriging model (MLRK). The analysis encompassed: testing for environmental correlation, selecting an appropriate regression model, testing for spatial autocorrelation of the residual component, and validating the prediction accuracy. The proposed approach was performed for 69 air temperature cases, with time aggregation ranging from daily to annual average air temperatures. The results show that, irrespective of the level of data aggregation, the spatial distribution of temperature is better fitted by local models, which is the reason for choosing GWR instead of MLR for all variables analyzed. Additionally, in most cases (78%) there is spatial autocorrelation in the residuals of the deterministic part, which suggests that the GWR model should be extended by ordinary kriging of residuals to the GWRK form. The decision tree used in this paper can be considered universal, as it encompasses either spatially varying relationships of modeled and explanatory variables or a random process that can be modeled by a stochastic extension of the regression model (residual kriging). Moreover, for all cases analyzed, the selection of a method based on the local regression model (GWRK or GWR) does not depend on the data aggregation level, showing the potential versatility of the technique.
Prediction on the seasonal behavior of hydrogen sulfide using a neural network model.
Kim, Byungwhan; Lee, Joogong; Jang, Jungyoung; Han, Dongil; Kim, Ki-Hyun
2011-05-05
Models to predict seasonal hydrogen sulfide (H2S) concentrations were constructed using neural networks. To this end, two types of generalized regression neural networks and radial basis function networks were considered and optimized. The input data for H2S were collected from August 2005 to Fall 2006 at a large industrial complex located in Ansan City, Korea. Three types of seasonal groupings were prepared, and one optimized model was built for each dataset. These optimized models were then used to analyze the sensitivity and main effect of the parameters. H2S was noted to be very sensitive to rainfall during the spring and summer. In the autumn, its sensitivity showed a strong dependency on wind speed and pressure. Pressure was identified as the most influential parameter during the spring and summer. In the autumn, relative humidity overwhelmingly affected H2S. It was noted that H2S maintained an inverse relationship with a number of parameters (e.g., radiation, wind speed, or dew-point temperature). In contrast, it exhibited a declining trend with a decrease in pressure. An increase in radiation was likely to decrease H2S during spring and summer, but the opposite trend was predicted for the autumn. The overall results of this study thus suggest that the behavior of H2S can be accounted for by a diverse combination of meteorological parameters across seasons.
Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.
Kumar, Gaurav; Bajaj, Rakesh Kumar
2014-01-01
In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept to triangular intuitionistic fuzzy numbers, establishing a one-to-one correspondence with their position and entropies. Using the concept of fuzzy entropy, the estimators of the intuitionistic fuzzy regression coefficients have been obtained in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of the regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.
Ahmad A. Saifan
2016-04-01
Regression testing is a safeguarding procedure to validate and verify adapted software and to guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in the regression test selection domain. The approach is based on meta-models (test models and structured models) to decrease the number of test cases used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare its results with those of the retest-all approach. The results show that our approach reduces the size of the test suite without a negative impact on the effectiveness of fault detection.
Hartmann, Armin; Van Der Kooij, Anita J; Zeeck, Almut
2009-07-01
In explorative regression studies, linear models are often applied without questioning the linearity of the relations between the predictor variables and the dependent variable, or linear relations are taken as an approximation. In this study, the method of regression with optimal scaling transformations is demonstrated. This method does not require predefined nonlinear functions and results in easy-to-interpret transformations that will show the form of the relations. The method is illustrated using data from a German multicenter project on the indication criteria for inpatient or day clinic psychotherapy treatment. The indication criteria to include in the regression model were selected with the Lasso, which is a tool for predictor selection that overcomes the disadvantages of stepwise regression methods. The resulting prediction model indicates that treatment status is (approximately) linearly related to some criteria and nonlinearly related to others.
Modeling personalized head-related impulse response using support vector regression
HUANG Qing-hua; FANG Yong
2009-01-01
A new customization approach based on support vector regression (SVR) is proposed to obtain individual head-related impulse response (HRIR) without complex measurement and special equipment. Principal component analysis (PCA) is first applied to obtain a few principal components and corresponding weight vectors correlated with individual anthropometric parameters. Then the weight vectors act as output of the nonlinear regression model. Some measured anthropometric parameters are selected as input of the model according to the correlation coefficients between the parameters and the weight vectors. After the regression model is learned from the training data, the individual HRIR can be predicted based on the measured anthropometric parameters. Compared with a back-propagation neural network (BPNN) for nonlinear regression, better generalization and prediction performance for small training samples can be obtained using the proposed PCA-SVR algorithm.
RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS
J. Behmanesh
2015-01-01
Modeling rainfall-runoff relationships in a watershed has an important role in water resources engineering. Researchers have used numerical models for the rainfall-runoff process in watersheds because of the non-linear nature of the rainfall-runoff relationship, the vast data requirements, and the difficulty of physical models. The main objective of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. Two numerical models, ANN and ANFIS, were used to model the rainfall-runoff process, and the best model was chosen. In addition, regression equations were developed using SPSS software, and the best equation was selected from the regression analysis. The results of the numerical and regression modeling were compared with each other. The comparison showed that the model obtained from ANFIS was better than the model obtained from regression. The results also indicated that the Turkey River flow rate had a logical relationship with the flow rates of one and two days earlier and with the rainfall values of one, two and three days earlier.
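The lag structure reported at the end of this abstract (flow depending on the flow of the previous one and two days and on the rainfall of the previous one, two and three days) can be sketched as a feature-building step. This is only an assumed illustration of how such predictors might be assembled before fitting a regression, ANN or ANFIS model; the function name `build_lagged_rows` and the toy series are hypothetical.

```python
def build_lagged_rows(flow, rain, flow_lags=(1, 2), rain_lags=(1, 2, 3)):
    """Return (X, y): lagged predictors and the same-day flow target."""
    # Skip the first days, for which not all lagged values exist.
    start = max(max(flow_lags), max(rain_lags))
    X, y = [], []
    for t in range(start, len(flow)):
        row = [flow[t - k] for k in flow_lags] + [rain[t - k] for k in rain_lags]
        X.append(row)
        y.append(flow[t])
    return X, y


# Toy daily series (hypothetical values, not data from the study).
flow = [10.0, 12.0, 15.0, 14.0, 13.0, 18.0]
rain = [0.0, 5.0, 2.0, 0.0, 8.0, 1.0]

X, y = build_lagged_rows(flow, rain)
for row, target in zip(X, y):
    print(row, "->", target)
```

Each row holds the two lagged flows followed by the three lagged rainfall values for one target day.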
Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models
Pappas, S.S. [Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83 200 Samos (Greece); Ekonomou, L.; Chatzarakis, G.E. [Department of Electrical Engineering Educators, ASPETE - School of Pedagogical and Technological Education, N. Heraklion, 141 21 Athens (Greece); Karamousantas, D.C. [Technological Educational Institute of Kalamata, Antikalamos, 24100 Kalamata (Greece); Katsikas, S.K. [Department of Technology Education and Digital Systems, University of Piraeus, 150 Androutsou Srt., 18 532 Piraeus (Greece); Liatsis, P. [Division of Electrical Electronic and Information Engineering, School of Engineering and Mathematical Sciences, Information and Biomedical Engineering Centre, City University, Northampton Square, London EC1V 0HB (United Kingdom)
2008-09-15
This study addresses the problem of modeling the electricity demand loads in Greece. The provided actual load data are deseasonalized and an AutoRegressive Moving Average (ARMA) model is fitted to the data off-line, using the Akaike Corrected Information Criterion (AICC). The developed model fits the data successfully. Difficulties occur when the provided data include noise or errors and when on-line/adaptive modeling is required. In both cases, and under the assumption that the provided data can be represented by an ARMA model, simultaneous order and parameter estimation of ARMA models in the presence of noise is performed. The results indicate that the proposed method, which is based on multi-model partitioning theory, successfully tackles the studied problem. For validation purposes, the results are compared with three other established order selection criteria, namely AICC, Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC). The developed model could be useful in studies that concern electricity consumption and electricity price forecasts.
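The three order selection criteria named in this abstract can be written in a few lines. The sketch below uses the standard textbook definitions of AIC, AICC and BIC and picks the candidate order with the lowest AICC; the log-likelihood values, the sample size and the convention of counting k = p + q + 1 parameters (the extra one for the noise variance) are assumptions for illustration, not figures from the study.

```python
import math


def aic(log_lik, k):
    """Akaike's Information Criterion for k estimated parameters."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Schwarz's Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical maximized log-likelihoods for candidate ARMA(p, q) orders.
n_obs = 200
candidates = {(1, 0): -520.0, (1, 1): -512.0, (2, 1): -511.5, (2, 2): -511.3}

scores = {
    order: aicc(ll, order[0] + order[1] + 1, n_obs)
    for order, ll in candidates.items()
}
best = min(scores, key=scores.get)
print("order with lowest AICC:", best)
```

With these toy numbers, the extra parameters of the larger models do not improve the likelihood enough to offset the penalty, so a low-order model is selected.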
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2014-05-01
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting dissolved oxygen levels. Initial features were selected using a nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed nonlinear structure in the data. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function, and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using a cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high-dimensional feature space using the kernel function. The performance of all the kernel-based modeling methods used here was comparable in terms of both predictive and generalization abilities. The values of the performance criteria suggested that the constructed models fit the nonlinear data adequately and have good predictive capabilities.
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.
Atmospheric trace gases and global climate - A seasonal model study
Wang, Wei-Chyung; Molnar, Gyula; Ko, Malcolm K. W.; Goldenberg, Steven; Sze, Nien Dak
1990-01-01
Atmospheric models with seasonal cycles are used to study the possible near-future changes in latitudinal and vertical distributions of atmospheric ozone and temperature caused by increases of trace gases. It is found that increases of CFCs, CH4, and N2O may add to the surface warming from increased CO2. Calculations based on projected trends of CO2, N2O, CH4, and CFCs show that the annual mean and global mean surface temperature could warm by as much as 2.5 C by the year 2050, with larger warming at high latitudes. The results suggest that the warming in the lower stratosphere and upper troposphere is much larger than that at the surface, especially during the summer season.
Seasonal variation in survival and reproduction can be a large source of prediction uncertainty in models used for conservation and management. A seasonally varying matrix population model is developed that incorporates temperature-driven differences in mortality and reproduction...
Kristin L Nichol
BACKGROUND: College and university students experience substantial morbidity from influenza and influenza-like illness, and they can benefit substantially from vaccination. Public health authorities encourage vaccination not only before the influenza season but also into and even throughout the influenza season. We conducted the present study to assess the impact of various vaccination strategies, including delayed (i.e., in-season) vaccination, on influenza outbreaks on a college campus. METHODS/FINDINGS: We used a Susceptible --> Infected --> Recovered (SIR) framework for our mathematical models to simulate influenza epidemics in a closed college campus. We included both students and faculty/staff in the model and derived values for the model parameters from the published literature. The values for key model parameters were varied to assess the impact on the outbreak of various pre-season and delayed vaccination rates; one-way sensitivity analyses were conducted to test the sensitivity of the model outputs to changes in selected parameter values. In the base case, with a pre-season vaccination rate of 20%, no delayed vaccination, and 1 student index case, the total attack rate (total percent infected, TAR) was 45%. With higher pre-season vaccination rates, TARs were lower. Even if vaccinations were given 30 days after outbreak onset, TARs were still lower than the TAR of 69% in the absence of vaccination. Varying the proportions of vaccinations given pre-season versus delayed until after the onset of the outbreak gave intermediate TAR values. Base case outputs were sensitive to changes in infectious contact rates and infectious periods and to a holiday/break schedule. CONCLUSION: Delayed vaccination and holidays/breaks can be important adjunctive measures to complement traditional pre-season influenza vaccination for controlling and preventing influenza in a closed college campus.
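A minimal sketch of the SIR framework with pre-season vaccination may help make the total attack rate concrete. It is a simplified, single-group illustration and not the model of the study: it omits the student/staff split, delayed vaccination and holiday breaks, assumes a fully protective vaccine, and uses hypothetical values for the contact rate `beta`, the recovery rate `gamma` and the population size.

```python
def simulate_sir(population, vaccinated_fraction, beta, gamma,
                 index_cases=1, days=365, steps_per_day=10):
    """Euler integration of a closed-population SIR model.

    Returns the total attack rate: the fraction of the whole population
    that was ever infected (index cases included).
    """
    dt = 1.0 / steps_per_day
    # Assumption: vaccinated people are fully protected and never infected.
    s = population * (1.0 - vaccinated_fraction) - index_cases
    i = float(index_cases)
    ever_infected = float(index_cases)
    for _ in range(days * steps_per_day):
        new_infections = beta * s * i / population * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
        ever_infected += new_infections
    return ever_infected / population


# Hypothetical parameters: beta / gamma gives a basic reproduction number of 2.
for coverage in (0.0, 0.2, 0.6):
    tar = simulate_sir(10_000, coverage, beta=0.5, gamma=0.25)
    print(f"pre-season coverage {coverage:.0%}: total attack rate {tar:.1%}")
```

In this sketch, higher pre-season coverage lowers the total attack rate, and once coverage pushes the effective reproduction number below one the outbreak does not take off.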
Montagne, Denise; Hoek, Gerard; Nieuwenhuijsen, Mark; Lanki, Timo; Pennanen, Arto; Portella, Meritxell; Meliefste, Kees; Eeftens, Marloes; Yli-Tuomi, Tarja; Cirach, Marta; Brunekreef, Bert
2013-08-06
Land use regression (LUR) models are often used to predict long-term average concentrations of air pollutants. Little is known about how well LUR models predict personal exposure. In this study, the agreement of LUR models with measured personal exposure was assessed. The measured components were particulate matter with a diameter smaller than 2.5 μm (PM2.5), soot (reflectance of PM2.5), nitrogen oxides (NOx), and nitrogen dioxide (NO2). In Helsinki, Utrecht, and Barcelona, 15 volunteers (from semiurban, urban background, and traffic sites) followed prescribed time activity patterns. Per participant, six 96 h outdoor, indoor, and personal measurements spread over three seasons were conducted. Soot LUR models were significantly correlated with measured average outdoor and personal soot concentrations. Soot LUR models explained 39%, 44%, and 20% of personal exposure variability (R²) in Helsinki, Utrecht, and Barcelona, respectively. NO2 LUR models significantly predicted outdoor concentrations and personal exposure in Utrecht and Helsinki, whereas NOx and PM2.5 LUR models did not predict personal exposure. PM2.5, NO2, and NOx models were correlated with personal soot, the component least affected by indoor sources. LUR modeled and measured outdoor, indoor, and personal concentrations were highly correlated for all pollutants when data from the three cities were combined. This study supports the use of intraurban LUR models in air pollution epidemiology, especially for soot.
Ulrich, David; Parkhouse, Bonnie L.
1982-01-01
An alumni-based model is proposed as an alternative to sports management curriculum design procedures. The model relies on the assessment of curriculum by sport management alumni and uses performance ratings of employers and measures of satisfaction by alumni in a regression model to identify curriculum leading to increased work performance and…
Menendez, P.; Eilers, P.; Tikunov, Y.M.; Bovy, A.G.; Eeuwijk, van F.
2012-01-01
The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtai
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Parameshwar V. Pandit
2012-06-01
Purpose: To analyze the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of a logistic regression model. Method: The cross-sectional study involved a systematic random sample of 1760 subjects with permanent dentition, aged 18-40 years, in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors for dental caries and periodontal disease were established with a multiple logistic regression model using SPSS statistical software. Results: Factors such as frequency of brushing, timing of teeth cleaning and type of toothpaste are significant persistent predictors of dental caries and periodontal disease. For dental caries, the log likelihood of the full model is -1013.1364 and Akaike's Information Criterion (AIC) is 1.1752, compared with -1019.8106 and 1.1748, respectively, for the reduced regression model. For periodontal disease, the log likelihood of the full model is -1085.7876 and the AIC is 1.2577, compared with -1019.8106 and 1.1748, respectively, for the reduced regression model. The area under the Receiver Operating Characteristic (ROC) curve for dental caries is 0.7509 (full model) and 0.7447 (reduced model); for periodontal disease it is 0.6128 (full model) and 0.5821 (reduced model). Conclusions: Frequency of brushing, timing of teeth cleaning and type of toothpaste are the main significant risk factors for dental caries and periodontal disease. The reduced logistic regression model fits slightly better than the full logistic regression model in identifying these risk factors for both dichotomous outcomes, dental caries and periodontal disease.
A von Bertalanffy growth model with a seasonally varying coefficient
Cloern, James E.; Nichols, Frederic H.
1978-01-01
The von Bertalanffy model of body growth is inappropriate for organisms whose growth is restricted to a seasonal period because it assumes that growth rate is invariant with time. Incorporation of a time-varying coefficient significantly improves the capability of the von Bertalanffy equation to describe changing body size of both the bivalve mollusc Macoma balthica in San Francisco Bay and the flathead sole, Hippoglossoides elassodon, in Washington state. This simple modification of the von Bertalanffy model should offer improved predictions of body growth for a variety of other aquatic animals.
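The idea of a time-varying growth coefficient can be sketched as follows. This is a minimal illustration, not the authors' fitted model: it assumes the growth law dL/dt = k(t)(L_inf - L) and a hypothetical cosine-shaped coefficient `seasonal_k`; the function names and all parameter values are made up for the example.

```python
import math

def seasonal_k(t, k_mean=0.5, amplitude=1.0, phase=0.0):
    """Hypothetical seasonal growth coefficient (per year).

    Assumed form: k_mean * (1 + amplitude * cos(2*pi*(t - phase))).
    It stays non-negative as long as 0 <= amplitude <= 1.
    """
    return k_mean * (1.0 + amplitude * math.cos(2.0 * math.pi * (t - phase)))

def vb_length(t, l0, l_inf, k_func, steps=2000):
    """Body length at age t for dL/dt = k(t) * (l_inf - L), with L(0) = l0.

    The solution is L(t) = l_inf - (l_inf - l0) * exp(-integral of k from 0 to t);
    the integral is approximated with the trapezoid rule.
    """
    h = t / steps
    integral = 0.0
    for i in range(steps):
        a = i * h
        integral += 0.5 * h * (k_func(a) + k_func(a + h))
    return l_inf - (l_inf - l0) * math.exp(-integral)

if __name__ == "__main__":
    # Compare the classical (constant k) curve with the seasonal one.
    for age in (0.25, 0.5, 0.75, 1.0):
        classical = vb_length(age, 1.0, 10.0, lambda s: 0.5)
        seasonal = vb_length(age, 1.0, 10.0, seasonal_k)
        print(f"age {age:.2f}: classical {classical:.3f}, seasonal {seasonal:.3f}")
```

With a constant coefficient the function reduces to the classical von Bertalanffy curve, so the seasonal version is a strict generalisation under these assumptions.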
Random Modeling of Daily Rainfall and Runoff Using a Seasonal Model and Wavelet Denoising
Chien-ming Chou
2014-01-01
Instead of Fourier smoothing, this study applied wavelet denoising to acquire the smooth seasonal mean and corresponding perturbation term from daily rainfall and runoff data in traditional seasonal models, which use seasonal means for hydrological time series forecasting. The denoised rainfall and runoff time series data were regarded as the smooth seasonal mean. The probability distribution of the percentage coefficients can be obtained from calibrated daily rainfall and runoff data. For validated daily rainfall and runoff data, percentage coefficients were randomly generated according to the probability distribution and the law of linear proportion. Multiplying the generated percentage coefficient by the smooth seasonal mean resulted in the corresponding perturbation term. Random modeling of daily rainfall and runoff can be obtained by adding the perturbation term to the smooth seasonal mean. To verify the accuracy of the proposed method, daily rainfall and runoff data for the Wu-Tu watershed were analyzed. The analytical results demonstrate that wavelet denoising enhances the precision of daily rainfall and runoff modeling of the seasonal model. In addition, the wavelet denoising technique proposed in this study can obtain the smooth seasonal mean of rainfall and runoff processes and is suitable for modeling actual daily rainfall and runoff processes.
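The generation step described above (perturbation = percentage coefficient × smooth seasonal mean, simulated value = mean + perturbation) can be sketched as below. This is only an illustration under stated assumptions: the smooth seasonal mean is taken as given (the wavelet denoising itself is not implemented), and the coefficients are simply resampled from their empirical calibration values, which stands in for the paper's probability distribution and "law of linear proportion". The function names are hypothetical.

```python
import random

def percentage_coefficients(observed, smooth_mean):
    """Calibration: relative deviation of each observation from the smooth seasonal mean.

    Days with a non-positive smooth mean are skipped to avoid division by zero.
    """
    return [(obs - m) / m for obs, m in zip(observed, smooth_mean) if m > 0]

def simulate(smooth_mean, coefficients, seed=0):
    """Generate one random series: smooth mean plus a randomly drawn perturbation.

    Assumption: coefficients are resampled with replacement from the calibrated set.
    """
    rng = random.Random(seed)
    series = []
    for m in smooth_mean:
        c = rng.choice(coefficients)   # randomly generated percentage coefficient
        perturbation = c * m           # perturbation term
        series.append(m + perturbation)
    return series

if __name__ == "__main__":
    observed = [12.0, 8.0, 30.0, 20.0]   # made-up calibration data
    smooth = [10.0, 10.0, 25.0, 25.0]    # made-up smooth seasonal mean
    coefs = percentage_coefficients(observed, smooth)
    print(simulate([10.0, 20.0, 15.0], coefs, seed=1))
```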
Schick, Simon; Rössler, Ole; Weingartner, Rolf
2016-10-01
Based on a hindcast experiment for the period 1982-2013 in 66 sub-catchments of the Swiss Rhine, the present study compares two approaches of building a regression model for seasonal streamflow forecasting. The first approach selects a single "best guess" model, which is tested by leave-one-out cross-validation. The second approach implements the idea of bootstrap aggregating, where bootstrap replicates are employed to select several models, and out-of-bag predictions provide model testing. The target value is mean streamflow for durations of 30, 60 and 90 days, starting with the 1st and 16th day of every month. Compared to the best guess model, bootstrap aggregating reduces the mean squared error of the streamflow forecast by seven percent on average. Thus, if resampling is anyway part of the model building procedure, bootstrap aggregating seems to be a useful strategy in statistical seasonal streamflow forecasting. Since the improved accuracy comes at the cost of a less interpretable model, the approach might be best suited for pure prediction tasks, e.g. as in operational applications.
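Bootstrap aggregating with out-of-bag testing can be sketched in a few lines. The example below is a simplified stand-in, not the study's procedure: it fits a one-predictor least-squares line in every bootstrap replicate (the study selects among several candidate models), and the function names and data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # degenerate replicate: fall back to the mean
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1

def bagged_oob_predictions(xs, ys, n_boot=200, seed=0):
    """Average, for each observation, the predictions of all replicates
    that did NOT draw it (its out-of-bag replicates)."""
    rng = random.Random(seed)
    n = len(xs)
    sums = [0.0] * n
    counts = [0] * n
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        in_bag = set(idx)
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                sums[i] += b0 + b1 * xs[i]
                counts[i] += 1
    # None marks an observation that was never out of bag.
    return [s / c if c else None for s, c in zip(sums, counts)]

if __name__ == "__main__":
    noise = random.Random(42)
    xs = [float(i) for i in range(30)]
    ys = [2.0 + 3.0 * x + noise.gauss(0.0, 1.0) for x in xs]
    oob = bagged_oob_predictions(xs, ys)
    errors = [(p - y) ** 2 for p, y in zip(oob, ys) if p is not None]
    print("out-of-bag mean squared error:", sum(errors) / len(errors))
```

Because each out-of-bag prediction comes only from replicates that never saw the observation, the resulting error estimate plays the role that leave-one-out cross-validation plays for the single best-guess model.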
Structured Additive Regression Models: An R Interface to BayesX
Nikolaus Umlauf
2015-02-01
Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
S. Goyal
2012-03-01
This paper highlights the significance of computational intelligence models for predicting the shelf life of processed cheese stored at 7-8 °C. Linear Layer and Generalized Regression models were developed with soluble nitrogen, pH, standard plate count, yeast and mould count, and spores as input parameters, and sensory score as the output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash-Sutcliffe Coefficient were used to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 °C.
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Daniel Rece
2009-09-01
This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is "explained" by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.
A hybrid land use regression/AERMOD model for predicting intra-urban variation in PM2.5
Michanowicz, Drew R.; Shmool, Jessie L. C.; Tunno, Brett J.; Tripathy, Sheila; Gillooly, Sara; Kinnee, Ellen; Clougherty, Jane E.
2016-04-01
Characterizing near-source spatio-temporal variation is a long-standing challenge in air pollution epidemiology, and common intra-urban modeling approaches [e.g., land use regression (LUR)] do not account for short-term meteorological variation. Atmospheric dispersion modeling approaches, such as AERMOD, can account for near-source pollutant behavior by capturing source-meteorological interactions, but require external validation and resolved background concentrations. In this study, we integrate AERMOD-based predictions for source-specific fine particle (PM2.5) concentrations into LUR models derived from total ambient PM2.5 measured at 36 unique sites selected to represent different source and elevation profiles, during summer and winter, 2012-2013 in Pittsburgh, Pennsylvania (PA). We modeled PM2.5 emissions from 207 local stationary sources in AERMOD, utilizing the monitoring locations as receptors, and hourly meteorological information matching each sampling period. Finally, we compare results of the integrated LUR/AERMOD hybrid model to those of the AERMOD + background and standard LUR models, at the full domain scale and within a 5 km² sub-domain surrounding a large industrial facility. The hybrid model improved out-of-sample prediction accuracy by 2-10% over LUR alone, though performance differed by season, in part due to within-season temporal variability. We found differences up to 10 μg/m³ in predicted concentrations, and observed the largest differences within the industrial sub-domain. LUR underestimated concentrations from 500 to 2500 m downwind of major sources. The hybrid modeling approach we developed may help to improve intra-urban exposure estimates, particularly in regions of large industrial sources, sharp elevation gradients, or complex meteorology (e.g., frequent inversion events), such as Pittsburgh, PA. More broadly, the approach may inform the development of spatio-temporal modeling frameworks for air pollution exposure assessment for
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
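For readers who want to see what the spreadsheet calculation in point 4 amounts to, the ordinary least products (OLP) point estimates can be sketched as below. This assumes the usual geometric-mean form of OLP regression, in which the slope is the ratio of the standard deviations of y and x, signed by their covariance; the function name `olp_fit` and the data are hypothetical, and confidence intervals (which the article obtains by bootstrapping in smatr) are deliberately left out.

```python
import math

def olp_fit(xs, ys):
    """Ordinary least products (geometric mean) regression of y on x.

    Returns (slope, intercept). The slope is sign(S_xy) * sqrt(S_yy / S_xx),
    and the line passes through the point of means.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0.0 or sxy == 0.0:
        raise ValueError("OLP slope is undefined without variation or covariation")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("y on x:", olp_fit(x, y))
    print("x on y:", olp_fit(y, x))  # slope is the reciprocal of the first
```

Unlike ordinary least squares, the fit is symmetric: swapping x and y gives the reciprocal slope, which is why it suits Model II data where both variables carry error.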
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables biases the parameter estimates and also leads to classification errors. In general, stepwise regression is used to overcome multicollinearity in regression. Another method to overcome multicollinearity, which involves all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA applies only to numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of contraceptive method use with the stepwise method (58.66%) is better than with the logistic regression model (57.39%) and CATPCA (57.39%). Evaluating the logistic regression results by sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classifying the major class (using a contraceptive method), the selected model is CATPCA, since it raises the classification accuracy for the major class.
A Stochastic Restricted Principal Components Regression Estimator in the Linear Model
Daojiang He
2014-01-01
We propose a new estimator to combat multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator, called the stochastic restricted principal components (SRPC) regression estimator, is constructed by combining the ordinary mixed estimator (OME) and the principal components regression (PCR) estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.
Regression analysis: understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Modelling the Seasonal Overturning Circulation in the Red Sea
Yao, Fengchao
2015-04-01
The overturning circulation in the Red Sea exhibits a distinct seasonally reversing pattern and is studied using 50-year, high-resolution MIT general circulation model simulations. The seasonal water exchange in the Strait of Bab el Mandeb is successfully simulated, and the structures of the intruding subsurface Gulf of Aden intermediate water are in good agreement with summer observations in 2011. The model results suggest that the summer overturning circulation is driven by the combined effect of the shoaling of the thermocline in the Gulf of Aden resulting from remote winds in the Arabian Sea and an upward surface slope from the Red Sea to the Gulf of Aden set up by local surface winds in the Red Sea. For the winter overturning circulation, the climatological model mean results suggest that the surface inflow intensifies in a western boundary current in the southern Red Sea that switches to an eastern boundary current north of 24°N. The overturning is accomplished through a cyclonic recirculation and a cross-basin overturning circulation in the northern Red Sea, with major sinking occurring along a narrow band of width about 20 km along the eastern boundary and weaker upwelling along the western boundary. The northward pressure gradient force, strong vertical mixing, and horizontal mixing near the boundary are the essential dynamical components in the model's winter overturning circulation.
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
Estimation of the Seemingly Unrelated Regression (SUR) Model with the Generalized Least Squares (GLS) Method
Ade Widyaningsih
2014-06-01
Regression analysis is a statistical tool used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can be used to obtain a good estimate in regression analysis is the ordinary least squares (OLS) method. The least squares method can estimate the parameters of one or more regressions, but it does not allow for relationships among the errors of the different equations. One way to overcome this problem is the Seemingly Unrelated Regression (SUR) model, in which parameters are estimated using Generalized Least Squares (GLS). In this study, the author applies the SUR model using the GLS method to world gasoline demand data. The author finds that SUR using GLS is better than OLS because SUR produces smaller errors than OLS.
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
Siegert, Stefan
2017-04-01
Initialised climate forecasts on seasonal time scales, run several months or even years ahead, are now an integral part of the battery of products offered by climate services world-wide. The availability of seasonal climate forecasts from various modeling centres gives rise to multi-model ensemble forecasts. Post-processing such seasonal-to-decadal multi-model forecasts is challenging 1) because the cross-correlation structure between multiple models and observations can be complicated, 2) because the amount of training data to fit the post-processing parameters is very limited, and 3) because the forecast skill of numerical models tends to be low on seasonal time scales. In this talk I will review new statistical post-processing frameworks for multi-model ensembles. I will focus particularly on Bayesian hierarchical modelling approaches, which are flexible enough to capture commonly made assumptions about collective and model-specific biases of multi-model ensembles. Despite the advances in statistical methodology, it turns out to be very difficult to out-perform the simplest post-processing method, which just recalibrates the multi-model ensemble mean by linear regression. I will discuss reasons for this, which are closely linked to the specific characteristics of seasonal multi-model forecasts. I explore possible directions for improvements, for example using informative priors on the post-processing parameters, and jointly modelling forecasts and observations.
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne; Do, Duy Ngoc; KADARMIDEEN, Haja N.; Jensen, Just
2014-01-01
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Based on covariance functions, residual feed intake (RFI) was defined and derived as the conditional genetic variance in feed intake given mid-test breeding value for BW and rate of gain. The heritabili...
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
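The logit relationship described above, in which the log odds are a linear combination of the predictors, can be made concrete with a short sketch. The coefficients below are invented purely for illustration and are not the study's estimates; the function names are hypothetical.

```python
import math

def log_odds(intercept, coefs, x):
    """Linear predictor: log odds of the outcome for predictor values x."""
    return intercept + sum(b * v for b, v in zip(coefs, x))

def probability(intercept, coefs, x):
    """Probability of the event, obtained by inverting the logit."""
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coefs, x)))

def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coef)

if __name__ == "__main__":
    # Hypothetical model: intercept -1.0, one binary predictor with coefficient 0.7
    # (for example, family history of obesity: 1 = yes, 0 = no).
    p_no = probability(-1.0, [0.7], [0])
    p_yes = probability(-1.0, [0.7], [1])
    print(f"P(event | no)  = {p_no:.3f}")
    print(f"P(event | yes) = {p_yes:.3f}")
    print(f"odds ratio     = {odds_ratio(0.7):.3f}")
```

Exponentiating a coefficient gives the odds ratio, which is the quantity behind statements such as "the odds are higher for a student having a family history of obesity".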
Improving statistical forecasts of seasonal streamflows using hydrological model output
D. E. Robertson
2013-02-01
Statistical methods traditionally applied for seasonal streamflow forecasting use predictors that represent the initial catchment condition and future climate influences on future streamflows. Observations of antecedent streamflows or rainfall commonly used to represent the initial catchment conditions are surrogates for the true source of predictability and can potentially have limitations. This study investigates a hybrid seasonal forecasting system that uses the simulations from a dynamic hydrological model as a predictor to represent the initial catchment condition in a statistical seasonal forecasting method. We compare the skill and reliability of forecasts made using the hybrid forecasting approach to those made using the existing operational practice of the Australian Bureau of Meteorology for 21 catchments in eastern Australia. We investigate the reasons for differences. In general, the hybrid forecasting system produces forecasts that are more skilful than the existing operational practice and as reliable. The greatest increases in forecast skill tend to be (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall, (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow, and (3) when the initial catchment condition is near saturation intermittently throughout the historical record.
Improving statistical forecasts of seasonal streamflows using hydrological model output
Robertson, D. E.; Pokhrel, P.; Wang, Q. J.
2013-02-01
Statistical methods traditionally applied for seasonal streamflow forecasting use predictors that represent the initial catchment condition and future climate influences on future streamflows. Observations of antecedent streamflows or rainfall commonly used to represent the initial catchment conditions are surrogates for the true source of predictability and can potentially have limitations. This study investigates a hybrid seasonal forecasting system that uses the simulations from a dynamic hydrological model as a predictor to represent the initial catchment condition in a statistical seasonal forecasting method. We compare the skill and reliability of forecasts made using the hybrid forecasting approach to those made using the existing operational practice of the Australian Bureau of Meteorology for 21 catchments in eastern Australia. We investigate the reasons for differences. In general, the hybrid forecasting system produces forecasts that are more skilful than the existing operational practice and as reliable. The greatest increases in forecast skill tend to be (1) when the catchment is wetting up but antecedent streamflows have not responded to antecedent rainfall, (2) when the catchment is drying and the dominant source of antecedent streamflow is in transition between surface runoff and base flow, and (3) when the initial catchment condition is near saturation intermittently throughout the historical record.
Numerical modeling of seasonally freezing ground and permafrost
Nicolsky, Dmitry J.
2007-12-01
This thesis represents a collection of papers on numerical modeling of permafrost and seasonally freezing ground dynamics. An important problem in numerical modeling of temperature dynamics in permafrost and seasonally freezing ground is related to parametrization of already existing models. In this thesis, a variation data assimilation technique is presented to find soil properties by minimizing the discrepancy between in-situ measured temperatures and those computed by the models. The iterative minimization starts from an initial approximation of the soil properties that are found by solving a sequence of simple subproblems. In order to compute the discrepancy, the temperature dynamics is simulated by a new implementation of the finite element method applied to the heat equation with phase change. Despite simplifications in soil physics, the presented technique was successfully applied to recover soil properties, such as thermal conductivity, soil porosity, and the unfrozen water content, at several sites in Alaska. The recovered properties are used in discussion on soil freezing/thawing and permafrost dynamics in other parts of this thesis. Another part of this thesis concerns development of a numerical thermo-mechanical model of seasonal soil freezing on the lateral scale of several meters. The presented model explains observed differential frost heave occurring in non-sorted circle ecosystems north of the Brooks Range in the Alaskan tundra. The model takes into account conservation principles for energy, linear momentum and mass of three constituents: liquid water, ice and solid particles. The conservation principles are reduced to a computationally convenient system of coupled equations for temperature, liquid water pressure, porosity, and the velocity of soil particles in a three-dimensional domain with cylindrical symmetry. Despite a simplified rheology, the model simulates the ground surface motion, temperature, and water dynamics in soil and explains
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)]
2009-02-15
In this paper, multiple nonlinear regression models for estimating the higher heating value (HHV) of coals are developed using proximate analysis data obtained mainly from low-rank coal samples on an as-received basis. In this modeling study, three main model structures are first categorized according to the number of proximate analysis parameters used as independent variables (moisture, ash, volatile matter and fixed carbon). Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best-fitting model using the multiple nonlinear regression method. Based on the results of the nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving the highest correlation for the three main structures are selected. Although all three selected models predict HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, the model developed in this study gives a more accurate prediction of the HHV of coals. It can be concluded that the developed model is an effective tool for HHV estimation of low-rank coals. (author)
First principles modeling of nonlinear incidence rates in seasonal epidemics.
José M Ponciano
2011-02-01
In this paper we used a general stochastic processes framework to derive from first principles the incidence rate function that characterizes epidemic models. We investigate a particular case, the Liu-Hethcote-van den Driessche's (LHD) incidence rate function, which results from modeling the number of successful transmission encounters as a pure birth process. This derivation also takes into account heterogeneity in the population with regard to the per individual transmission probability. We adjusted a deterministic SIRS model with both the classical and the LHD incidence rate functions to time series of the number of children infected with syncytial respiratory virus in Banjul, Gambia and Turku, Finland. We also adjusted a deterministic SEIR model with both incidence rate functions to the famous measles data sets from the UK cities of London and Birmingham. Two lines of evidence supported our conclusion that the model with the LHD incidence rate may very well be a better description of the seasonal epidemic processes studied here. First, our model was repeatedly selected as best according to two different information criteria and two different likelihood formulations. The second line of evidence is qualitative in nature: contrary to what the SIRS model with classical incidence rate predicts, the solution of the deterministic SIRS model with LHD incidence rate will reach either the disease free equilibrium or the endemic equilibrium depending on the initial conditions. These findings along with computer intensive simulations of the models' Poincaré map with environmental stochasticity contributed to attain a clear separation of the roles of the environmental forcing and the mechanics of the disease transmission in shaping seasonal epidemics dynamics.
First principles modeling of nonlinear incidence rates in seasonal epidemics.
Ponciano, José M; Capistrán, Marcos A
2011-02-01
In this paper we used a general stochastic processes framework to derive from first principles the incidence rate function that characterizes epidemic models. We investigate a particular case, the Liu-Hethcote-van den Driessche's (LHD) incidence rate function, which results from modeling the number of successful transmission encounters as a pure birth process. This derivation also takes into account heterogeneity in the population with regard to the per individual transmission probability. We adjusted a deterministic SIRS model with both the classical and the LHD incidence rate functions to time series of the number of children infected with syncytial respiratory virus in Banjul, Gambia and Turku, Finland. We also adjusted a deterministic SEIR model with both incidence rate functions to the famous measles data sets from the UK cities of London and Birmingham. Two lines of evidence supported our conclusion that the model with the LHD incidence rate may very well be a better description of the seasonal epidemic processes studied here. First, our model was repeatedly selected as best according to two different information criteria and two different likelihood formulations. The second line of evidence is qualitative in nature: contrary to what the SIRS model with classical incidence rate predicts, the solution of the deterministic SIRS model with LHD incidence rate will reach either the disease free equilibrium or the endemic equilibrium depending on the initial conditions. These findings along with computer intensive simulations of the models' Poincaré map with environmental stochasticity contributed to attain a clear separation of the roles of the environmental forcing and the mechanics of the disease transmission in shaping seasonal epidemics dynamics.
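The structure of a seasonally forced deterministic SIRS model with a pluggable incidence function can be sketched as follows. This is only an illustration: the parameter values, the cosine forcing, the forward-Euler integrator and the power-law incidence are all assumptions made here. In particular, the abstract does not state the LHD formula, so `power_law_incidence` is a hypothetical stand-in for "a nonlinear incidence", not the authors' function.

```python
import math


def classical_incidence(beta, s, i):
    """Classical mass-action incidence: new infections per unit time."""
    return beta * s * i


def power_law_incidence(beta, s, i, p=0.9):
    """Hypothetical nonlinear incidence used only as a placeholder."""
    return beta * s * i ** p


def simulate_sirs(incidence, beta0=60.0, amplitude=0.2, gamma=365.0 / 14.0,
                  rho=1.0, s0=0.99, i0=0.01, years=5.0, dt=1.0 / 3650.0):
    """Forward-Euler integration of an SIRS model in population fractions.

    Rates are per year; beta is forced with a one-year cosine cycle.
    Returns a list of (t, S, I, R) tuples.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(0.0, s, i, r)]
    n_steps = int(round(years / dt))
    for step in range(n_steps):
        t = step * dt
        beta = beta0 * (1.0 + amplitude * math.cos(2.0 * math.pi * t))
        new_infections = incidence(beta, s, i)
        ds = -new_infections + rho * r        # loss of immunity refills S
        di = new_infections - gamma * i       # recovery removes from I
        dr = gamma * i - rho * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((t + dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    for name, f in [("classical", classical_incidence),
                    ("power-law", power_law_incidence)]:
        _, s, i, r = simulate_sirs(f)[-1]
        print(name, round(s, 4), round(i, 4), round(r, 4))
```

Passing the incidence as a function keeps the compartment bookkeeping identical across variants, which is the kind of like-for-like comparison the abstract describes.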
Optimization of Evaporative Demand Models for Seasonal Drought Forecasting
McEvoy, D.; Huntington, J. L.; Hobbins, M.
2015-12-01
Providing reliable seasonal drought forecasts continues to pose a major challenge for scientists, end-users, and the water resources and agricultural communities. Precipitation (Prcp) forecasts beyond weather time scales are largely unreliable, so exploring new avenues to improve seasonal drought prediction is necessary to move towards applications and decision-making based on seasonal forecasts. A recent study has shown that evaporative demand (E0) anomaly forecasts from the Climate Forecast System Version 2 (CFSv2) are consistently more skillful than Prcp anomaly forecasts during drought events over CONUS, and E0 drought forecasts may be particularly useful during the growing season in the farming belts of the central and Midwestern CONUS. For this recent study, we used CFSv2 reforecasts to assess the skill of E0 and of its individual drivers (temperature, humidity, wind speed, and solar radiation), using the American Society of Civil Engineers Standardized Reference Evapotranspiration (ET0) Equation. Moderate skill was found in ET0, temperature, and humidity, with lesser skill in solar radiation, and no skill in wind. Therefore, forecasts of E0 based on models with no wind or solar radiation inputs may prove to be more skillful than the ASCE ET0. For this presentation we evaluate CFSv2 E0 reforecasts (1982-2009) from three different E0 models: (1) ASCE ET0; (2) Hargreaves and Samani (ET-HS), which is estimated from maximum and minimum temperature alone; and (3) Valiantzas (ET-V), which is a modified version of the Penman method for use when wind speed data are not available (or of poor quality) and is driven only by temperature, humidity, and solar radiation. The University of Idaho's gridded meteorological data (METDATA) were used as observations to evaluate CFSv2 and also to determine if ET0, ET-HS, and ET-V identify similar historical drought periods. We focus specifically on CFSv2 lead times of one, two, and three months, and season one forecasts; which are
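As an illustration of the temperature-only model mentioned above, here is a minimal sketch of the Hargreaves-Samani equation in its commonly cited form, ET = 0.0023 · Ra · (Tmean + 17.8) · sqrt(Tmax − Tmin), with Ra expressed as equivalent evaporation in mm/day. The function name, argument names and input values are assumptions for this example; whether the presentation used exactly this variant is not stated in the abstract.

```python
import math


def hargreaves_samani_et(tmax_c, tmin_c, ra_mj_m2_day):
    """Reference evapotranspiration (mm/day) from temperature extremes.

    tmax_c, tmin_c : daily maximum / minimum air temperature in deg C
    ra_mj_m2_day   : extraterrestrial radiation in MJ m^-2 day^-1
    """
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must be >= tmin_c")
    tmean = (tmax_c + tmin_c) / 2.0
    ra_mm_day = 0.408 * ra_mj_m2_day  # convert energy to mm of evaporation
    return 0.0023 * ra_mm_day * (tmean + 17.8) * math.sqrt(tmax_c - tmin_c)


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(hargreaves_samani_et(tmax_c=30.0, tmin_c=15.0, ra_mj_m2_day=40.0))
```

Because the only meteorological inputs are the two temperatures, the forecast skill of this estimate depends on temperature skill alone, which is the motivation given in the abstract.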
Study of Mechanical Properties of Wool Type Fabrics using ANCOVA Regression Model
Hristian, L.; Ostafe, M. M.; Manea, L. R.; Apostol, L. L.
2017-06-01
This work studies the variation of tensile strength for four groups of wool-type fabrics, depending on the fiber composition, the tensile strength of the warp yarns and the weft yarns' technological density, using an ANCOVA regression model. ANCOVA checks the correlation between a dependent variable and the covariate independent variables and removes the variability from the dependent variable that can be accounted for by the covariates. Analysis of covariance models combine analysis of variance with regression analysis techniques. In terms of design, ANCOVA models explain the dependent variable by combining categorical (qualitative) independent variables with continuous (quantitative) variables. There are special extensions to ANCOVA calculations to estimate parameters for both categorical and continuous variables. However, ANCOVA models can also be calculated with multiple regression analysis, using a design matrix with a mix of dummy-coded qualitative and quantitative variables.
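The last point, fitting ANCOVA as a multiple regression on a design matrix with dummy-coded groups plus a continuous covariate, can be sketched as below. The helper names and the tiny data set are hypothetical and are not the fabric data from the study; the fit uses the normal equations solved by Gaussian elimination so that the example needs no third-party library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ancova(groups, covariate, response):
    """Least-squares fit of response ~ group + covariate.

    The first group level (in sorted order) is the reference category;
    every other level gets a 0/1 dummy column.
    """
    levels = sorted(set(groups))
    rows = [[1.0] + [1.0 if g == lvl else 0.0 for lvl in levels[1:]] + [float(x)]
            for g, x in zip(groups, covariate)]
    p = len(rows[0])
    xtx = [[sum(row[j] * row[k] for row in rows) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * y for row, y in zip(rows, response)) for j in range(p)]
    names = ["intercept"] + [f"group[{lvl}]" for lvl in levels[1:]] + ["covariate"]
    return dict(zip(names, solve_linear_system(xtx, xty)))


if __name__ == "__main__":
    # Hypothetical data generated from y = 10 + 5*[group == "B"] + 2*x.
    groups = ["A", "A", "A", "B", "B", "B"]
    x = [1, 2, 3, 1, 2, 4]
    y = [12, 14, 16, 17, 19, 23]
    print(fit_ancova(groups, x, y))
```

The `group[...]` coefficients are the covariate-adjusted differences from the reference group, which is what the ANCOVA group effect estimates.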
truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models
Maria Karlsson
2014-05-01
Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.
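The claim that ordinary least squares is biased under truncation can be checked with a small simulation. The sketch below is written in Python rather than R and does not use or reproduce truncSP or its three estimators; it only demonstrates the problem those estimators address. The data-generating model, cutoff and seed are assumptions chosen for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of the simple least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_truncation_bias(n=20000, slope=2.0, intercept=1.0,
                             cutoff=2.0, seed=0):
    """Return (slope on full sample, slope on sample truncated at y > cutoff)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0.0, 1.0) for x in xs]
    kept = [(x, y) for x, y in zip(xs, ys) if y > cutoff]  # left truncation
    full = ols_slope(xs, ys)
    truncated = ols_slope([x for x, _ in kept], [y for _, y in kept])
    return full, truncated


if __name__ == "__main__":
    full, truncated = simulate_truncation_bias()
    print("full-sample slope:", full)
    print("truncated-sample slope:", truncated)
```

With these settings the truncated-sample slope is pulled well below the true value of 2, because observations with low fitted values survive truncation only when their errors are large and positive.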
Cho, C I; Alam, M; Choi, T J; Choy, Y H; Choi, J G; Lee, S S; Cho, K H
2016-05-01
The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3-L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of the HET15 models for a particular polynomial order were lower than those of the HET60 models in most cases. This implies that the orders of LP and types of residual variances affect the goodness of fit of the models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates from the best-fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first
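The Legendre-polynomial covariates that enter a random regression test-day model can be sketched as follows. Days in milk (DIM) are first rescaled to [-1, 1], then the normalized polynomials sqrt((2k+1)/2) · P_k(x) are evaluated. The DIM range of 5 to 305 days and the function names are assumptions for this example, since the abstract does not give them; note also that "order" conventions differ between papers (highest degree versus number of coefficients).

```python
import math


def legendre_values(x, max_degree):
    """Return [P_0(x), ..., P_max_degree(x)] via Bonnet's recursion."""
    values = [1.0]
    if max_degree >= 1:
        values.append(x)
    for k in range(1, max_degree):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values


def legendre_covariates(dim, dim_min=5, dim_max=305, max_degree=3):
    """Normalized Legendre covariates for one test-day record."""
    x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0  # rescale to [-1, 1]
    return [math.sqrt((2 * k + 1) / 2.0) * p
            for k, p in enumerate(legendre_values(x, max_degree))]


if __name__ == "__main__":
    for dim in (5, 155, 305):
        print(dim, [round(v, 4) for v in legendre_covariates(dim)])
```

Each animal's random genetic and permanent-environment effects are then regression coefficients on these covariates, so the row returned here is one row of the corresponding incidence matrix.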
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
The quantile estimation methods are proposed for the functional-coefficient partially linear regression (FCPLR) model by combining the nonparametric and functional-coefficient regression (FCR) models. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. These resulting estimators are asymptotically normal, but each of them has a large variance. To reduce the variances of these quantile estimators, the one-step backfitting technique is used to obtain efficient quantile estimators of all unknown functions, and their asymptotic normality is derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.
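Quantile estimators of this kind are built on the check (pinball) loss, whose minimizer is the tau-th quantile. The sketch below shows only that building block for an unconditional sample quantile; it is not the local linear or backfitting estimator of the paper, and the function names and data are illustrative assumptions.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau*r for r >= 0 and (tau - 1)*r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def empirical_quantile_by_check_loss(values, tau):
    """Sample value minimizing the total check loss (a tau-th quantile)."""
    return min(sorted(values),
               key=lambda q: sum(check_loss(v - q, tau) for v in values))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    print(empirical_quantile_by_check_loss(data, 0.5))
    print(empirical_quantile_by_check_loss(data, 0.9))
```

Replacing the constant `q` by a locally weighted linear function of the covariates turns this objective into a local linear quantile regression.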
Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing.
Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa
2017-02-01
Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture (for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments) as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using a seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS) smoothing (STL). The data series (daily Poaceae pollen concentrations over the period 2006-2014) was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.
Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing
Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa
2016-08-01
Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture—for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments—as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using a seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS) smoothing (STL). The data series—daily Poaceae pollen concentrations over the period 2006-2014—was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six
A Percentile Regression Model for the Number of Errors in Group Conversation Tests.
Liski, Erkki P.; Puntanen, Simo
A statistical model is presented for analyzing the results of group conversation tests in English, developed in a Finnish university study from 1977 to 1981. The model is illustrated with the findings from the study. In this study, estimates of percentile curves for the number of errors are of greater interest than the mean regression line. It was…
Random regression models in the evaluation of the growth curve of Simbrasil beef cattle
Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.
2013-01-01
Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
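The Monte Carlo approach to sample size can be sketched as: simulate many data sets from the model of interest at a candidate sample size, fit the regression to each, and record how often the effect is detected. The article works in R; the sketch below uses Python, and the function names, the simple one-predictor model, the effect sizes and the normal approximation to the t test are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def slope_is_significant(xs, ys, alpha):
    """Two-sided test of a zero slope (normal approximation to the t test)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(slope / std_err) > z_crit


def estimate_power(n, slope, n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated data sets of size n in which the slope is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
        hits += slope_is_significant(xs, ys, alpha)
    return hits / n_sims


def smallest_n_for_power(slope, target=0.8, candidates=range(10, 201, 10)):
    """First candidate sample size whose estimated power reaches the target."""
    for n in candidates:
        if estimate_power(n, slope) >= target:
            return n
    return None


if __name__ == "__main__":
    print(smallest_n_for_power(slope=0.3))
```

The advantage over closed-form formulae is that the data-generating step can be replaced by anything more realistic, such as skewed errors or missing data, without changing the rest of the procedure.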
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Ahmet DEMIR
2015-07-01
Artificial neural network (ANN) models have already been used successfully in many different fields, and many studies show that ANN models give better results than competing models. But does an ANN still provide the best solutions when it is used as part of a hybrid model? This research answers that question by applying these models to forecasting the GDP growth of Japan. Multiple regression models were used as competing models against hybrid ANN (ANN + multiple regression) models. The results show that the hybrid model performs better than the multiple regression models. In addition, the variables that significantly affect GDP growth were identified, and some of the variables that were assumed to affect the GDP growth of Japan were eliminated statistically.
Longitudinal beta regression models for analyzing health-related quality of life scores over time
Hunger Matthias
2012-09-01
Background: Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies; however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods: We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results: At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from the marginal beta model and the linear mixed model were nearly identical, but differences could be observed with respect to standard errors. Conclusions: Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models; however, if the focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
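The core of a beta regression is the beta density in its mean-precision form, with shape parameters a = mu·phi and b = (1 − mu)·phi and a logit link for the mean. The sketch below shows only this fixed-effects log-likelihood; the random effects of the mixed model and the estimating equations of the marginal model discussed in the abstract are not implemented, and all names and values are assumptions for illustration.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log density of Beta(mu*phi, (1-mu)*phi) at y, with 0 < y < 1."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_regression_loglik(coefs, rows, ys, phi):
    """Log-likelihood of a beta regression with a logit link for the mean.

    rows : list of covariate vectors (include a leading 1.0 for an intercept)
    ys   : outcomes strictly inside (0, 1), e.g. rescaled HRQL scores
    """
    total = 0.0
    for row, y in zip(rows, ys):
        eta = sum(c * x for c, x in zip(coefs, row))
        mu = 1.0 / (1.0 + math.exp(-eta))  # inverse logit keeps mu in (0, 1)
        total += beta_logpdf(y, mu, phi)
    return total


if __name__ == "__main__":
    rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    ys = [0.55, 0.80, 0.92]
    print(beta_regression_loglik([0.2, 1.5], rows, ys, phi=10.0))
```

Because the mean passes through the inverse logit, fitted values can never leave the unit interval, which is the boundedness property the abstract highlights. Scores exactly at 0 or 1 need rescaling before this likelihood can be evaluated.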
Time-varying parameter auto-regressive models for autocovariance nonstationary time series
FEI WanChun; BAI Lun
2009-01-01
In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.
Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi
2013-01-01
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on both the logarithm of time and linear time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, logarithmic regression modeling based on MMSE scores sampled at two baseline points predicted cognitive recovery more accurately than linear regression modeling (logarithmic modeling, R² = 0.676, P<0.0001; linear regression modeling, R² = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
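The two-point tailoring step can be illustrated as follows: given two early scores, a curve of the form score = a + b·ln(t) or score = a + b·t is determined exactly, and either curve can then be extrapolated to a later day. This is a sketch of that arithmetic only; the function names and the numbers are hypothetical, and the paper's exact formulae are not given in the abstract. Predictions are clamped to the 0 to 30 range of the MMSE.

```python
import math


def fit_two_point(t1, y1, t2, y2, transform):
    """Return a predictor y(t) = a + b*transform(t) through two observations."""
    x1, x2 = transform(t1), transform(t2)
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return lambda t: intercept + slope * transform(t)


def clamp_mmse(score):
    """Restrict a predicted score to the valid MMSE range [0, 30]."""
    return max(0.0, min(30.0, score))


if __name__ == "__main__":
    # Hypothetical patient: MMSE 14 on day 3 and 18 on day 10 after admission.
    log_model = fit_two_point(3, 14, 10, 18, math.log)
    lin_model = fit_two_point(3, 14, 10, 18, lambda t: float(t))
    for day in (30, 60):
        print(day, clamp_mmse(log_model(day)), clamp_mmse(lin_model(day)))
```

The logarithmic curve flattens with time while the linear one keeps climbing, so the two can disagree sharply at later days even though both pass through the same two baseline scores.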
Makoto Suzuki
Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R² = 0.676, P<0.0001; linear regression modeling, R² = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
Bracegirdle, Thomas J. [British Antarctic Survey, Cambridge (United Kingdom)]; Stephenson, David B. [University of Exeter, Mathematics Research Institute, Exeter (United Kingdom); NCAS-Climate, Reading (United Kingdom)]
2012-12-15
This study presents projections of twenty-first century wintertime surface temperature changes over the high-latitude regions based on the third Coupled Model Inter-comparison Project (CMIP3) multi-model ensemble. The state-dependence of the climate change response on the present day mean state is captured using a simple yet robust ensemble linear regression model. The ensemble regression approach gives different and more precise estimated mean responses compared to the ensemble mean approach. Over the Arctic in January, ensemble regression gives less warming than the ensemble mean along the boundary between sea ice and open ocean (sea ice edge). Most notably, the results show 3 °C less warming over the Barents Sea (~7 °C compared to ~10 °C). In addition, the ensemble regression method gives projections that are 30% more precise over the Sea of Okhotsk, Bering Sea and Labrador Sea. For the Antarctic in winter (July) the ensemble regression method gives 2 °C more warming over the Southern Ocean close to the Greenwich Meridian (~7 °C compared to ~5 °C). Projection uncertainty was almost half that of the ensemble mean uncertainty over the Southern Ocean between 30°W and 90°E and 30% less over the northern Antarctic Peninsula. The ensemble regression model avoids the need for explicit ad hoc weighting of models and exploits the whole ensemble to objectively identify overly influential outlier models. Bootstrap resampling shows that maximum precision over the Southern Ocean can be obtained with ensembles having as few as six climate models.
A Robbins-Monro procedure for estimation in semiparametric regression models
Bercu, Bernard
2011-01-01
This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a Robbins-Monro procedure that is very efficient and easy to handle. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro in order to estimate the shift parameter. A preliminary evaluation of the regression function is not necessary for estimating the shift parameter. On the other hand, we make use of a recursive Nadaraya-Watson estimator for the estimation of the regression function. This kernel estimator takes into account the previous estimation of the shift parameter. We establish the almost sure convergence for both Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided.
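As background for the stochastic algorithm mentioned in this abstract, the following is a minimal sketch of the generic Robbins-Monro recursion for finding the root of a function that can only be observed with noise. It is not the shift estimator of the paper: the target function, the step sizes c/n and the names `robbins_monro`, `noisy_f` and `TRUE_SHIFT` are assumptions made for illustration.

```python
import random


def robbins_monro(noisy_f, theta0, n_steps, c=1.0):
    """Approximate the root of E[noisy_f(theta)] with step sizes c / n."""
    theta = theta0
    for n in range(1, n_steps + 1):
        # Move against the noisy observation; the steps shrink like 1/n.
        theta -= (c / n) * noisy_f(theta)
    return theta


rng = random.Random(0)   # fixed seed so the run is reproducible
TRUE_SHIFT = 2.0         # hypothetical parameter to recover


def noisy_f(theta):
    # The mean of this observation is (theta - TRUE_SHIFT), which is
    # increasing in theta and equals zero exactly at theta = TRUE_SHIFT.
    return (theta - TRUE_SHIFT) + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_f, theta0=0.0, n_steps=20000)
print(f"estimated shift: {estimate:.3f}")
```

The step sizes c/n satisfy the classical conditions that their sum diverges while the sum of their squares converges, which is what allows the noise to average out.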
Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner
Luciano Fanton
2012-01-01
Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
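The matrix algebra behind such a comparison can be sketched as follows. This is an illustrative Python version, not the SPSS macros themselves. It uses the standard result that, for coefficient vector b with covariance matrix V, the difference between the fitted values at design rows x1 and x2 is d'b with variance d'Vd, where d = x1 − x2. The coefficient values, the covariance matrix and the normal critical value 1.96 are assumptions chosen for the example.

```python
import math


def compare_fitted_values(b, V, x1, x2, crit=1.96):
    """Difference between two fitted values, its standard error and a CI.

    b    : coefficient estimates (intercept first)
    V    : covariance matrix of the estimates, as a list of rows
    x1,x2: design rows (including the leading 1) of the two fitted values
    crit : critical value for the interval (1.96 is a normal approximation)
    """
    d = [u - v for u, v in zip(x1, x2)]
    diff = sum(di * bi for di, bi in zip(d, b))
    k = len(d)
    var = sum(d[i] * V[i][j] * d[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return diff, se, (diff - crit * se, diff + crit * se)


# Hypothetical quadratic model y = b0 + b1*x + b2*x^2.
b = [1.0, 2.0, 0.5]
V = [
    [0.04, 0.0, 0.0],
    [0.0, 0.01, -0.004],
    [0.0, -0.004, 0.0025],
]

# Compare the fitted value at x = 3 with the fitted value at x = 1.
row_at_3 = [1.0, 3.0, 9.0]
row_at_1 = [1.0, 1.0, 1.0]
diff, se, ci = compare_fitted_values(b, V, row_at_3, row_at_1)
print(f"difference = {diff:.3f}, SE = {se:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In this polynomial example no single coefficient equals the difference between the two fitted values, and the covariance between the linear and quadratic coefficients changes the standard error.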
Jedynska, Aleksandra; Hoek, Gerard; Wang, Meng; Yang, Aileen; Eeftens, Marloes; Cyrys, Josef; Keuken, Menno; Ampe, Christophe; Beelen, Rob; Cesaroni, Giulia; Forastiere, Francesco; Cirach, Marta; de Hoogh, Kees; De Nazelle, Audrey; Nystad, Wenche; Akhlaghi, Helgah Makarem; Declercq, Christophe; Stempfelet, Morgane; Eriksen, Kirsten T.; Dimakopoulou, Konstantina; Lanki, Timo; Meliefste, Kees; Nieuwenhuijsen, Mark; Yli-Tuomi, Tarja; Raaschou-Nielsen, Ole; Janssen, Nicole A. H.; Brunekreef, Bert; Kooter, Ingeborg M.
2017-02-01
Oxidative potential (OP) has been suggested as a health-relevant measure of air pollution. Little information is available about OP spatial variation and the possibility to model its spatial variability. Our aim was to measure the spatial variation of OP within and between 10 European study areas. The second aim was to develop land use regression (LUR) models to explain the measured spatial variation. OP was determined with the dithiothreitol (DTT) assay in ten European study areas. DTT of PM2.5 was measured at 16-40 sites per study area, divided over street, urban and regional background sites. Three two-week samples were taken per site in a one-year period in three different seasons. We developed study-area specific LUR models and a LUR model for all study areas combined to explain the spatial variation of OP. Significant contrasts between study areas in OP were found. OP DTT levels were highest in southern Europe. DTT levels at street sites were on average 1.10 times higher than at urban background locations. In 5 of the 10 study areas LUR models could be developed with a median R² of 33%. A combined study area model explained 30% of the measured spatial variability. Overall, LUR models did not explain spatial variation well, possibly due to low levels of OP DTT and a lack of specific predictor variables.
LINEAR REGRESSION MODEL ESTIMATION FOR RIGHT CENSORED DATA
Ersin Yılmaz
2016-05-01
In this study, we first define right-censored data. Briefly, right-censored data arise when values above a certain limit are censored; this may be related to the measuring device. We then use a response variable obtained from right-censored explanatory variables and estimate the linear regression model. Because the data are censored, Kaplan-Meier weights are used to estimate the model; with these weights, the regression estimator is consistent and unbiased. Semiparametric regression is another method for censored data that also gives useful results. This study may also be useful for health studies, since censored data are common in medical research.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.
Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression
Li Jian
2017-01-01
Objective: To construct a multifactor prediction model for the individual risk of type 2 diabetes mellitus (T2DM), and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: Logistic regression techniques were used to screen the risk factors for T2DM and to construct the risk prediction model. Results: The logistic regression equation of the risk prediction model for males was logit(P) = BMI × 0.735 + vegetables × (−0.671) + age × 0.838 + diastolic pressure × 0.296 + physical activity × (−2.287) + sleep × (−0.009) + smoking × 0.214; for females it was logit(P) = BMI × 1.979 + vegetables × (−0.292) + age × 1.355 + diastolic pressure × 0.522 + physical activity × (−2.287) + sleep × (−0.010). For males, the area under the ROC curve was 0.83, the sensitivity 0.72 and the specificity 0.86; for females, the area under the ROC curve was 0.84, the sensitivity 0.75 and the specificity 0.90. Conclusion: The model data come from a nested case-control study, the risk prediction model was established using mature logistic regression techniques, and the model shows high predictive sensitivity, specificity and stability.
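To show how a logit equation of this kind is turned into a predicted probability, here is a minimal sketch using the male coefficients reported in the abstract. The abstract gives no intercept and does not say how the predictors are coded, so the intercept of 0.0 and the input values below are assumptions for illustration only; the output is not a clinically meaningful risk.

```python
import math

# Male coefficients as reported in the abstract.
MALE_COEF = {
    "bmi": 0.735,
    "vegetables": -0.671,
    "age": 0.838,
    "diastolic_pressure": 0.296,
    "physical_activity": -2.287,
    "sleep": -0.009,
    "smoking": 0.214,
}


def predicted_risk(coef, x, intercept=0.0):
    """Inverse-logit of the linear predictor: P = 1 / (1 + exp(-z)).

    The intercept defaults to 0.0 only because none is reported
    in the abstract; this is an assumption of the sketch.
    """
    z = intercept + sum(coef[name] * x[name] for name in coef)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coded inputs; the real coding scheme is not given.
person = {
    "bmi": 1,
    "vegetables": 1,
    "age": 1,
    "diastolic_pressure": 0,
    "physical_activity": 1,
    "sleep": 7,
    "smoking": 0,
}

risk = predicted_risk(MALE_COEF, person)
print(f"predicted probability: {risk:.3f}")
```

Positive coefficients raise the log-odds and negative ones lower them, which is why physical activity, with the largest negative coefficient, dominates the result in this made-up input.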
Validation of a regression model for standardizing lifetime racing performances of thoroughbreds.
Martin, G S; Strand, E; Kearney, M T
1997-06-01
To determine the relationship between prediction errors of a regression model of racing finish times and earnings or finish position; the relationship between standardized finish times, determined by use of this model, and earnings or finish position; and whether this model was valid when applied to data for horses that underwent surgical treatment. Survey. Records of 6,700 healthy Thoroughbreds racing in Louisiana and of 31 Thoroughbreds with idiopathic left laryngeal hemiplegia that underwent surgical treatment. Predicted and standardized finish times were calculated by use of the regression model for healthy horses, and the relationships between prediction error (actual minus predicted finish time) and standardized finish times, and earnings and finish position, were examined. Then, the regression model was applied to data for horses with hemiplegia to determine whether the model was valid when used to calculate predicted and standardized finish times for lifetime performance data. Prediction error and standardized finish times were negatively correlated with earnings and positively correlated with finish position and, thus, appeared to be reliable measures of racing performance. The regression model was found to be valid when applied to lifetime performance records of horses with laryngeal hemiplegia. Prediction error and standardized finish times are measures of racing performance that can be used to compare performances among Thoroughbred racehorses across a variety of circumstances that would otherwise confound comparison.
Gang WU
2016-01-01
Objective: To analyze the risk factors for prognosis in intracerebral hemorrhage using a decision tree (classification and regression tree, CART) model and a logistic regression model. Methods: CART and logistic regression models were established according to the risk factors for prognosis of patients with cerebral hemorrhage, and the results of the two methods were compared. Results: Logistic regression analyses showed that hematoma volume (OR 0.953), initial Glasgow Coma Scale (GCS) score (OR 1.210), pulmonary infection (OR 0.295), and basal ganglia hemorrhage (OR 0.336) were the risk factors for the prognosis of cerebral hemorrhage. The CART analysis showed that hematoma volume and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of the two models on the prognosis of cerebral hemorrhage were similar (Z = 0.402, P = 0.688). Conclusions: The CART model has a similar value to that of the logistic model in judging the prognosis of cerebral hemorrhage; it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13
Modeling Phosphorous Losses from Seasonal Manure Application Schemes
Menzies, E.; Walter, M. T.
2015-12-01
Excess nutrient loading, especially nitrogen and phosphorus, to surface waters is a common and significant problem throughout the United States. While pollution remediation efforts are continuously improving, the most effective treatment remains to limit the source. Appropriate timing of fertilizer application to reduce nutrient losses is currently a hotly debated topic in the Northeastern United States; winter spreading of manure is under special scrutiny. We plan to evaluate the loss of phosphorus to surface waters from agricultural systems under varying seasonal fertilization schemes in an effort to determine the impacts of fertilizers applied throughout the year. The Cayuga Lake basin, located in the Finger Lakes region of New York State, is a watershed dominated by agriculture where a wide array of land management strategies can be found. The evaluation will be conducted on the Fall Creek Watershed, a large sub-basin in the Cayuga Lake Watershed. The Fall Creek Watershed covers approximately 33,000 ha in central New York State with approximately 50% of this land being used for agriculture. We plan to use the Soil and Water Assessment Tool (SWAT) to model a number of seasonal fertilization regimes such as summer-only spreading and year-round spreading (including winter applications), as well as others. We will use the model to quantify the phosphorus load to surface waters from these different fertilization schemes and determine the impacts of manure applied at different times throughout the year. More detailed knowledge about how seasonal fertilization schemes impact phosphorus losses will provide more information to stakeholders concerning the impacts of agriculture on surface water quality. Our results will help farmers and extensionists make more informed decisions about appropriate timing of manure application for reduced phosphorus losses and surface water degradation as well as aid law makers in improving policy surrounding manure application.
Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.
Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko
2010-01-01
For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R² values. For practical purposes, the surface parameter Ra alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
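The hierarchical procedure described in this abstract, adding predictors one at a time and tracking the adjusted R², can be sketched as below. The numbers are hypothetical and the names `param_b` and `param_c` stand in for unnamed surface parameters; only Ra is named in the abstract. The fit uses the normal equations with a small Gaussian elimination routine so that the example needs no external library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def adjusted_r2(columns, y):
    """Fit y on an intercept plus the given columns; return adjusted R^2."""
    n, p = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    # Penalize the residual variance by the number of predictors p.
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


# Hypothetical measurements on eight surfaces (not data from the study).
ra = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
param_b = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
param_c = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
friction = [0.17, 0.19, 0.27, 0.28, 0.36, 0.40, 0.45, 0.50]

# Add predictors from the easiest to the hardest to measure (assumed order).
ordered = [("Ra", ra), ("param_b", param_b), ("param_c", param_c)]
steps = []
for i in range(1, len(ordered) + 1):
    cols = [col for _, col in ordered[:i]]
    steps.append((ordered[i - 1][0], adjusted_r2(cols, friction)))

for name, value in steps:
    print(f"after adding {name}: adjusted R^2 = {value:.4f}")
```

Because the adjusted R² penalizes each added predictor, it can stay flat or even drop when a new parameter contributes little, which is what makes it suitable for judging cumulative contributions.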
Improved sub-seasonal meteorological forecast skill using weighted multi-model ensemble simulations
Wanders, Niko; Wood, Eric F.
2016-01-01
Sub-seasonal to seasonal weather and hydrological forecasts have the potential to provide vital information for a variety of water-related decision makers. Here, we investigate the skill of four sub-seasonal forecast models from phase-2 of the North American Multi-Model Ensemble using reforecasts
Blind identification of threshold auto-regressive model for machine fault diagnosis
LI Zhinong; HE Yongyong; CHU Fulei; WU Zhaotong
2007-01-01
A blind identification method was developed for the threshold auto-regressive (TAR) model. The method had good identification accuracy and rapid convergence, especially for higher order systems. The proposed method was then combined with the hidden Markov model (HMM) to determine the auto-regressive (AR) coefficients for each interval used for feature extraction, with the HMM as a classifier. The fault diagnoses during the speed-up and speed-down processes for rotating machinery have been successfully completed. The result of the experiment shows that the proposed method is practical and effective.
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." (Technometrics) A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects (e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models) may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.
Dons, Evi; Van Poppel, Martine; Int Panis, Luc; De Prins, Sofie; Berghmans, Patrick; Koppen, Gudrun; Matheeussen, Christine
2014-04-01
In the HEAPS (Health Effects of Air Pollution in Antwerp Schools) study the importance of traffic-related air pollution on the school and home location on children's health was assessed. 130 children (aged 6 to 12) from two schools participated in a biomonitoring study measuring oxidative stress, inflammation and cardiovascular markers. Personal exposure of schoolchildren to black carbon (BC) and nitrogen dioxide (NO2) was assessed using both measured and modeled concentrations. Air quality measurements were done in two seasons at approximately 50 locations, including the schools. The land use regression technique was applied to model concentrations at the children's home address and at the schools. In this paper the results of the exposure analysis are given. Concentrations measured at school 2 h before the medical examination were used for assessing health effects of short term exposure. Over two seasons, this short term BC exposure ranged from 514 ng/m³ to 6285 ng/m³, and for NO2 from 11 μg/m³ to 36 μg/m³. An integrated exposure was determined until 10 days before the child's examination, taking into account exposures at home and at school and the time spent in each of these microenvironments. Land use regression estimates were therefore recalculated into daily concentrations by using the temporal trend observed at a fixed monitor of the official air quality network. Concentrations at the children's homes were modeled to estimate long term exposure (from 1457 ng/m³ to 3874 ng/m³ for BC; and from 19 μg/m³ to 51 μg/m³ for NO2). The land use regression technique proved to be a fast and accurate means for estimating long term and daily BC and NO2 exposure for children living in the Antwerp area. The spatial and temporal resolution was tailored to the needs of the epidemiologists involved in this study.
(none)
2002-01-01
Thermally induced errors can account for as much as 70% of the dimensional errors on a workpiece. Accurate modeling of errors is an essential part of error compensation. Based on an analysis of the existing approaches to thermal error modeling for machine tools, a new approach of regression orthogonal design is proposed, which combines statistical theory with machine structures, surrounding conditions, engineering judgements, and experience in modeling. A whole computation and analysis procedure is given. ...
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
LIN; Lu
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models. A Stahel-Donoho kernel estimation is introduced, in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate the incomputable depths of the errors. As a result the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure and then we can compare its efficiency with the original one. Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Xibin Zhang
2016-04-01
This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to a nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP) growth rates among the Organisation for Economic Co-operation and Development (OECD) and non-OECD countries.
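For readers unfamiliar with the estimator named in this abstract, the following is a minimal sketch of a Nadaraya-Watson estimate with one continuous and one discrete regressor. The bandwidths here are fixed by hand, not sampled from a posterior as in the paper. The Gaussian kernel for the continuous regressor and the discrete kernel (weight 1 for a matching category, `lam` otherwise) are one common choice and are assumptions of this example, as are all names and data values.

```python
import math


def nadaraya_watson(x0, z0, xs, zs, ys, h, lam):
    """Kernel-weighted average of ys at the point (x0, z0).

    h   : bandwidth of the Gaussian kernel for the continuous regressor
    lam : weight in [0, 1] given to observations whose category differs
          from z0 (lam = 0 uses only the matching category, lam = 1
          ignores the discrete regressor altogether)
    """
    weights = []
    for xi, zi in zip(xs, zs):
        k_cont = math.exp(-0.5 * ((x0 - xi) / h) ** 2)
        k_disc = 1.0 if zi == z0 else lam
        weights.append(k_cont * k_disc)
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Hypothetical sample: two groups with different levels.
xs = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
zs = ["a", "a", "a", "b", "b", "b"]
ys = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]

within_group = nadaraya_watson(1.0, "a", xs, zs, ys, h=0.5, lam=0.0)
pooled = nadaraya_watson(1.0, "a", xs, zs, ys, h=0.5, lam=1.0)
print(f"lam=0: {within_group:.3f}   lam=1: {pooled:.3f}")
```

The example shows why the bandwidths matter: with `lam = 0` the estimate at group "a" uses only that group, while `lam = 1` pools both groups and pulls the estimate toward the overall level.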
Wun Wong
2003-01-01
The assessment of medical outcomes is important in the effort to contain costs, streamline patient management, and codify medical practices. As such, it is necessary to develop predictive models that will make accurate predictions of these outcomes. The neural network methodology has often been shown to perform as well as, if not better than, the logistic regression methodology in terms of sample predictive performance. However, the logistic regression method is capable of providing an explanation regarding the relationship(s) between variables. This explanation is often crucial to understanding the clinical underpinnings of the disease process. Given the respective strengths of the methodologies in question, the combined use of a statistical (i.e., logistic regression) and machine learning (i.e., neural network) technology in the classification of medical outcomes is warranted under appropriate conditions. The study discusses these conditions and describes an approach for combining the strengths of the models.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a non-parametric technique, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
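The two-stage idea in this abstract (estimate the variance function nonparametrically, then reweight the regression) can be illustrated with a simplified sketch. It differs from the paper in two stated ways: it uses a single regressor, not several, and it smooths the squared residuals with a degree-zero (local constant) Gaussian kernel average in place of a higher-order local polynomial. All names, the bandwidth and the simulated data are assumptions.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least squares for y = a + b*x with weights w; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def kernel_smooth(x0, xs, vals, h):
    """Local constant (degree-zero) Gaussian kernel average of vals at x0."""
    k = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(ki * vi for ki, vi in zip(k, vals)) / sum(k)


def two_stage_fit(x, y, h, floor=1e-8):
    """Stage 1: OLS and variance estimate. Stage 2: reweighted fit."""
    a0, b0 = wls_line(x, y, [1.0] * len(x))                    # ordinary least squares
    sq_res = [(yi - a0 - b0 * xi) ** 2 for xi, yi in zip(x, y)]
    # Smoothed squared residuals estimate the variance function;
    # the floor guards against division by zero.
    var_hat = [max(kernel_smooth(xi, x, sq_res, h), floor) for xi in x]
    # Generalized least squares with a diagonal covariance matrix
    # is weighted least squares with weights 1 / variance.
    return wls_line(x, y, [1.0 / v for v in var_hat])


# Simulated data whose noise standard deviation grows with x (hypothetical).
rng = random.Random(1)
x = [i / 10 for i in range(1, 101)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.2 * xi) for xi in x]

a_hat, b_hat = two_stage_fit(x, y, h=1.0)
print(f"intercept = {a_hat:.3f}, slope = {b_hat:.3f}")
```

No test for heteroscedasticity is run before reweighting, which mirrors the abstract's point that the variance function is estimated directly without assuming its form.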
Suhartono Suhartono
2005-01-01
Many business and economic time series are non-stationary time series that contain trend and seasonal variations. Seasonality is a periodic and recurrent pattern caused by factors such as weather, holidays, or repeating promotions. A stochastic trend often accompanies the seasonal variations and can have a significant impact on various forecasting methods. In this paper, we investigate and compare some forecasting methods for modeling time series with both trend and seasonal patterns. These methods are Winters', Decomposition, Time Series Regression, ARIMA and Neural Network models. In this empirical research, we study the effectiveness of the forecasting performance, particularly to answer whether a complex method always gives a better forecast than a simpler method. We use real data, namely airline passenger data. The result shows that the more complex model does not always yield a better result than a simpler one. Additionally, we identify possibilities for further research, especially the use of hybrid models that combine forecasting methods to obtain better forecasts, for example a combination of decomposition (as data preprocessing) and a neural network model.
Replica analysis of overfitting in regression models for time-to-event data
Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.
2017-09-01
Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
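As a rough illustration of what fitting a model "under different quantiles" means, the sketch below evaluates the quantile (pinball) loss that quantile regression minimizes and finds the best constant prediction at two quantile levels. The satisfaction scores are hypothetical and are not taken from the study; the function names are illustrative only.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Return the sample value that minimizes the pinball loss (a tau-quantile)."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical overall-satisfaction scores on a 1-5 scale (not from the study).
scores = [2, 3, 3, 4, 4, 4, 5, 5]
q25 = best_constant_quantile(scores, 0.25)
q50 = best_constant_quantile(scores, 0.50)
print(q25, q50)
```

A full quantile regression replaces the constant with a linear function of the predictors and minimizes the same loss, so a Q0.25 model describes the lower part of the satisfaction distribution rather than its mean.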
Logistic regression model and its application
常振海; 刘薇
2012-01-01
To improve the prediction accuracy for a multi-category qualitative dependent variable, a three-category logistic model is established for actual statistical data on the basis of the binary logistic regression model. The significance of the independent variables is tested with the likelihood ratio test, and non-significant variables are removed. A linear regression function is determined for each category of the dependent variable, and the models are tested. The results show that the logistic regression model has good predictive accuracy and practical value for regression analysis with a qualitative dependent variable.
Post-L1-Penalized Estimators in High-Dimensional Linear Regression Models
Belloni, Alexandre
2010-01-01
In this paper we study the post-penalized estimator which applies ordinary, unpenalized linear regression to the model selected by the first step penalized estimators, typically the LASSO. We show that post-LASSO can perform as well or nearly as well as the LASSO in terms of the rate of convergence. We show that this performance occurs even if the LASSO-based model selection "fails", in the sense of missing some components of the "true" regression model. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the "true" model as a subset and enough sparsity is obtained. Of course, in the extreme case, when LASSO perfectly selects the true model, the post-LASSO estimator becomes the oracle estimator. We show that the results hold in both parametric and non-parametric models; and by the "true" model we mean the best $s$-dimensional approximation to the true regression model, whe...
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO₃⁻ and explanatory variables Ca²⁺, Cl⁻, SO₄²⁻, depth to water, aquifer media and land use. Substituting Cl⁻ by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
Xu, Xu; McGorry, Raymond W; Lin, Jia-Hua
2014-06-01
Tissue overloading is a major contributor to shoulder musculoskeletal injuries. Previous studies attempted to use regression-based methods to predict muscle activities from shoulder kinematics and shoulder kinetics. While a regression-based method can address co-contraction of the antagonist muscles as opposed to the optimization method, most of these regression models were based on limited shoulder postures. The purpose of this study was to develop a set of regression equations to predict the 10th percentile, the median, and the 90th percentile of normalized electromyography (nEMG) activities from shoulder postures and net shoulder moments. Forty participants generated various 3-D shoulder moments at 96 static postures. The nEMG of 16 shoulder muscles was measured and the 3-D net shoulder moment was calculated using a static biomechanical model. A stepwise regression was used to derive the regression equations. The results indicated the measured range of the 3-D shoulder moment in this study was similar to those observed during work requiring light physical capacity. The r² of all the regression equations ranged between 0.228 and 0.818. For the median of the nEMG, the average r² among all 16 muscles was 0.645, and the five muscles with the greatest r² were the three deltoids, supraspinatus, and infraspinatus. The results can be used by practitioners to estimate the range of the shoulder muscle activities given a specific arm posture and net shoulder moment. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Proposal of a regressive model for the hourly diffuse solar radiation under all sky conditions
Ruiz-Arias, J.A.; Alsamamra, H.; Tovar-Pescador, J.; Pozo-Vazquez, D. [Department of Physics, Building A3-066, University of Jaen, 23071 Jaen (Spain)]
2010-05-15
In this work, we propose a new regressive model for the estimation of the hourly diffuse solar irradiation under all sky conditions. This new model is based on the sigmoid function and uses the clearness index and the relative optical mass as predictors. The model performance was compared against five other regressive models using radiation data corresponding to 21 stations in the USA and Europe. In the first part, the 21 stations were grouped into seven subregions (corresponding to seven different climatic regions) and all the models were locally fitted and evaluated using these seven datasets. Results showed that the newly proposed model provides slightly better estimates. In particular, this new model provides a relative root mean square error in the range 25-35% and a relative mean bias error in the range -15% to 15%, depending on the region. In the second part, the potential global character of the new model was evaluated. To this end, the model was fitted using the whole dataset. Results showed that the globally fitted model provides overall better estimates than the locally fitted models, with relative root mean square error values ranging from 20% to 35% and a relative mean bias error ranging from -5% to -12%. Additionally, the newly proposed model showed some advantages compared to the other evaluated models. In particular, the sigmoid behaviour of this model is able to provide physically reliable estimates for extreme values of the clearness index even though it uses fewer parameters than the other tested models.
The limiting behavior of the estimated parameters in a misspecified random field regression model
Dahl, Christian Møller; Qin, Yu
This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection of nonlinear functions and it has the added advantage that there is no "curse of dimensionality." Contrary to existing literature on the asymptotic properties of the estimated parameters in random field models, our results do not require that the explanatory variables are sampled on a grid. However... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, work very well for random field regression models in finite samples and that these models successfully...
Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie
2008-01-01
Predicting heartbeat message arrival time is crucial for the quality of failure detection services over the Internet. However, the Internet's dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as an auto-regressive process, heartbeat sending time is modeled as an exogenous variable, the model's coefficients are estimated over a sliding window of observations, and the result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.
Modeling Zero-Inflated Regression of Road Accidents at Johor Federal Road F001
Prasetijo Joewono
2016-01-01
This study focused on Poisson regression with excess zero outcomes in the response variable. Generalized linear modelling techniques such as the Poisson regression model and the negative binomial model were found to be inadequate for explaining and handling the overdispersion caused by the large number of zeros; a zero-inflated model was therefore introduced to overcome the problem. The models were applied to the number of road accidents on F001 Jalan Jb – Air Hitam. Data on road accidents were collected for the five-year period from 2010 through 2014. The results of the analysis show that the ZINB model performed best in terms of the comparative criteria, based on a P value less than 0.05.
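To make the zero-inflation idea concrete, here is a minimal sketch of a zero-inflated negative binomial (ZINB) probability mass function. The parameterization (mean `mu`, dispersion `size`, structural-zero probability `pi_zero`) and the numeric values are assumptions chosen for illustration; they are not the fitted values from the study.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf with mean mu and variance mu + mu**2 / size."""
    log_p = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
             + size * math.log(size / (size + mu))
             + k * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zinb_pmf(k, mu, size, pi_zero):
    """Zero-inflated NB: a structural zero with probability pi_zero, else an NB count."""
    base = (1.0 - pi_zero) * nb_pmf(k, mu, size)
    return pi_zero + base if k == 0 else base


# Hypothetical parameter values chosen only for illustration.
mu, size, pi_zero = 2.0, 1.5, 0.3
p0_nb = nb_pmf(0, mu, size)
p0_zinb = zinb_pmf(0, mu, size, pi_zero)
total = sum(zinb_pmf(k, mu, size, pi_zero) for k in range(500))
print(p0_nb, p0_zinb, total)
```

The zero-inflated model places more probability on a zero count than the plain negative binomial with the same `mu` and `size`, which is the feature that helps when many road segments record no accidents.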
Profile-driven regression for modeling and runtime optimization of mobile networks
McClary, Dan; Syrotiuk, Violet; Kulahci, Murat
2010-01-01
Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike others, the throughput model accounts for node speed. The resulting optimization is very effective; locally optimizing the network factors at runtime results in throughput as much as six times higher than that achieved with the factors at their default levels.
APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING
A. L. Oleinik
2015-09-01
Subject of Research. The paper deals with the problem of reconstructing lip region images from a speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data with high covariance. These components are used to build a regression model. The advantage of this approach lies in the possibility of achieving two goals: identification of latent interrelations between initial data components (e.g. the speech signal and the lip region image) and approximation of one initial data component as a function of another. Main Results. Experimental research on reconstruction of lip region images from the speech signal was carried out on the VidTIMIT audio-visual speech database. The results of the experiment showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The findings suggest that Partial Least Squares regression is applicable to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice.
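The residual-symmetry idea can be sketched with a small simulation. The code below is only an illustration under assumed settings (an exponentially distributed predictor, normal errors, a slope of 0.8, and a fixed seed); it compares the skewness of residuals from the correctly specified regression with that of the reversed regression, and it does not implement the significance tests discussed in the abstract.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_residuals(x, y):
    """Residuals from the simple linear regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m = mean(values)
    m2 = mean([(v - m) ** 2 for v in values])
    m3 = mean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Simulated data (assumption): a skewed cause x and a normally distributed error.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(5000)]
y = [0.8 * xi + rng.gauss(0.0, 0.5) for xi in x]

skew_correct = skewness(ols_residuals(x, y))   # y regressed on x
skew_reversed = skewness(ols_residuals(y, x))  # x regressed on y
print(skew_correct, skew_reversed)
```

In this setup the residuals of the correctly specified model are close to symmetric, while the residuals of the reversed model inherit part of the skewness of x, which is the asymmetry the proposed tests exploit.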
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. The objective was to explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
Anke Hüls
2017-05-01
Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
Mi, Gu; Di, Yanming; Schafer, Daniel W
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
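For readers unfamiliar with the NB mean-variance relationship mentioned above, the sketch below uses the common quadratic form, variance = mu + phi * mu^2, and a simple method-of-moments estimate of the dispersion phi. Both the parameterization and the read counts are assumptions for illustration; the simulation-based tests proposed in the article are not reproduced here.

```python
def nb_variance(mu, phi):
    """Variance implied by the quadratic NB mean-variance relationship."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments estimates: returns (sample mean, sample variance, dispersion)."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts) / (n - 1)
    # Truncate at zero: no extra-Poisson variation if the variance is below the mean.
    phi = max((var - m) / m ** 2, 0.0)
    return m, var, phi


# Hypothetical read counts for one gene across six samples (not real data).
counts = [3, 7, 12, 5, 20, 9]
sample_mean, sample_var, phi_hat = moment_dispersion(counts)
print(sample_mean, sample_var, phi_hat)
```

With so few samples per gene, a per-gene estimate like `phi_hat` is very noisy, which is one motivation for the dispersion models that share information across genes.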
Nick, Todd G; Campbell, Kathleen M
2007-01-01
The Medical Subject Headings (MeSH) thesaurus used by the National Library of Medicine defines logistic regression models as "statistical models which describe the relationship between a qualitative dependent variable (that is, one which can take only certain discrete values, such as the presence or absence of a disease) and an independent variable." Logistic regression models are used to study effects of predictor variables on categorical outcomes and normally the outcome is binary, such as presence or absence of disease (e.g., non-Hodgkin's lymphoma), in which case the model is called a binary logistic model. When there are multiple predictors (e.g., risk factors and treatments) the model is referred to as a multiple or multivariable logistic regression model and is one of the most frequently used statistical models in medical journals. In this chapter, we examine both simple and multiple binary logistic regression models and present related issues, including interaction, categorical predictor variables, continuous predictor variables, and goodness of fit.
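A simple binary logistic model with one continuous predictor can be sketched as follows. This is a minimal illustration that fits the intercept and slope by plain gradient descent on the log-likelihood; the data are hypothetical, and in practice a statistical package using iteratively reweighted least squares would normally be preferred.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_simple_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient descent."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # gradient of the negative log-likelihood
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: exposure level and disease presence (1) or absence (0).
exposure = [0, 1, 2, 3, 4, 5, 6, 7]
disease = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_simple_logistic(exposure, disease)
odds_ratio = math.exp(b1)  # multiplicative change in odds per unit of exposure
print(b0, b1, odds_ratio)
```

The exponentiated slope is the odds ratio per unit increase in the predictor, which is the quantity usually reported in medical journals.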
[A mathematical model of the seasonal morbidity of shigellosis].
Boev, B V; Bondarenko, V M; Prokop'eva, N V; Raigosa Anaya, M; García de Alba, H; San Román, R T
1993-01-01
A new epidemiologically significant mathematical model for the prognosis of seasonal morbidity in dysentery caused by S. flexneri and S. sonnei has been developed. This model may be used for solving problems in the epidemiology of Shigella infections. In this model, quantitative ratios are determined by means of a system of nonlinear first-order integro-differential equations in partial derivatives with boundary conditions of the integral type. The model makes it possible to run multiple calculations in order to obtain the most probable picture of the development of the epidemic process in individual territories, and to ascertain and predict the timing and peaks of rises in morbidity year after year. The model permits the evaluation of specific features of the course of dysentery in patients of different ages in different groups of the population affected by various nosological forms of shigellae. The relationships indicated by the model have been implemented in the computer program "SHIGELLA C", which permits multiple calculations of dysentery morbidity on an IBM PC/AT.
Kiviet, J.F.; Phillips, G.D.A.
2014-01-01
In dynamic regression models conditional maximum likelihood (least-squares) coefficient and variance estimators are biased. Using expansion techniques an approximation is obtained to the bias in variance estimation yielding a bias corrected variance estimator. This is achieved for both the standard
Modeling protein tandem mass spectrometry data with an extended linear regression strategy.
Liu, Han; Bonner, Anthony J; Emili, Andrew
2004-01-01
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns presented in mass spectra are useful information for the identification of peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is just the eigenvector corresponding to the least eigenvalue of a space-transformed version of the original data, this extended regression problem can be reduced to an SVD problem, thus gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high confidence, non-redundant matches as training data. Based on this specific problem, we derived some measurements of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes.
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Simple multiple regression model for long range forecasting of Indian summer monsoon rainfall
Sadhuram, Y.; Murthy, T.V.R.
) and ISMR is found to be 0.62. The multiple correlation using the above two parameters is 0.85 which explains 72% variance in ISMR. Using the above two parameters a linear multiple regression model to predict ISMR is developed. The results are comparable...
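The link between the multiple correlation of 0.85 and the 72% explained variance is the usual squaring of the correlation coefficient:

```latex
R^2 = (0.85)^2 = 0.7225 \approx 72\%
```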
Cason, Gerald J.; Cason, Carolyn L.
A more familiar and efficient method for estimating the parameters of Cason and Cason's model was examined. Using a two-step analysis based on linear regression, rather than the direct search iterative procedure, gave about equally good results while providing a 33 to 1 computer processing time advantage, across 14 cohorts of junior medical…
FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS
Hirpa G. Lemu
2017-03-01
This article reports a proposed approach to describing frictional resistance in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is that a considerable number of factors affect the friction coefficient value, and as a result building an analytical friction model for specified process conditions is practically impossible. In the proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code, and STATISTICA Neural Networks was utilized to build the artificial neural network model. The effect of different training strategies on the quality of the neural networks was studied. The results of a strip drawing friction test were used as input variables for the regression model and for training radial basis function networks, generalized regression neural networks and multilayer networks. Four kinds of Al-Mg alloy sheets were used as test material.
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
(no author listed)
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and its computational burden is much lower than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if we choose an appropriate ρ(·). Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained, and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;
2014-01-01
to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Enders, Craig K.
2001-01-01
Examined the performance of a recently available full information maximum likelihood (FIML) estimator in a multiple regression model with missing data using Monte Carlo simulation and considering the effects of four independent variables. Results indicate that FIML estimation was superior to that of three ad hoc techniques, with less bias and less…
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Ling, Ru; Liu, Jiawang
2011-12-01
To construct a prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory power and fit, and may be used for short- and mid-term forecasting.
A combined gray neural network model of seasonal heating load forecast
QIAOXiaozhuang; YANGChangzhi
2003-01-01
The seasonal heating load time series shows both an increasing trend and fluctuation, so it is difficult to select a model to forecast it. In this paper, a combined model of a gray model and an artificial neural network model is presented to forecast seasonal heating load. A concrete model was established and verified through actual examples.
Lukianenko Iryna H.
2014-01-01
The article considers possibilities and specific features of modelling economic phenomena with the help of the category of models that unite elements of econometric regressions and artificial neural networks. This category of models contains auto-regression neural networks (AR-NN), regressions of smooth transition (STR/STAR), multi-mode regressions of smooth transition (MRSTR/MRSTAR) and smooth transition regressions with neural coefficients (NCSTR/NCSTAR). The neural network component allows models of this category to achieve a high degree of empirical authenticity, including reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands possibilities of interpretation of the obtained results. An example of a multi-mode monetary rule is used to show one of the cases of specification and interpretation of this model. In particular, the article models and interprets principles of management of the UAH exchange rate that come into force when the economy passes from a relatively stable state into a crisis state.
Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine
Patnaik, Surya N.; Hopkins, Dale A.
2002-01-01
In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T_4, T_vane, and T_metal). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC) as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design-of-experiments strategy. A neural network and a regression model with 45 weight factors were trained on the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T_4, T_vane, and T_metal) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.
An empirical approach to update multivariate regression models intended for routine industrial use
Garcia-Mencia, M.V.; Andrade, J.M.; Lopez-Mahia, P.; Prada, D. [University of La Coruna, La Coruna (Spain). Dept. of Analytical Chemistry
2000-11-01
Many problems currently tackled by analysts are highly complex and, accordingly, multivariate regression models need to be developed. Two intertwined topics are important when such models are to be applied within industrial routines: (1) Did the model account for the 'natural' variance of the production samples? (2) Is the model stable over time? This paper focuses on the second topic and presents an empirical approach in which predictive models developed using Mid-FTIR with PLS and PCR hold their utility for about nine months when used to predict the octane number of platforming naphthas in a petrochemical refinery. 41 refs., 10 figs., 1 tab.
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
林路; 张润楚
2004-01-01
This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...
Ge-mai Chen; Jin-hong You
2005-01-01
Consider a repeated measurement partially linear regression model with an unknown vector pasemiparametric generalized least squares estimator (SGLSE) of β, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE cannot improve upon the SGLSE. These results are generalizations of those in [2] to the case of semiparametric regressions.
Regressions by leaps and bounds and biased estimation techniques in yield modeling
Marquina, N. E. (Principal Investigator)
1979-01-01
The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.
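The link between multicollinearity and small eigenvalues noted above can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data; it computes the ordinary correlation matrix of the regressors only, which is a simplification of the extended matrix A'A mentioned in the abstract, and the function name `correlation_eigenvalues` is hypothetical.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues (ascending) of the correlation matrix of the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(R)

# Synthetic regressors (an assumption for illustration): x3 is almost
# exactly x1 + x2, so the three columns are nearly linearly dependent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 1e-3 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

eigs = correlation_eigenvalues(X)
print("eigenvalues:", eigs)  # the smallest one is close to zero
```

A near-zero eigenvalue flags a near-exact linear dependence among the regressors, which is the situation in which OLS coefficient estimates become unstable.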
A Study of Wind Statistics Through Auto-Regressive and Moving-Average (ARMA) Modeling
尹彰; 周宗仁
2001-01-01
Statistical properties of winds near the Taichung Harbour are investigated. The 26 years' incomplete data of wind speeds, measured on an hourly basis, are used as reference. The possibility of imputation using simulated results of the Auto-Regressive (AR), Moving-Average (MA), and/or Auto-Regressive and Moving-Average (ARMA) models is studied. Predictions of the 25-year extreme wind speeds based upon the augmented data are compared with the original series. Based upon the results, predictions of the 50- and 100-year extreme wind speeds are then made.
Menon Carlo
2011-09-01
Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results: Mean adjusted coefficient of determination (Ra2) values decreased by 20% to 35% for different models after one hour, while altering arm posture decreased mean Ra2 values by 64% to 74% for different models. Conclusions: Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times.
The applicability of linear regression models in working environments' thermal evaluation.
Pablo Adamoglu de Oliveira
2006-04-01
The simultaneous analysis of normally distributed thermal variables, with the aim of checking whether there is any significant correlation among them or whether the values of some can be predicted from the values of others, is considered a problem of great importance in statistical studies. The aim of this paper is to study the applicability of linear regression models in studies of thermal comfort in working environments, thus contributing to the understanding of possible environmental cooling, heating or ventilation needs. It starts with bibliographical research, followed by field research, data collection and statistical-mathematical data treatment in software. Data analysis was then performed and the linear regression models were constructed, using the t and F tests to determine the consistency of the models and their parameters, and conclusions were drawn based on the information obtained and on the significance of the mathematical models built.
Sumit Goyal
2011-07-01
Coffee as a beverage is prepared from the roasted seeds (beans) of the coffee plant. Coffee is the second most important product in the international market in terms of volume traded and the most important in terms of value. Artificial neural engineering and regression models were developed to predict the shelf life of an instant coffee drink. Colour and appearance, flavour, viscosity and sediment were used as input parameters. Overall acceptability was used as the output parameter. The dataset consisted of 50 experimentally developed observations. The dataset was divided into two disjoint subsets, namely, a training set containing 40 observations (80% of total observations) and a test set comprising 10 observations (20% of total observations). The network was trained for 500 epochs. The Neural Network Toolbox under Matlab 7.0 was used for training the models. The investigation revealed that the multiple linear regression model was superior to the radial basis model for forecasting the shelf life of the instant coffee drink.
A multivariate linear regression model for the Jordanian industrial electric energy consumption
Al-Ghandoor, A.; Nahleh, Y.A.; Sandouqa, Y.; Al-Salaymeh, M. [Hashemite Univ., Zarqa (Jordan). Dept. of Industrial Engineering
2007-08-09
The amount of electricity used by the industrial sector in Jordan is an important driver for determining the future energy needs of the country. This paper proposed a model to simulate electricity and energy consumption by industry. The general model approach was based on multivariate regression analysis to provide valuable information regarding energy demands and analysis, and to identify the various factors that influence Jordanian industrial electricity consumption. It was determined that industrial gross output and capacity utilization are the most important variables that drive electricity consumption. The results revealed that the multivariate linear regression model can be used to adequately model the Jordanian industrial electricity consumption with coefficient of determination (R2) and adjusted R2 values of 99.3 and 99.2 per cent, respectively. 19 refs., 4 tabs., 2 figs.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressways. Using the historical traffic data collected from the detectors on Beijing expressways, a specifically designed database was developed via processes including data filtering, wavelet analysis and clustering. The relativity-based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that, using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
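As a rough illustration of K-NN nonparametric regression with a weighted Euclidean distance, the following minimal sketch averages the responses of the K closest historical patterns. The function name `knn_predict`, the per-feature `weights`, and the toy data are assumptions for illustration; the abstract does not specify how the relativity-based weights are derived.

```python
import numpy as np

def knn_predict(history_X, history_y, query, k=3, weights=None):
    """Predict by averaging the responses of the k nearest historical patterns.

    Distance is a weighted Euclidean distance; `weights` holds one
    non-negative weight per feature (uniform if omitted).
    """
    X = np.asarray(history_X, dtype=float)
    y = np.asarray(history_y, dtype=float)
    q = np.asarray(query, dtype=float)
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    dist = np.sqrt((((X - q) ** 2) * w).sum(axis=1))
    nearest = np.argsort(dist)[:k]  # indices of the k closest patterns
    return y[nearest].mean()

# Toy history (hypothetical): two recent speed readings -> next speed.
hist_X = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]]
hist_y = [10.0, 20.0, 100.0, 110.0]
print(knn_predict(hist_X, hist_y, [0.4, 0.4], k=2))
```

Because the prediction is just a local average, no parametric form for the speed series has to be assumed; the choice of K and of the distance weights carries most of the modelling burden.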
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2016-04-01
Scenarios of surface weather required for impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriate. Outputs of these models have to be post-processed, which is often carried out with Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation, for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected with an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights at 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably increases the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
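The enumeration of all 2^p - 1 non-empty submodels can be sketched in a few lines. This is not the authors' Excel program; it is a minimal illustration, assuming NumPy and ordinary least squares with an intercept, that records R-squared for every predictor subset. The function name `all_subset_r2` and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np

def all_subset_r2(X, y):
    """Return {predictor-index tuple: R^2} for every non-empty subset of columns."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    tss = ((y - y.mean()) ** 2).sum()
    results = {}
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            # Design matrix: intercept plus the selected predictors.
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = ((y - A @ beta) ** 2).sum()
            results[cols] = 1.0 - rss / tss
    return results

# Synthetic example (an assumption): only the first predictor matters.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(size=50)
r2 = all_subset_r2(X_demo, y_demo)
print(len(r2), "submodels fitted")
```

Importance indices such as those reviewed in the article are then typically built by comparing R-squared across submodels that include or exclude a given predictor; the number of fits doubles with each added predictor, so this brute-force approach is only practical for small p.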
Bayesian Method of Moments (BMOM) Analysis of Mean and Regression Models
Zellner, Arnold
2008-01-01
A Bayesian method of moments/instrumental variable (BMOM/IV) approach is developed and applied in the analysis of the important mean and multiple regression models. Given a single set of data, it is shown how to obtain posterior and predictive moments without the use of likelihood functions, prior densities and Bayes' Theorem. The posterior and predictive moments, based on a few relatively weak assumptions, are then used to obtain maximum entropy densities for parameters, realized error terms and future values of variables. Posterior means for parameters and realized error terms are shown to be equal to certain well known estimates and rationalized in terms of quadratic loss functions. Conditional maxent posterior densities for means and regression coefficients given scale parameters are in the normal form while scale parameters' maxent densities are in the exponential form. Marginal densities for individual regression coefficients, realized error terms and future values are in the Laplace or double-exponenti...
A note on constrained M-estimation and its recursive analog in multivariate linear regression models
RAO; Calyampudi; R
2009-01-01
In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. Besides, robustness and asymptotic behavior of constrained M-estimators are briefly discussed.
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
A quantile regression approach for modelling a Health-Related Quality of Life Measure
Giulia Cavrini
2013-05-01
Objective. The aim of this study is to propose a new approach for modeling the EQ-5D index and EQ-5D VAS in order to explain the effect of lifestyle determinants using quantile regression analysis. Methods. Data were collected within a cross-sectional study that involved a probabilistic sample of 1,622 adults randomly selected from the population register of two Health Authorities of Bologna in northern Italy. The perceived health status of people was measured using the EQ-5D questionnaire. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, and the EQ-5D index were used to obtain the synthetic measures of quality of life. To model the EQ-VAS Score and the EQ-5D index, a quantile regression analysis was employed. Quantile regression is a way to estimate the conditional quantiles of the VAS Score distribution in a linear model, in order to have a more complete view of possible associations between a measure of Health Related Quality of Life (dependent variable) and socio-demographic and determinant data. This methodological approach was preferred to an OLS regression because of the typical distribution of the EQ-VAS Score and EQ-5D index. Main Results. The analysis suggested that age, gender, and comorbidity can explain variability in perceived health status measured by the EQ-5D index and the VAS.
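Quantile regression estimates conditional quantiles by minimizing the check (pinball) loss rather than squared error. The minimal sketch below, assuming NumPy and a made-up skewed sample, shows the loss and verifies on a grid that the best constant predictor for tau = 0.5 is the sample median; the function names and data are hypothetical and unrelated to the EQ-5D data set.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Mean check loss: tau*r for positive residuals, (tau-1)*r for negative ones."""
    r = np.asarray(y, dtype=float) - pred
    return float(np.mean(np.maximum(tau * r, (tau - 1.0) * r)))

def best_constant(y, tau, grid):
    """Grid value that minimizes the pinball loss as a constant prediction."""
    losses = [pinball_loss(y, c, tau) for c in grid]
    return float(grid[int(np.argmin(losses))])

# Skewed toy sample (an assumption): one extreme value pulls the mean to 22.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = np.arange(0.0, 101.0, 1.0)
print(best_constant(sample, 0.5, grid))  # the median, unaffected by the outlier
```

This insensitivity to the tails is one reason quantile regression may suit bounded, skewed scores better than OLS, which targets the conditional mean.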
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Michel Ducher
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
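The Youden index used above to pick operating points is J = sensitivity + specificity - 1. A minimal sketch follows, in plain Python; the helper `best_threshold` and its toy scores are hypothetical illustrations, while the sensitivity and specificity pairs are the ones reported in the abstract.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic for one operating point on a ROC curve."""
    return sensitivity + specificity - 1.0

def best_threshold(scores, labels):
    """Scan candidate cut-offs and return (threshold, J) with the highest J.

    A case is called positive when its score is >= the threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = youden_index(tp / pos, tn / neg)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Operating points reported in the abstract.
j_bayes_net = youden_index(1.00, 0.73)
j_logistic = youden_index(0.67, 0.95)
print(j_bayes_net, j_logistic)
```

With the reported figures, J is about 0.73 for the Bayesian network and about 0.62 for logistic regression, which shows how the two models trade sensitivity against specificity at their respective optimal cut-offs.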
Terzer, S.; Wassenaar, L. I.; Araguás-Araguás, L. J.; Aggarwal, P. K.
2013-11-01
A regionalized cluster-based water isotope prediction (RCWIP) approach, based on the Global Network of Isotopes in Precipitation (GNIP), was demonstrated for the purposes of predicting point- and large-scale spatio-temporal patterns of the stable isotope composition (δ2H, δ18O) of precipitation around the world. Unlike earlier global domain and fixed regressor models, RCWIP predefined 36 climatic cluster domains and tested all model combinations from an array of climatic and spatial regressor variables to obtain the best predictive approach to each cluster domain, as indicated by root-mean-squared error (RMSE) and variogram analysis. Fuzzy membership fractions were thereafter used as the weights to seamlessly amalgamate results of the optimized climatic zone prediction models into a single predictive mapping product, such as global or regional amount-weighted mean annual, mean monthly, or growing-season δ18O/δ2H in precipitation. Comparative tests revealed the RCWIP approach outperformed classical global-fixed regression-interpolation-based models more than 67% of the time, and clearly improved upon predictive accuracy and precision. All RCWIP isotope mapping products are available as gridded GeoTIFF files from the IAEA website (www.iaea.org/water) and are for use in hydrology, climatology, food authenticity, ecology, and forensics.
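The amalgamation step, in which fuzzy membership fractions weight the cluster-specific predictions, amounts to a weighted average at each location. The sketch below assumes NumPy; the function name `blend_predictions` and the numbers are hypothetical, and normalizing by the sum of the memberships is an assumption about how the fractions are scaled.

```python
import numpy as np

def blend_predictions(cluster_predictions, memberships):
    """Membership-weighted average of per-cluster predictions at one location."""
    p = np.asarray(cluster_predictions, dtype=float)
    w = np.asarray(memberships, dtype=float)
    return float((w * p).sum() / w.sum())

# Hypothetical grid cell that belongs 75% to one climatic cluster and
# 25% to another, with made-up d18O predictions (per mil) from each model.
print(blend_predictions([-10.0, -6.0], [0.75, 0.25]))
```

Because memberships change gradually across cluster boundaries, the blended map avoids the discontinuities that would appear if each cell used only the model of its dominant cluster.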
S. Terzer
2013-06-01
A Regionalized Climatic Water Isotope Prediction (RCWIP) approach, based on the Global Network for Isotopes in Precipitation (GNIP), was demonstrated for the purposes of predicting point- and large-scale spatiotemporal patterns of the stable isotope compositions of water (δ2H, δ18O) in precipitation around the world. Unlike earlier global domain and fixed regressor models, RCWIP pre-defined thirty-six climatic cluster domains, and tested all model combinations from an array of climatic and spatial regressor variables to obtain the best predictive approach to each cluster domain, as indicated by RMSE and variogram analysis. Fuzzy membership fractions were thereafter used as the weights to seamlessly amalgamate results of the optimized climatic zone prediction models into a single predictive mapping product, such as global or regional amount-weighted mean annual, mean monthly or growing-season δ18O/δ2H in precipitation. Comparative tests revealed the RCWIP approach outperformed classical global-fixed regression-interpolation based models more than 67% of the time, and significantly improved upon predictive accuracy and precision. All RCWIP isotope mapping products are available as gridded GeoTIFF files from the IAEA website (www.iaea.org/water) and are for use in hydrology, climatology, food authenticity, ecology, and forensics.
Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model
Yujuan Sun
2014-01-01
Passenger flow is increasing dramatically with the completion of subway network systems in big cities of China. As convergence nodes of subway lines, transfer stations have to handle more passengers because of the large transfer demand among different lines. Transfer facilities then face great pressure, such as pedestrian congestion or other abnormal situations. In order to avoid pedestrian congestion or warn the management before it occurs, it is necessary to predict the transfer passenger flow. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. In order to test and illustrate the prediction model, one month of transfer passenger flow data from the XIDAN transfer station was used to calibrate and validate the model. Compared with a Kalman filter model and a support vector machine regression model, the results show that the nonparametric regression model has the advantages of high accuracy and strong transferability, and could predict transfer passenger flow accurately for different intervals.
Rachna Aggarwal
2014-12-01
This paper presents a Reliability Based Design Optimization (RBDO) model to deal with uncertainties involved in the concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subject to a concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR), Traditional Ridge Regression (TRR) and Generalized Ridge Regression (GRR) techniques have been explored to select the best model to explicitly represent the compressive strength of concrete. The RBDO model is solved by the Sequential Optimization and Reliability Assessment (SORA) method using the fully quadratic GRR model. Optimization results for a wide range of target compressive strengths and reliability levels of 0.90, 0.95 and 0.99 have been reported. Also, safety factor based Deterministic Design Optimization (DDO) designs for each case are obtained. It has been observed that deterministic optimal designs are cost effective but the proposed RBDO model gives improved design performance.
Kovalchik, Stephanie A; Varadhan, Ravi; Fetterman, Barbara; Poitras, Nancy E; Wacholder, Sholom; Katki, Hormuzd A
2013-02-28
Estimates of absolute risks and risk differences are necessary for evaluating the clinical and population impact of biomedical research findings. We have developed a linear-expit regression model (LEXPIT) to incorporate linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, whereas the exponentiated nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. We present a constrained maximum likelihood estimation algorithm that ensures the feasibility of risk estimates of the LEXPIT model and describe procedures for defining the feasible region of the parameter space, judging convergence, and evaluating boundary cases. Simulations demonstrate that the methodology is computationally robust and yields feasible, consistent estimators. We applied the LEXPIT model to estimate the absolute 5-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to abnormal Pap test in human papillomavirus-negative that was not detected with logistic regression. Our R package blm provides free and easy-to-use software for fitting the LEXPIT model.
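To make the structure of such a model concrete, the sketch below assumes the risk takes the form of a linear term plus an expit (inverse-logit) term, which is one reading of the abstract's description of a generalization of binomial linear and logistic regression. The function names, covariates and coefficient values are hypothetical and are not taken from the paper or from the blm package.

```python
import math

def expit(t):
    """Inverse logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def lexpit_risk(x, beta, z, gamma):
    """Assumed form: risk = x . beta + expit(z . gamma).

    The linear coefficients act as additive risk differences; the expit
    part behaves like a logistic regression on the remaining covariates.
    """
    linear_part = sum(b * xi for b, xi in zip(beta, x))
    expit_part = expit(sum(g * zi for g, zi in zip(gamma, z)))
    risk = linear_part + expit_part
    if not 0.0 <= risk <= 1.0:
        # Feasibility check: an absolute risk must be a valid probability.
        raise ValueError("infeasible risk estimate")
    return risk

# Hypothetical coefficients: one binary exposure in the linear part,
# an intercept and one covariate in the expit part.
beta = [0.02]
gamma = [-3.0, 0.5]
exposed = lexpit_risk([1], beta, [1, 0], gamma)
unexposed = lexpit_risk([0], beta, [1, 0], gamma)
print(exposed - unexposed)  # the risk difference equals the linear coefficient
```

The feasibility check mirrors the abstract's point that estimation has to be constrained so that fitted risks stay between 0 and 1.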
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h² = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at any age can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in the 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that
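As background on the Legendre polynomial covariates mentioned in this abstract, the sketch below shows one common way to build them: rescale age to the interval [-1, 1] and evaluate the polynomials with Bonnet's recursion. The 0 to 84 day range follows the abstract; the function names and the use of unnormalized polynomials are assumptions for illustration, and the study's own software may scale them differently.

```python
def standardize_age(age, age_min=0.0, age_max=84.0):
    """Rescale age in days to [-1, 1], the domain of Legendre polynomials."""
    return -1.0 + 2.0 * (age - age_min) / (age_max - age_min)

def legendre_basis(t, order):
    """Return [P_0(t), ..., P_order(t)] using Bonnet's recursion:
    (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t).
    """
    values = [1.0, t]
    for n in range(1, order):
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]

# Covariates for a bird weighed at 42 days, second-order polynomial.
t = standardize_age(42.0)
print(legendre_basis(t, 2))  # [1.0, 0.0, -0.5]
```

Each animal's random regression coefficients multiply these covariates, so a second-order model describes an individual growth curve with three coefficients.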
Li, Chunjian; Andersen, Søren Vang
2007-01-01
We propose two blind system identification methods that exploit the underlying dynamics of non-Gaussian signals. The two signal models to be identified are: an Auto-Regressive (AR) model driven by a discrete-state Hidden Markov process, and the same model whose output is perturbed by white Gaussian...
A review of a priori regression models for warfarin maintenance dose prediction.
Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea
2014-01-01
A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised: one formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion-fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics, including the R-squared and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted-for population-specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.
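The two validation statistics named in this abstract can be computed as in the following minimal sketch. The dose values are invented for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted doses."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def r_squared(observed, predicted):
    """1 - SS_res / SS_tot: share of dose variability the predictions explain."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical stable doses (mg/day) and doses predicted by an algorithm.
stable_dose = [2.0, 4.0, 5.0, 7.0]
predicted_dose = [3.0, 4.0, 5.0, 6.0]

mae = mean_absolute_error(stable_dose, predicted_dose)
r2 = r_squared(stable_dose, predicted_dose)
print(mae, round(r2, 3))
```

When predictions come from a model fitted on a different cohort, as in external validation, this R-squared is not guaranteed to be positive, which is one reason it is usually reported alongside an error measure such as the mean absolute error.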
APPLICATION OF REGRESSION MODELLING TECHNIQUES IN DESALINATION OF SEA WATER BY MEMBRANE DISTILLATION
SELVI S. R
2015-08-01
The objective of this work is to assess the statistical significance of experimental parameters for the performance of membrane distillation. A raw sea water sample, without pretreatment, was collected from Puducherry and desalinated using the direct contact membrane distillation method. The experimental data, which cover the effects of feed temperature, feed flow rate and feed concentration on the permeate flux, were analysed using statistical methods. A regression model was developed to correlate the input parameters (feed temperature, feed concentration and feed flow rate) with the output parameter (permeate flux). Since the performance of membrane distillation in the desalination of water is characterised by permeate flux, a simple linear regression model was fitted. Because goodness of fit should always be validated, the regression model was validated using ANOVA. ANOVA estimates for the parameter study are given, the coefficients obtained by regression analysis are specified in the regression equation, and it is concluded that the input parameter with the highest coefficient is significant and strongly influences the response. Feed flow rate and feed temperature have a greater influence on permeate flux than feed concentration. The coefficient of feed concentration was found to be negative, which indicates that it is a less significant factor for permeate flux. The chemical composition of the sea water was determined by water quality analysis. The TDS of the membrane-distilled water was found to be 18 ppm, compared with an initial feed TDS of 27,720 ppm for the sea water. The experimental work gave a salt rejection of 99%, and the water analysis report confirms that the distillate obtained by this desalination process is of potable quality.
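The reported salt rejection can be checked from the two TDS values with the usual definition, rejection = (1 - permeate concentration / feed concentration) × 100. The short sketch below applies that definition to the figures in the abstract; the function name is hypothetical.

```python
def salt_rejection_percent(feed_tds_ppm, permeate_tds_ppm):
    """Percentage of dissolved solids removed between feed and permeate."""
    return (1.0 - permeate_tds_ppm / feed_tds_ppm) * 100.0

# TDS values reported in the abstract: feed 27,720 ppm, distillate 18 ppm.
rejection = salt_rejection_percent(27720.0, 18.0)
print(round(rejection, 2))  # 99.94
```

The result, about 99.9%, is consistent with the 99% salt rejection stated in the abstract.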
Identifying of risks in pricing using a regression model of demand on price dependence
O.I. Yashkina
2016-09-01
The aim of the article. The main purpose of the article is to describe scientific and methodological approaches to determining the price elasticity of demand from a regression model of demand on price, and to assessing the risk of price variations using the resulting model. The results of the analysis. The study is based on the assumption that the price elasticity of demand for high-tech innovations is not constant, as it is commonly understood to be in the classical sense. During market launch and subsequent sales growth, the price elasticity of demand may vary within certain limits. Its value, and hence the market response, is closely related to the current price. Achieving the stated purpose requires factual information about prices and the corresponding sales volumes of new high-tech products over a short period of time, on the basis of which the types of relationship between demand and price are modeled. Risk assessment of pricing and profit optimization using the regression of demand on price consists of three stages: (a) obtaining a regression model of demand on price; (b) obtaining the function of price elasticity of demand and assessing pricing risk depending on the behavior of that function; (c) determining the price at which the company receives the maximum operating profit, based on the specific model of the price-demand function. To obtain the regression model of the dependence of demand on price, it is recommended to use specific reference models. The article includes linear, hyperbolic and parabolic models. The regression dependence of price elasticity of demand on price for each of the reference models of demand is obtained on the basis of the concept of the elasticity of a function in mathematical analysis. The concept of «function of price elasticity of demand» expresses this dependence. For the resulting functions of price elasticity of demand, the article provides intervals with the highest and lowest
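In mathematical analysis the elasticity of a function Q(p) is E(p) = Q'(p) · p / Q(p). The sketch below applies that definition to three demand curves. The parametrizations Q = a + b·p (linear), Q = a + b/p (hyperbolic) and Q = a + b·p + c·p² (parabolic) are assumed for illustration, since the abstract does not give the exact form of its reference models, and all coefficient values are hypothetical.

```python
def elasticity(demand, d_demand, price):
    """Point elasticity E(p) = Q'(p) * p / Q(p)."""
    return d_demand(price) * price / demand(price)

def linear_model(a, b):
    # Q = a + b*p, with derivative Q' = b
    return (lambda p: a + b * p), (lambda p: b)

def hyperbolic_model(a, b):
    # Q = a + b/p, with derivative Q' = -b / p**2
    return (lambda p: a + b / p), (lambda p: -b / p ** 2)

def parabolic_model(a, b, c):
    # Q = a + b*p + c*p**2, with derivative Q' = b + 2*c*p
    return (lambda p: a + b * p + c * p ** 2), (lambda p: b + 2 * c * p)

price = 10.0
e_lin = elasticity(*linear_model(100.0, -2.0), price)
e_hyp = elasticity(*hyperbolic_model(10.0, 200.0), price)
e_par = elasticity(*parabolic_model(100.0, -1.0, -0.1), price)
print(e_lin, round(e_hyp, 4), e_par)
```

In all three cases the elasticity depends on the current price, which illustrates the abstract's premise that elasticity is a function of price and not a single constant.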
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), or task-unrelated thought, has been examined in an increasing number of articles that use models based on numerous physiological variables to predict whether subjects are in MW. However, these models are not applicable in general situations. Moreover, they output only a binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired about the focus of attention. We calculated the power and coherence values, prepared 35 patterns of variable combinations, and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single-electrode EEG variables. Furthermore, in the limited-electrode condition, the non-linear SVR model showed significantly better precision than the linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with high-temporal-resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.