Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Quantile regression modeling for Malaysian automobile insurance premium data
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by m...... treatment of the topic is based on the perspective of applied researchers using quantile regression in their empirical work....
Hierarchical linear regression models for conditional quantiles
Institute of Scientific and Technical Information of China (English)
TIAN Maozai; CHEN Gemai
2006-01-01
The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models,but it cannot deal effectively with the data with a hierarchical structure.In practice,the existence of such data hierarchies is neither accidental nor ignorable,it is a common phenomenon.To ignore this hierarchical data structure risks overlooking the importance of group effects,and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid.On the other hand,the hierarchical models take a hierarchical data structure into account and have also many applications in statistics,ranging from overdispersion to constructing min-max estimators.However,the hierarchical models are virtually the mean regression,therefore,they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates.Furthermore,the estimated coefficient vector (marginal effects)is sensitive to an outlier observation on the dependent variable.In this article,a new approach,which is based on the Gauss-Seidel iteration and taking a full advantage of the quantile regression and hierarchical models,is developed.On the theoretical front,we also consider the asymptotic properties of the new method,obtaining the simple conditions for an n1/2-convergence and an asymptotic normality.We also illustrate the use of the technique with the real educational data which is hierarchical and how the results can be explained.
Modeling energy expenditure in children and adolescents using quantile regression
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...
Group Lasso for high dimensional sparse quantile regression models
Kato, Kengo
2011-01-01
This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of primitive regularity conditions, the group Lasso estimator c...
DEFF Research Database (Denmark)
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....
Quantile regression theory and applications
Davino, Cristina; Vistocco, Domenico
2013-01-01
A guide to the implementation and interpretation of Quantile Regression models This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensivedescription of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and
Two-step variable selection in quantile regression models
Directory of Open Access Journals (Sweden)
FAN Yali
2015-06-01
Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions,in which the dimension of the covariates, pn is much larger than the sample size n. In the first step, we perform l1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remained irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.
Linear Quantile Mixed Models: The lqmm Package for Laplace Quantile Regression
Directory of Open Access Journals (Sweden)
Marco Geraci
2014-05-01
Full Text Available Inference in quantile analysis has received considerable attention in the recent years. Linear quantile mixed models (Geraci and Bottai 2014 represent a ?exible statistical tool to analyze data from sampling designs such as multilevel, spatial, panel or longitudinal, which induce some form of clustering. In this paper, I will show how to estimate conditional quantile functions with random e?ects using the R package lqmm. Modeling, estimation and inference are discussed in detail using a real data example. A thorough description of the optimization algorithms is also provided.
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Institute of Scientific and Technical Information of China (English)
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
The quantile estimation methods are proposed for functional-coefficient partially linear regression (FCPLR) model by combining nonparametric and functional-coefficient regression (FCR) model.The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model.These resulting estimators are asymptotically normal,but each of them has big variance.To reduce variances of these quantile estimators,the one-step backfitting technique is used to obtain the efficient quantile estimators of all unknown functions,and their asymptotic normalities are derived.Two simulated examples are carried out to illustrate the proposed estimation methodology.
Autchariyapanitkul, K; S Chanaim; Sriboonchitta, S; DENOEUX, T
2014-01-01
International audience; We consider an inference method for prediction based on belief functions in quantile regression with an asymmetric Laplace distribution. We apply this method to the capital asset pricing model to estimate the beta coefficient and measure volatility under various market conditions at given quantiles. Likelihood-based belief functions are constructed from historical data of the securities in the S&P500 market. The results give us evidence on the systematic risk, in the f...
Competing Risks Quantile Regression at Work
DEFF Research Database (Denmark)
Dlugosz, Stephan; Lo, Simon M. S.; Wilke, Ralf
2017-01-01
Despite its emergence as a frequently used method for the empirical analysis of multivariate data, quantile regression is yet to become a mainstream tool for the analysis of duration data. We present a pioneering empirical study on the grounds of a competing risks quantile regression model. We use...
Directory of Open Access Journals (Sweden)
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
A quantile regression approach for modelling a Health-Related Quality of Life Measure
Directory of Open Access Journals (Sweden)
Giulia Cavrini
2013-05-01
Full Text Available Objective. The aim of this study is to propose a new approach for modeling the EQ-5D index and EQ-5D VAS in order to explain the lifestyle determinants effect using the quantile regression analysis. Methods. Data was collected within a cross-sectional study that involved a probabilistic sample of 1,622 adults randomly selected from the population register of two Health Authorities of Bologna in northern Italy. The perceived health status of people was measured using the EQ-5D questionnaire. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, and the EQ-5D index were used to obtain the synthetic measures of quality of life. To model EQ-VAS Score and EQ-5D index, a quantile regression analysis was employed. Quantile Regression is a way to estimate the conditional quantiles of the VAS Score distribution in a linear model, in order to have a more complete view of possible associations between a measure of Health Related Quality of Life (dependent variable and socio-demographic and determinants data. This methodological approach was preferred to an OLS regression because of the EQ-VAS Score and EQ-5D index typical distribution. Main Results. The analysis suggested that age, gender, and comorbidity can explain variability in perceived health status measured by the EQ-5D index and the VAS.
Haben, Stephen
2016-01-01
We present a model for generating probabilistic forecasts by combining kernel density estimation (KDE) and quantile regression techniques, as part of the probabilistic load forecasting track of the Global Energy Forecasting Competition 2014. The KDE method is initially implemented with a time-decay parameter. We later improve this method by conditioning on the temperature or the period of the week variables to provide more accurate forecasts. Secondly, we develop a simple but effective quantile regression forecast. The novel aspects of our methodology are two-fold. First, we introduce symmetry into the time-decay parameter of the kernel density estimation based forecast. Secondly we combine three probabilistic forecasts with different weights for different periods of the month.
Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall
Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik
2016-02-01
Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.
Quantile Regression With Measurement Error
Wei, Ying
2009-08-27
Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
Functional data analysis of generalized regression quantiles
Guo, Mengmeng
2013-11-05
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
Testing for Stock Market Contagion: A Quantile Regression Approach
S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)
2015-01-01
markdownabstract__Abstract__ Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test
Competing Risks Quantile Regression at Work
DEFF Research Database (Denmark)
Dlugosz, Stephan; Lo, Simon M. S.; Wilke, Ralf
2017-01-01
Despite its emergence as a frequently used method for the empirical analysis of multivariate data, quantile regression is yet to become a mainstream tool for the analysis of duration data. We present a pioneering empirical study on the grounds of a competing risks quantile regression model. We us...... into the distribution of transitions out of maternity leave. It is found that cumulative incidences implied by the quantile regression model differ from those implied by a proportional hazards model. To foster the use of the model, we make an R-package (cmprskQR) available....... large-scale maternity duration data with multiple competing risks derived from German linked social security records to analyse how public policies are related to the length of economic inactivity of young mothers after giving birth. Our results show that the model delivers detailed insights...
Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.
2016-09-01
Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.
Quantile regression applied to spectral distance decay
Rocchini, D.; Cade, B.S.
2008-01-01
Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. ?? 2008 IEEE.
Focused information criterion and model averaging based on weighted composite quantile regression
Xu, Ganggang
2013-08-13
We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR-estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure. © 2013 Board of the Foundation of the Scandinavian Journal of Statistics..
Testing for Stock Market Contagion: A Quantile Regression Approach
S.Y. Park (Sung); W. Wang (Wendun); N. Huang (Naijing)
2015-01-01
markdownabstract__Abstract__ Regarding the asymmetric and leptokurtic behavior of financial data, we propose a new contagion test in the quantile regression framework that is robust to model misspecification. Unlike conventional correlation-based tests, the proposed quantile contagion test allows
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Directory of Open Access Journals (Sweden)
Łaszkiewicz Edyta
2016-12-01
Full Text Available The aim of this paper is to evaluate which university’s characteristics have the greatest impact on the competitiveness of universities in their ability to attract better students in Russia. We examined the impact of three groups of factors,related to teaching, research and entrepreneurial activities of universities. The quantile regression model was applied for the subsample of public and private higher education institutions localized in Russia.
Quantile regression provides a fuller analysis of speed data.
Hewson, Paul
2008-03-01
Considerable interest already exists in terms of assessing percentiles of speed distributions, for example monitoring the 85th percentile speed is a common feature of the investigation of many road safety interventions. However, unlike the mean, where t-tests and ANOVA can be used to provide evidence of a statistically significant change, inference on these percentiles is much less common. This paper examines the potential role of quantile regression for modelling the 85th percentile, or any other quantile. Given that crash risk may increase disproportionately with increasing relative speed, it may be argued these quantiles are of more interest than the conditional mean. In common with the more usual linear regression, quantile regression admits a simple test as to whether the 85th percentile speed has changed following an intervention in an analogous way to using the t-test to determine if the mean speed has changed by considering the significance of parameters fitted to a design matrix. Having briefly outlined the technique and briefly examined an application with a widely published dataset concerning speed measurements taken around the introduction of signs in Cambridgeshire, this paper will demonstrate the potential for quantile regression modelling by examining recent data from Northamptonshire collected in conjunction with a "community speed watch" programme. Freely available software is used to fit these models and it is hoped that the potential benefits of using quantile regression methods when examining and analysing speed data are demonstrated.
2013-01-01
Based on the relevant data from 1985 to 2010, this thesis uses a quantile regression model to make an empirical research about the effect of GDP and exchange rate on foreign exchange reserve. The findings show that: Both GDP and exchange rate have a remarkable influence on the size of foreign exchange reserve and the effect of exchange rate on foreign exchange reserve is higher than GDP at mean place and middle and lower quantile, smaller than GDP at higher quantile. At all the examined quant...
Non-crossing weighted kernel quantile regression with right censored data.
Bang, Sungwan; Eo, Soo-Heang; Cho, Yong Mee; Jhun, Myoungshic; Cho, HyungJun
2016-01-01
Regarding survival data analysis in regression modeling, multiple conditional quantiles are useful summary statistics to assess covariate effects on survival times. In this study, we consider an estimation problem of multiple nonlinear quantile functions with right censored survival data. To account for censoring in estimating a nonlinear quantile function, weighted kernel quantile regression (WKQR) has been developed by using the kernel trick and inverse-censoring-probability weights. However, the individually estimated quantile functions based on the WKQR often cross each other and consequently violate the basic properties of quantiles. To avoid this problem of quantile crossing, we propose the non-crossing weighted kernel quantile regression (NWKQR), which estimates multiple nonlinear conditional quantile functions simultaneously by enforcing the non-crossing constraints on kernel coefficients. The numerical results are presented to demonstrate the competitive performance of the proposed NWKQR over the WKQR.
Villain, Jonathan; Minguez, Laetitia; Halm-Lemeille, Marie-Pierre; Durrieu, Gilles; Bureau, Ronan
2016-02-01
The acute toxicities of 36 pharmaceuticals towards green algae were estimated from a set of quantile regression models representing the first global quantitative structure-activity relationships. The selection of these pharmaceuticals was based on their predicted environmental concentrations. An agreement between the estimated values and the observed acute toxicity values was found for several families of pharmaceuticals, in particular, for antidepressants. A recent classification (BDDCS) of drugs based on ADME properties (Absorption, Distribution, Metabolism and Excretion) was clearly correlated with the acute ecotoxicities towards algae. Over-estimation of toxicity from our QSAR models was observed for classes 2, 3 and 4 whereas our model results were in agreement for the class 1 pharmaceuticals. Clarithromycin, a class 3 antibiotic characterized by weak metabolism and high solubility, was the most toxic to algae (molecular stability and presence in surface water).
Analysis of retirement income adequacy using quantile regression: A case study in Malaysia
Alaudin, Ros Idayuwati; Ismail, Noriszura; Isa, Zaidi
2015-09-01
Quantile regression is a statistical analysis that does not restrict attention to the conditional mean and therefore, permitting the approximation of the whole conditional distribution of a response variable. Quantile regression is a robust regression to outliers compared to mean regression models. In this paper, we demonstrate how quantile regression approach can be used to analyze the ratio of projected wealth to needs (wealth-needs ratio) during retirement.
Directory of Open Access Journals (Sweden)
Lu Fang-Yuan
2013-02-01
Full Text Available Based on the relevant data from 1985 to 2010, this thesis uses a quantile regression model to make an empirical research about the effect of GDP and exchange rate on foreign exchange reserve. The findings show that: Both GDP and exchange rate have a remarkable influence on the size of foreign exchange reserve and the effect of exchange rate on foreign exchange reserve is higher than GDP at mean place and middle and lower quantile, smaller than GDP at higher quantile. At all the examined quantiles elastic coefficients of GDP and exchange rate are positive and among them, the elastic coefficients of GDP present us a reverse "V" model with the conditional distribution altering from low to high, that is, the impact of GDP on foreign exchange reserve shows an increasing trend when the latter is smaller, but begins to decrease when the latter reaches to a certain level; the elastic coefficients of exchange rate at lower quantiles are bigger than that of higher quantiles.
A Frisch-Newton Algorithm for Sparse Quantile Regression
Institute of Scientific and Technical Information of China (English)
Roger Koenker; Pin Ng
2005-01-01
Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems.In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the model can be quite large, but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker[28].The new algorithm substantially reduces the storage (memory) requirements and increases computational speed.The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
BUSINESS GROWTH STRATEGIES OF ILLINOIS FARMS: A QUANTILE REGRESSION APPROACH
Hennings, Enrique; Katchova, Ani L.
2005-01-01
This study examines the business strategies employed by Illinois farms to maintain equity growth using quantile regression analysis. Using data from the Farm Business Farm Management system, this study finds that the effect of different business strategies on equity growth rates differs between quantiles. Financial management strategies have a positive effect for farms situated in the highest quantile of equity growth, while for farms in the lowest quantile the effect on equity growth is nega...
An Analysis of Bank Service Satisfaction Based on Quantile Regression and Grey Relational Analysis
Directory of Open Access Journals (Sweden)
Wen-Tsao Pan
2016-01-01
Full Text Available Bank service satisfaction is vital to the success of a bank. In this paper, we propose to use the grey relational analysis to gauge the levels of service satisfaction of the banks. With the grey relational analysis, we compared the effects of different variables on service satisfaction. We gave ranks to the banks according to their levels of service satisfaction. We further used the quantile regression model to find the variables that affected the satisfaction of a customer at a specific quantile of satisfaction level. The result of the quantile regression analysis provided a bank manager with information to formulate policies to further promote satisfaction of the customers at different quantiles of satisfaction level. We also compared the prediction accuracies of the regression models at different quantiles. The experiment result showed that, among the seven quantile regression models, the median regression model has the best performance in terms of RMSE, RTIC, and CE performance measures.
Statistical modelling with quantile functions
Gilchrist, Warren
2000-01-01
Galton used quantiles more than a hundred years ago in describing data. Tukey and Parzen used them in the 60s and 70s in describing populations. Since then, the authors of many papers, both theoretical and practical, have used various aspects of quantiles in their work. Until now, however, no one put all the ideas together to form what turns out to be a general approach to statistics.Statistical Modelling with Quantile Functions does just that. It systematically examines the entire process of statistical modelling, starting with using the quantile function to define continuous distributions. The author shows that by using this approach, it becomes possible to develop complex distributional models from simple components. A modelling kit can be developed that applies to the whole model - deterministic and stochastic components - and this kit operates by adding, multiplying, and transforming distributions rather than data.Statistical Modelling with Quantile Functions adds a new dimension to the practice of stati...
Hallin, Marc; Šiman, Miroslav; 10.1214/09-AOS723
2010-01-01
A new multivariate concept of quantile, based on a directional version of Koenker and Bassett's traditional regression quantiles, is introduced for multivariate location and multiple-output regression problems. In their empirical version, those quantiles can be computed efficiently via linear programming techniques. Consistency, Bahadur representation and asymptotic normality results are established. Most importantly, the contours generated by those quantiles are shown to coincide with the classical halfspace depth contours associated with the name of Tukey. This relation does not only allow for efficient depth contour computations by means of parametric linear programming, but also for transferring from the quantile to the depth universe such asymptotic results as Bahadur representations. Finally, linear programming duality opens the way to promising developments in depth-related multivariate rank-based inference.
Cannon, Alex J.
2011-09-01
The qrnn package for R implements the quantile regression neural network, which is an artificial neural network extension of linear quantile regression. The model formulation follows from previous work on the estimation of censored regression quantiles. The result is a nonparametric, nonlinear model suitable for making probabilistic predictions of mixed discrete-continuous variables like precipitation amounts, wind speeds, or pollutant concentrations, as well as continuous variables. A differentiable approximation to the quantile regression error function is adopted so that gradient-based optimization algorithms can be used to estimate model parameters. Weight penalty and bootstrap aggregation methods are used to avoid overfitting. For convenience, functions for quantile-based probability density, cumulative distribution, and inverse cumulative distribution functions are also provided. Package functions are demonstrated on a simple precipitation downscaling task.
Spatial Quantile Regression In Analysis Of Healthy Life Years In The European Union Countries
Directory of Open Access Journals (Sweden)
Trzpiot Grażyna
2016-12-01
Full Text Available The paper investigates the impact of the selected factors on the healthy life years of men and women in the EU countries. The multiple quantile spatial autoregression models are used in order to account for substantial differences in the healthy life years and life quality across the EU members. Quantile regression allows studying dependencies between variables in different quantiles of the response distribution. Moreover, this statistical tool is robust against violations of the classical regression assumption about the distribution of the error term. Parameters of the models were estimated using instrumental variable method (Kim, Muller 2004, whereas the confidence intervals and p-values were bootstrapped.
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
Non-Stationary Hydrologic Frequency Analysis using B-Splines Quantile Regression
Nasri, B.; St-Hilaire, A.; Bouezmarni, T.; Ouarda, T.
2015-12-01
Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic structures and water resources system under the assumption of stationarity. However, with increasing evidence of changing climate, it is possible that the assumption of stationarity would no longer be valid and the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extreme flows based on B-Splines quantile regression, which allows to model non-stationary data that have a dependence on covariates. Such covariates may have linear or nonlinear dependence. A Markov Chain Monte Carlo (MCMC) algorithm is used to estimate quantiles and their posterior distributions. A coefficient of determination for quantiles regression is proposed to evaluate the estimation of the proposed model for each quantile level. The method is applied on annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in these variables and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for annual maximum and minimum discharge with high annual non-exceedance probabilities. Keywords: Quantile regression, B-Splines functions, MCMC, Streamflow, Climate indices, non-stationarity.
Ahn, Kuk-Hyun; Palmer, Richard
2016-09-01
Despite wide use of regression-based regional flood frequency analysis (RFFA) methods, the majority are based on either ordinary least squares (OLS) or generalized least squares (GLS). This paper proposes 'spatial proximity' based RFFA methods using the spatial lagged model (SLM) and spatial error model (SEM). The proposed methods are represented by two frameworks: the quantile regression technique (QRT) and parameter regression technique (PRT). The QRT develops prediction equations for flooding quantiles in average recurrence intervals (ARIs) of 2, 5, 10, 20, and 100 years whereas the PRT provides prediction of three parameters for the selected distribution. The proposed methods are tested using data incorporating 30 basin characteristics from 237 basins in Northeastern United States. Results show that generalized extreme value (GEV) distribution properly represents flood frequencies in the study gages. Also, basin area, stream network, and precipitation seasonality are found to be the most effective explanatory variables in prediction modeling by the QRT and PRT. 'Spatial proximity' based RFFA methods provide reliable flood quantile estimates compared to simpler methods. Compared to the QRT, the PRT may be recommended due to its accuracy and computational simplicity. The results presented in this paper may serve as one possible guidepost for hydrologists interested in flood analysis at ungaged sites.
Determinants of Birthweight Outcomes: Quantile Regressions Based on Panel Data
DEFF Research Database (Denmark)
Bache, Stefan Holst; Dahl, Christian Møller; Kristensen, Johannes Tang
regression framework in order to control for heterogeneity and to infer conclusions about causality across the entire birthweight distribution. We obtain estimation results for maternal smoking and other interesting determinants, applying these to data obtained from Aarhus University Hospital, Skejby...... to the possibility that smoking habits can be influenced through policy conduct. It is widely believed that maternal smoking reduces birthweight; however, the crucial difficulty in estimating such effects is the unobserved heterogeneity among mothers. We consider extensions of three panel data models to a quantile...... and significance of prenatal smoking. Controlling for unobserved effects does not change the fact that smoking reduces birthweight, but it shows that the effect is primarily a problem in the left tail of the distribution on a slightly smaller scale....
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
Institute of Scientific and Technical Information of China (English)
Xi-zhi Wu; Mao-zai Tian
2008-01-01
Quantile regression is gradually emerging as a powerful tool for estimating models of conditional quantile functions, and therefore research in this area has vastly increased in the past two decades. This paper, with the quantile regression technique, is the first comprehensive longitudinal study on mathematics participation data collected in Alberta, Canada. The major advantage of longitudinal study is its capability to separate the so-called cohort and age effects in the context of population studies. One aim of this paper is to study whether the family background factors alter performance on the mathematical achievement of the strongest students in the same way as that of weaker students based on the large longitudinal sample of 2000,2001 and 2002 mathematics participation longitudinal data set. The interesting findings suggest that there may be differential family background factor effects at different points in the mathematical achievement conditional distribution.
Relationship between Urbanization and Cancer Incidence in Iran Using Quantile Regression.
Momenyan, Somayeh; Sadeghifar, Majid; Sarvi, Fatemeh; Khodadost, Mahmoud; Mosavi-Jarrahi, Alireza; Ghaffari, Mohammad Ebrahim; Sekhavati, Eghbal
2016-01-01
Quantile regression is an efficient method for predicting and estimating the relationship between explanatory variables and percentile points of the response distribution, particularly for extreme percentiles of the distribution. To study the relationship between urbanization and cancer morbidity, we here applied quantile regression. This cross-sectional study was conducted for 9 cancers in 345 cities in 2007 in Iran. Data were obtained from the Ministry of Health and Medical Education and the relationship between urbanization and cancer morbidity was investigated using quantile regression and least square regression. Fitting models were compared using AIC criteria. R (3.0.1) software and the Quantreg package were used for statistical analysis. With the quantile regression model all percentiles for breast, colorectal, prostate, lung and pancreas cancers demonstrated increasing incidence rate with urbanization. The maximum increase for breast cancer was in the 90th percentile (β=0.13, p-valuecancer was in the 75th percentile (β=0.048, p-valuecancer the 95th percentile (β=0.55, p-valuecancer was in 95th percentile (β=0.52, p-value=0.006), for pancreas cancer was in 10th percentile (β=0.011, p-valuecancers, with increasing urbanization, the incidence rate was decreased. The maximum decrease for gastric cancer was in the 90th percentile(β=0.003, p-valuecancer the 95th (β=0.04, p-value=0.4) and for skin cancer also the 95th (β=0.145, p-value=0.071). The AIC showed that for upper percentiles, the fitting of quantile regression was better than least square regression. According to the results of this study, the significant impact of urbanization on cancer morbidity requirs more effort and planning by policymakers and administrators in order to reduce risk factors such as pollution in urban areas and ensure proper nutrition recommendations are made.
Directory of Open Access Journals (Sweden)
I Nyoman Radiarta
2011-12-01
Full Text Available In the development of scallop cultivation in Japan, larvae collection and propagation become an important factor. Although the monitoring program has been conducted, modeling of species distribution is becoming an important tool for understanding the effects of environmental changes and resources management. This study was conducted to construct a model for providing estimation of the scallop larvae distribution in Funka Bay, Hokkaido, Japan using the integration of remote sensing, Regression Quantile (RQ and Geographic Information System (GIS-based model. Data on scallop larvae were collected during one year spawning season from April to July 2003. Environmental parameters were extracted from multi sensor remotely sensed data (chlorophyll-a and sea surface temperature and a hydrographic chart (water depth. These parameters together with larvae data were then analyzed using RQ. Finally, spatial models were constructed within a GIS by combining the RQ models with digital map of environmental parameters. The results show that the model was best explained by using only sea surface temperature. The highest larvae densities were predicted in a relatively broad distribution along with the shallow water regions (Toyoura and Sawara to Yakumo and the deeper water areas (center of the bay. The spatial model built from the RQ provided robust estimation of the scallop larvae distributions in the study area, as confirmed by model validation using independent data. These findings could contribute on the monitoring program in this region in order to distinguish the potential areas for an effective spat collection.
Does intense monitoring matter? A quantile regression approach
Directory of Open Access Journals (Sweden)
Fekri Ali Shawtari
2017-06-01
Full Text Available Corporate governance has become a centre of attention in corporate management at both micro and macro levels due to adverse consequences and repercussion of insufficient accountability. In this study, we include the Malaysian stock market as sample to explore the impact of intense monitoring on the relationship between intellectual capital performance and market valuation. The objectives of the paper are threefold: i to investigate whether intense monitoring affects the intellectual capital performance of listed companies; ii to explore the impact of intense monitoring on firm value; iii to examine the extent to which the directors serving more than two board committees affects the linkage between intellectual capital performance and firms' value. We employ two approaches, namely, the Ordinary Least Square (OLS and the quantile regression approach. The purpose of the latter is to estimate and generate inference about conditional quantile functions. This method is useful when the conditional distribution does not have a standard shape such as an asymmetric, fat-tailed, or truncated distribution. In terms of variables, the intellectual capital is measured using the value added intellectual coefficient (VAIC, while the market valuation is proxied by firm's market capitalization. The findings of the quantile regression shows that some of the results do not coincide with the results of OLS. We found that intensity of monitoring does not influence the intellectual capital of all firms. It is also evident that intensity of monitoring does not influence the market valuation. However, to some extent, it moderates the relationship between intellectual capital performance and market valuation. This paper contributes to the existing literature as it presents new empirical evidences on the moderating effects of the intensity of monitoring of the board committees on the relationship between performance and intellectual capital.
A probabilistic risk assessment for dengue fever by a threshold based-quantile regression
Chiu, Chuan-Hung; Tan, Yih-Chi; Wen, Tzai-Hung; Chien, Lung-Chang; Yu, Hwa-Lung
2014-05-01
This article introduces an important concept "return period" to analyze potential incident rate of dengue fever by bringing together two models: the quantile regression model and the threshold-based method. The return period provided the frequency of incidence of dengue fever, and established the risk maps for potential incidence of dengue fever to point out highest risk in certain areas. A threshold-based linear quantile regression model was constructed to find significantly main effects and interactions based on collinearity test and stepwise selection, and also showed the performance of our model via pseudo R2. Finally, the spatial risk maps of the specified return periods and average incident rates were given, and indicated that high population density place (e.g., residential area), water conservancy facilities, and corresponding interactions could lead to a positive influence on dengue fever. These factors would be the key point to disease protection in a given study area.
Determinants of the Slovak Enterprises Profi tability: Quantile Regression Approach
Directory of Open Access Journals (Sweden)
Štefan Kováč
2013-09-01
Full Text Available Th e goal of this paper is to analyze profi tability of the Slovak enterprises by means of quantile regression. Th eanalysis is based on individual data from the 2001, 2006 and 2011 fi nancial statements of the Slovak companies.Profi tability is proxied by ratio of profi t/loss to total assets, and twelve covariates are used in the study,including two nominal variables: region and sector. According to the fi ndings size, short- and long-term indebtedness,ratio of long-term assets to total assets, ratio of sales revenue to cost of sales, region and sectorare the possible determinants of profi tability of the companies in Slovakia. Th e results further suggest that thechanges over time have infl uenced the magnitude of the eff ects of given variables.
Semiparametric Quantile Modelling of Hierarchical Data
Institute of Scientific and Technical Information of China (English)
Mao Zai TIAN; Man Lai TANG; Ping Shing CHAN
2009-01-01
The classic hierarchical linear model formulation provides a considerable flexibility for modelling the random effects structure and a powerful tool for analyzing nested data that arise in various areas such as biology, economics and education. However, it assumes the within-group errors to be independently and identically distributed (i.i.d.) and models at all levels to be linear. Most importantly, traditional hierarchical models (just like other ordinary mean regression methods) cannot characterize the entire conditional distribution of a dependent variable given a set of covariates and fail to yield robust estimators. In this article, we relax the aforementioned and normality assumptions, and develop a so-called Hierarchical Semiparametric Quantile Regression Models in which the within-group errors could be heteroscedastic and models at some levels are allowed to be nonparametric. We present the ideas with a 2-level model. The level-l model is specified as a nonparametric model whereas level-2 model is set as a parametric model. Under the proposed semiparametric setting the vector of partial derivatives of the nonparametric function in level-1 becomes the response variable vector in level 2. The proposed method allows us to model the fixed effects in the innermost level (i.e., level 2) as a function of the covariates instead of a constant effect. We outline some mild regularity conditions required for convergence and asymptotic normality for our estimators. We illustrate our methodology with a real hierarchical data set from a laboratory study and some simulation studies.
Roscoe, K. L.; Weerts, A. H.
2012-04-01
Water level predictions in rivers are used by operational managers to make water management decisions. Such decisions can concern water routing in times of drought, operation of weirs, and actions for flood protection, such as evacuation. Understanding the uncertainty in the predictions can help managers make better-informed decisions. Conditional Quantile Regression is a method that can be used to determine the uncertainty in forecasted water levels by providing an estimate of the probability density function of the error in the prediction conditional on the forecasted water level. To derive this relationship, a series of forecasts and errors in the forecasts (residuals) are required. Thus, conditional quantile regressions can be derived for locations where both observations and forecasts are available. However, 1D-hydraulic models that are used for operational forecasting produce forecasts at intermediate points where no measurements are available but for which predictive uncertainty estimates are also desired for decision making. The objective of our study is to test if interpolation methods can be used to adequately estimate conditional quantile regressions at these in-between locations. For this purpose, five years of hindcasts were used at seven stations along the IJssel River in the Netherlands. Residuals in water level hindcasts were interpolated at the five in-between lying stations. The interpolation was based solely on distance and the interpolated residuals were compared to the measured residuals at stations at the in-between locations. The resulting interpolated residuals estimated the measured residuals well, especially for longer lead times. Quantile regression was then carried out using the series of forecasts and interpolated residuals at the in-between stations. The interpolated quantile regressions were compared with regressions calibrated using the actual residuals at the in-between stations. Results show that even a simple interpolation based
Cannon, Alex
2017-04-01
univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.
Predictive densities for day-ahead electricity prices using time-adaptive quantile regression
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik;
2014-01-01
is compared to that of four benchmark approaches and the well-known the generalist autoregressive conditional heteroskedasticity (GARCH) model over a three-year evaluation period. While all benchmarks are outperformed in terms of forecasting skill overall, the superiority of the semi-parametric model over......A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model...
Chernozhukov, Victor
2009-01-01
Quantile regression is an increasingly important empirical tool in economics and other sciences for analyzing the impact of a set of regressors on the conditional distribution of an outcome. Extremal quantile regression, or quantile regression applied to the tails, is of interest in many economic and financial applications, such as conditional value-at-risk, production efficiency, and adjustment bands in (S,s) models. In this paper we provide feasible inference tools for extremal conditional quantile models that rely upon extreme value approximations to the distribution of self-normalized quantile regression statistics. The methods are simple to implement and can be of independent interest even in the non-regression case. We illustrate the results with two empirical examples analyzing extreme fluctuations of a stock return and extremely low percentiles of live infants' birthweights in the range between 250 and 1500 grams.
A Software Reliability Model Using Quantile Function
Directory of Open Access Journals (Sweden)
Bijamma Thomas
2014-01-01
Full Text Available We study a class of software reliability models using quantile function. Various distributional properties of the class of distributions are studied. We also discuss the reliability characteristics of the class of distributions. Inference procedures on parameters of the model based on L-moments are studied. We apply the proposed model to a real data set.
Return-Volatility Relationship: Insights from Linear and Non-Linear Quantile Regression
D.E. Allen (David); A.K. Singh (Abhay); R.J. Powell (Robert); M.J. McAleer (Michael); J. Taylor (James); L. Thomas (Lyn)
2013-01-01
textabstractThe purpose of this paper is to examine the asymmetric relationship between price and implied volatility and the associated extreme quantile dependence using linear and non linear quantile regression approach. Our goal in this paper is to demonstrate that the relationship between the
Forecasting Uncertainty in Electricity Smart Meter Data by Boosting Additive Quantile Regression
Taieb, Souhaib Ben
2016-03-02
Smart electricity meters are currently deployed in millions of households to collect detailed individual electricity consumption data. Compared with traditional electricity data based on aggregated consumption, smart meter data are much more volatile and less predictable. There is a need within the energy industry for probabilistic forecasts of household electricity consumption to quantify the uncertainty of future electricity demand in order to undertake appropriate planning of generation and distribution. We propose to estimate an additive quantile regression model for a set of quantiles of the future distribution using a boosting procedure. By doing so, we can benefit from flexible and interpretable models, which include an automatic variable selection. We compare our approach with three benchmark methods on both aggregated and disaggregated scales using a smart meter data set collected from 3639 households in Ireland at 30-min intervals over a period of 1.5 years. The empirical results demonstrate that our approach based on quantile regression provides better forecast accuracy for disaggregated demand, while the traditional approach based on a normality assumption (possibly after an appropriate Box-Cox transformation) is a better approximation for aggregated demand. These results are particularly useful since more energy data will become available at the disaggregated level in the future.
Refining Our Understanding of Beta through Quantile Regressions
Directory of Open Access Journals (Sweden)
Allen B. Atkins
2014-05-01
Full Text Available The Capital Asset Pricing Model (CAPM has been a key theory in financial economics since the 1960s. One of its main contributions is to attempt to identify how the risk of a particular stock is related to the risk of the overall stock market using the risk measure Beta. If the relationship between an individual stock’s returns and the returns of the market exhibit heteroskedasticity, then the estimates of Beta for different quantiles of the relationship can be quite different. The behavioral ideas first proposed by Kahneman and Tversky (1979, which they called prospect theory, postulate that: (i people exhibit “loss-aversion” in a gain frame; and (ii people exhibit “risk-seeking” in a loss frame. If this is true, people could prefer lower Beta stocks after they have experienced a gain and higher Beta stocks after they have experienced a loss. Stocks that exhibit converging heteroskedasticity (22.2% of our sample should be preferred by investors, and stocks that exhibit diverging heteroskedasticity (12.6% of our sample should not be preferred. Investors may be able to benefit by choosing portfolios that are more closely aligned with their preferences.
Liu, Xiang; Saat, M Rapik; Qin, Xiao; Barkan, Christopher P L
2013-10-01
Derailments are the most common type of freight-train accidents in the United States. Derailments cause damage to infrastructure and rolling stock, disrupt services, and may cause casualties and harm the environment. Accordingly, derailment analysis and prevention has long been a high priority in the rail industry and government. Despite the low probability of a train derailment, the potential for severe consequences justify the need to better understand the factors influencing train derailment severity. In this paper, a zero-truncated negative binomial (ZTNB) regression model is developed to estimate the conditional mean of train derailment severity. Recognizing that the mean is not the only statistic describing data distribution, a quantile regression (QR) model is also developed to estimate derailment severity at different quantiles. The two regression models together provide a better understanding of train derailment severity distribution. Results of this work can be used to estimate train derailment severity under various operational conditions and by different accident causes. This research is intended to provide insights regarding development of cost-efficient train safety policies. Copyright © 2013 Elsevier Ltd. All rights reserved.
Unconditional quantile regressions to determine the social gradient of obesity in Spain 1993-2014.
Rodriguez-Caro, Alejandro; Vallejo-Torres, Laura; Lopez-Valcarcel, Beatriz
2016-10-19
There is a well-documented social gradient in obesity in most developed countries. Many previous studies have conventionally categorised individuals according to their body mass index (BMI), focusing on those above a certain threshold and thus ignoring a large amount of the BMI distribution. Others have used linear BMI models, relying on mean effects that may mask substantial heterogeneity in the effects of socioeconomic variables across the population. In this study, we measure the social gradient of the BMI distribution of the adult population in Spain over the past two decades (1993-2014), using unconditional quantile regressions. We use three socioeconomic variables (education, income and social class) and evaluate differences in the corresponding effects on different percentiles of the log-transformed BMI distribution. Quantile regression methods have the advantage of estimating the socioeconomic effect across the whole BMI distribution allowing for this potential heterogeneity. The results showed a large and increasing social gradient in obesity in Spain, especially among females. There is, however, a large degree of heterogeneity in the socioeconomic effect across the BMI distribution, with patterns that vary according to the socioeconomic indicator under study. While the income and educational gradient is greater at the end of the BMI distribution, the main impact of social class is around the median BMI values. A steeper social gradient is observed with respect to educational level rather than household income or social class. The findings of this study emphasise the heterogeneous nature of the relationship between social factors and obesity across the BMI distribution as a whole. Quantile regression methods might provide a more suitable framework for exploring the complex socioeconomic gradient of obesity.
Exploratory quantile regression with many covariates: an application to adverse birth outcomes.
Burgette, Lane F; Reiter, Jerome P; Miranda, Marie Lynn
2011-11-01
Covariates may affect continuous responses differently at various points of the response distribution. For example, some exposure might have minimal impact on conditional means, whereas it might lower conditional 10th percentiles sharply. Such differential effects can be important to detect. In studies of the determinants of birth weight, for instance, it is critical to identify exposures like the one above, since low birth weight is a risk factor for later health problems. Effects of covariates on the tails of distributions can be obscured by models (such as linear regression) that estimate conditional means; however, effects on tails can be detected by quantile regression. We present 2 approaches for exploring high-dimensional predictor spaces to identify important predictors for quantile regression. These are based on the lasso and elastic net penalties. We apply the approaches to a prospective cohort study of adverse birth outcomes that includes a wide array of demographic, medical, psychosocial, and environmental variables. Although tobacco exposure is known to be associated with lower birth weights, the analysis suggests an interesting interaction effect not previously reported: tobacco exposure depresses the 20th and 30th percentiles of birth weight more strongly when mothers have high levels of lead in their blood compared with those who have low blood lead levels.
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of
Habyarimana, Faustin; Zewotir, Temesgen; Ramroop, Shaun
2017-06-17
Childhood anemia is among the most significant health problems faced by public health departments in developing countries. This study aims at assessing the determinants and possible spatial effects associated with childhood anemia in Rwanda. The 2014/2015 Rwanda Demographic and Health Survey (RDHS) data was used. The analysis was done using the structured spatial additive quantile regression model. The findings of this study revealed that the child's age; the duration of breastfeeding; gender of the child; the nutritional status of the child (whether underweight and/or wasting); whether the child had a fever; had a cough in the two weeks prior to the survey or not; whether the child received vitamin A supplementation in the six weeks before the survey or not; the household wealth index; literacy of the mother; mother's anemia status; mother's age at the birth are all significant factors associated with childhood anemia in Rwanda. Furthermore, significant structured spatial location effects on childhood anemia was found.
Directory of Open Access Journals (Sweden)
Eliseu Verly-Jr
Full Text Available A reduction in homocysteine concentration due to the use of supplemental folic acid is well recognized, although evidence of the same effect for natural folate sources, such as fruits and vegetables (FV, is lacking. The traditional statistical analysis approaches do not provide further information. As an alternative, quantile regression allows for the exploration of the effects of covariates through percentiles of the conditional distribution of the dependent variable.To investigate how the associations of FV intake with plasma total homocysteine (tHcy differ through percentiles in the distribution using quantile regression.A cross-sectional population-based survey was conducted among 499 residents of Sao Paulo City, Brazil. The participants provided food intake and fasting blood samples. Fruit and vegetable intake was predicted by adjusting for day-to-day variation using a proper measurement error model. We performed a quantile regression to verify the association between tHcy and the predicted FV intake. The predicted values of tHcy for each percentile model were calculated considering an increase of 200 g in the FV intake for each percentile.The results showed that tHcy was inversely associated with FV intake when assessed by linear regression whereas, the association was different when using quantile regression. The relationship with FV consumption was inverse and significant for almost all percentiles of tHcy. The coefficients increased as the percentile of tHcy increased. A simulated increase of 200 g in the FV intake could decrease the tHcy levels in the overall percentiles, but the higher percentiles of tHcy benefited more.This study confirms that the effect of FV intake on lowering the tHcy levels is dependent on the level of tHcy using an innovative statistical approach. From a public health point of view, encouraging people to increase FV intake would benefit people with high levels of tHcy.
Zhu, H.; Duan, L; Guo, Y.; K. Yu
2016-01-01
This study investigates the impact of foreign direct investment (FDI), economic growth and energy consumption on carbon emissions in five selected member countries in the Association of South East Asian Nations (ASEAN-5), including Indonesia, Malaysia, the Philippines, Singapore and Thailand. This paper employs a panel quantile regression model that takes unobserved individual heterogeneity and distributional heterogeneity into consideration. Moreover, to avoid an omitted variable bias, certa...
Directory of Open Access Journals (Sweden)
Nora Fenske
Full Text Available BACKGROUND: Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. OBJECTIVE: We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. DESIGN: Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. RESULTS: At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. CONCLUSIONS: Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers Paul HC; Röder Esther; Savelkoul Huub FJ; van Wijk Roy
2012-01-01
Abstract Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results Quantile regression, a genera...
A quantile count model of water depth constraints on Cape Sable seaside sparrows
Cade, B.S.; Dong, Q.
2008-01-01
1. A quantile regression model for counts of breeding Cape Sable seaside sparrows Ammodramus maritimus mirabilis (L.) as a function of water depth and previous year abundance was developed based on extensive surveys, 1992-2005, in the Florida Everglades. The quantile count model extends linear quantile regression methods to discrete response variables, providing a flexible alternative to discrete parametric distributional models, e.g. Poisson, negative binomial and their zero-inflated counterparts. 2. Estimates from our multiplicative model demonstrated that negative effects of increasing water depth in breeding habitat on sparrow numbers were dependent on recent occupation history. Upper 10th percentiles of counts (one to three sparrows) decreased with increasing water depth from 0 to 30 cm when sites were not occupied in previous years. However, upper 40th percentiles of counts (one to six sparrows) decreased with increasing water depth for sites occupied in previous years. 3. Greatest decreases (-50% to -83%) in upper quantiles of sparrow counts occurred as water depths increased from 0 to 15 cm when previous year counts were 1, but a small proportion of sites (5-10%) held at least one sparrow even as water depths increased to 20 or 30 cm. 4. A zero-inflated Poisson regression model provided estimates of conditional means that also decreased with increasing water depth but rates of change were lower and decreased with increasing previous year counts compared to the quantile count model. Quantiles computed for the zero-inflated Poisson model enhanced interpretation of this model but had greater lack-of-fit for water depths > 0 cm and previous year counts 1, conditions where the negative effect of water depths were readily apparent and fitted better with the quantile count model.
Institute of Scientific and Technical Information of China (English)
吕亚召; 张日权; 赵为华; 刘吉彩
2014-01-01
本文提出复合最小化平均分位数损失估计方法(composite minimizing average check loss estimation,CMACLE)用于实现部分线性单指标模型(partial linear single-index models,PLSIM)的复合分位数回归(composite quantile regression,CQR).首先基于高维核函数构造参数部分的复合分位数回归意义下的相合估计,在此相合估计的基础上,通过采用指标核函数进一步得到参数和非参数函数的可达最优收敛速度的估计,并建立所得估计的渐近正态性,比较PLSIM的CQR估计和最小平均方差估计(MAVE)的相对渐近效率.进一步地,本文提出CQR框架下PLSIM的变量选择方法,证明所提变量选择方法的oracle性质.随机模拟和实例分析验证了所提方法在有限样本时的表现,证实了所提方法的优良性.
Worm plot to diagnose fit in quantile regression
Buuren, S. van
2007-01-01
The worm plot is a series of detrended Q-Q plots, split by covariate levels. The worm plot is a diagnostic tool for visualizing how well a statistical model fits the data, for finding locations at which the fit can be improved, and for comparing the fit of different models. This paper shows how the
Worm plot to diagnose fit in quantile regression
Buuren, S. van
2007-01-01
The worm plot is a series of detrended Q-Q plots, split by covariate levels. The worm plot is a diagnostic tool for visualizing how well a statistical model fits the data, for finding locations at which the fit can be improved, and for comparing the fit of different models. This paper shows how
Worm plot to diagnose fit in quantile regression
Buuren, S. van
2007-01-01
The worm plot is a series of detrended Q-Q plots, split by covariate levels. The worm plot is a diagnostic tool for visualizing how well a statistical model fits the data, for finding locations at which the fit can be improved, and for comparing the fit of different models. This paper shows how the
Institute of Scientific and Technical Information of China (English)
XING Chunbing
2007-01-01
In this paper,quantile regressions is used to estimate wage equations of different ownerships.Quantile regressions give us distributions rather than a single estimate of the returns both to education and experience in each ownership sector.For state-owned enterprises (SOE),the returns to education tended to be larger at the bottom of the conditional distribution of wages in 1991 and 1993,and there was no such trend in 1997.For the private sector,however,the retrains to education tended to be larger at the top positions in 1993 and 1997.It is also found that the growth rates of the wages at the bottom of the conditional distribution of wages arc higher than those at the top in SOEs.No such patterns for the private sector is found.It is suggested the wage mechanism in the private sector is more market-oriented.
Direct Marketing and the Structure of Farm Sales: An Unconditional Quantile Regression Approach
Park, Timothy A.
2015-01-01
This paper examines the impact of participation in direct marketing on the entire distribution of farm sales using the unconditional quantile regression (UQR) estimator. Our analysis yields unbiased estimates of the unconditional impact of direct marketing on farm sales and reveals the heterogeneous effects that occur across the distribution of farm sales. The impacts of direct marketing efforts are uniformly negative across the UQR results, but declines in sales tend to grow smaller as sales...
DIFFERENCES IN DECLINE: QUANTILE REGRESSION OF MALE–FEMALE EARNINGS DIFFERENTIAL IN MALAYSIA
SIEW CHING GOY; GERAINT JOHNES
2015-01-01
Semiparametric estimation has gained significant attention in the study of wage inequality between men and women in recent years. By extending the wage gap at the mean towards the entire wage distribution using quantile regression, it enables researchers to ascertain the direction and the proportions of differences in characteristics and returns to these characteristics at different parts of the wage distribution. This line of research has been prominent in western society but has not yet bee...
Direct Marketing and the Structure of Farm Sales: An Unconditional Quantile Regression Approach
Park, Timothy A.
2015-01-01
This paper examines the impact of participation in direct marketing on the entire distribution of farm sales using the unconditional quantile regression (UQR) estimator. Our analysis yields unbiased estimates of the unconditional impact of direct marketing on farm sales and reveals the heterogeneous effects that occur across the distribution of farm sales. The impacts of direct marketing efforts are uniformly negative across the UQR results, but declines in sales tend to grow smaller as sales...
A quantile regression approach to the analysis of the quality of life determinants in the elderly
Directory of Open Access Journals (Sweden)
Serena Broccoli
2013-05-01
Full Text Available Objective. The aim of this study is to explain the effect of important covariates on the health-related quality of life (HRQol in elderly subjects. Methods. Data were collected within a longitudinal study that involves 5256 subject, aged +or= 65. The Visual Analogue Scale inclused in the EQ-5D Questionnaire, tha EQ-VAS, was used to obtain a synthetic measure of quality of life. To model EQ-VAS Score a quantile regression analysis was employed. This methodological approach was preferred to an OLS regression becouse of the EQ-VAS Score typical distribution. The main covariates are: amount of weekly physical activity, reported problems in Activity of Daily Living, presence of cardiovascular diseases, diabetes, hypercolesterolemia, hypertension, joints pains, as well as socio-demographic information. Main Results. 1 Even a low level of physical activity significantly influences quality of life in a positive way; 2 ADL problems, at least one cardiovascular disease and joint pain strongly decrease the quality of life.
A-Collapsibility of Distribution Dependence and Quantile Regression Coefficients
Meerschaert, Mark M
2010-01-01
The Yule-Simpson paradox notes that an association between random variables can be reversed when averaged over a background variable. Cox and Wermuth (2003) introduced the concept of distribution dependence between two random variables X and Y , and developed two dependence conditions, each of which guarantees that reversal cannot occur. Ma, Xie and Geng (2006) studied the collapsibility of distribution dependence over a background variable W, under a rather strong homogeneity condition. Collapsibility ensures the association remains the same for conditional and marginal models, so that Yule-Simpson reversal cannot occur. In this paper, we investigate a more general condition for avoiding e?ect reversal: A-collapsibility. The conditions of Cox and Wermuth imply A-collapsibility, without assuming homogeneity. In fact, we show that, when W is a binary variable, collapsibility is equivalent to A-collapsibility plus homogeneity, and A-collapsibility is equivalent to the conditions of Cox and Wermuth. Recently, Co...
Directory of Open Access Journals (Sweden)
Horst Entorf
2015-07-01
Full Text Available Two alternative hypotheses – referred to as opportunity- and stigma-based behavior – suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature, quantile regression reveals that size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.
Directory of Open Access Journals (Sweden)
Sholto Radford
Full Text Available Mindfulness has been suggested to be an important protective factor for emotional health. However, this effect might vary with regard to context. This study applied a novel statistical approach, quantile regression, in order to investigate the relation between trait mindfulness and residual depressive symptoms in individuals with a history of recurrent depression, while taking into account symptom severity and number of episodes as contextual factors. Rather than fitting to a single indicator of central tendency, quantile regression allows exploration of relations across the entire range of the response variable. Analysis of self-report data from 274 participants with a history of three or more previous episodes of depression showed that relatively higher levels of mindfulness were associated with relatively lower levels of residual depressive symptoms. This relationship was most pronounced near the upper end of the response distribution and moderated by the number of previous episodes of depression at the higher quantiles. The findings suggest that with lower levels of mindfulness, residual symptoms are less constrained and more likely to be influenced by other factors. Further, the limiting effect of mindfulness on residual symptoms is most salient in those with higher numbers of episodes.
Radford, Sholto; Eames, Catrin; Brennan, Kate; Lambert, Gwladys; Crane, Catherine; Williams, J. Mark G.; Duggan, Danielle S.; Barnhofer, Thorsten
2014-01-01
Mindfulness has been suggested to be an important protective factor for emotional health. However, this effect might vary with regard to context. This study applied a novel statistical approach, quantile regression, in order to investigate the relation between trait mindfulness and residual depressive symptoms in individuals with a history of recurrent depression, while taking into account symptom severity and number of episodes as contextual factors. Rather than fitting to a single indicator of central tendency, quantile regression allows exploration of relations across the entire range of the response variable. Analysis of self-report data from 274 participants with a history of three or more previous episodes of depression showed that relatively higher levels of mindfulness were associated with relatively lower levels of residual depressive symptoms. This relationship was most pronounced near the upper end of the response distribution and moderated by the number of previous episodes of depression at the higher quantiles. The findings suggest that with lower levels of mindfulness, residual symptoms are less constrained and more likely to be influenced by other factors. Further, the limiting effect of mindfulness on residual symptoms is most salient in those with higher numbers of episodes. PMID:24988072
EFFECT OF HUMAN CAPITAL ON MAIZE PRODUCTIVITY IN GHANA: A QUANTILE REGRESSION APPROACH
Directory of Open Access Journals (Sweden)
Isaac Nyamekye
2016-04-01
Full Text Available Agriculture continues to play an important role in the economy of most African countries. Thus, productivity growth in agriculture is necessary for economic growth and poverty reduction of the region. While, theoretically, investing in human capital improves productivity, the empirical evidence is somewhat mixed, especially in developing countries. In Ghana, maize is associated with household food security, and low-income households are considered food insecure if they have no maize in stock. But, due to low productivity, Ghanaian farmers are yet to produce enough to meet local demand. Using quantile and OLS regression techniques, this study contributes to the literature on human capital and productivity by assessing the effect of human capital (captured by education, farming experience and access to extension services on maize productivity in Ghana. The results suggest that although human capital has no significant effect on maize yields, its effect on productivity varies across quantiles.
Directory of Open Access Journals (Sweden)
Giovanni Bonaccolto
2016-07-01
Full Text Available Several market and macro-level variables influence the evolution of equity risk in addition to the well-known volatility persistence. However, the impact of those covariates might change depending on the risk level, being different between low and high volatility states. By combining equity risk estimates, obtained from the Realized Range Volatility, corrected for microstructure noise and jumps, and quantile regression methods, we evaluate the forecasting implications of the equity risk determinants in different volatility states and, without distributional assumptions on the realized range innovations, we recover both the points and the conditional distribution forecasts. In addition, we analyse how the the relationships among the involved variables evolve over time, through a rolling window procedure. The results show evidence of the selected variables’ relevant impacts and, particularly during periods of market stress, highlight heterogeneous effects across quantiles.
Study on Adaptive Lasso Quantile Regression for Panel Data Models%面板数据的自适应 Lasso 分位回归方法研究
Institute of Scientific and Technical Information of China (English)
李子强; 田茂再; 罗幼喜
2014-01-01
如何在对参数进行估计的同时自动选择重要解释变量，一直是面板数据分位回归模型中讨论的热点问题之一。通过构造一种含多重随机效应的贝叶斯分层分位回归模型，在假定固定效应系数先验服从一种新的条件Laplace分布的基础上，给出了模型参数估计的Gibbs抽样算法。考虑到不同重要程度的解释变量权重系数压缩程度应该不同，所构造的先验信息具有自适应性的特点，能够准确地对模型中重要解释变量进行自动选取，且设计的切片Gibbs抽样算法能够快速有效地解决模型中各个参数的后验均值估计问题。模拟结果显示，新方法在参数估计精确度和变量选择准确度上均优于现有文献的常用方法。通过对中国各地区多个宏观经济指标的面板数据进行建模分析，演示了新方法估计参数与挑选变量的能力。%How to do parameter estimation and variable selection simultaneously is a hot issue in the study of quantile regression for panel data models .On the base of the assumption that the fixed effect coefficients are subject to a novel conditional Laplace prior ,the paper constructs a hierarchical Bayesian quantile regression model and gives the Gibbs sample algorithm for the unknow n parameter estimation .In consideration of different explain variables should have different shrinkage degree ,the proposed prior has the property of adaptivity ,w hich could select the important explain variables in the model automatically . Furthermore ,the slice Gibbs sample algorithm that the paper proposed is able to estimate the posteriori mean estimation of unknown parameter quickly and efficiently .Monte Carlo simulation study indicates that the proposed method is obviously superior to the existing methods in literatures on the accuracy of parameter estimation and variable selection .Finally ,the paper gives a research of modeling the panel data including several
Quantile hydrologic model selection and model structure deficiency assessment: 2. Applications
Pande, S.
2013-01-01
Quantile hydrologic model selection and structure deficiency assessment is applied in three case studies. The performance of quantile model selection problem is rigorously evaluated using a model structure on the French Broad river basin data set. The case study shows that quantile model selection
Quantile regression and Bayesian cluster detection to identify radon prone areas.
Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio
2016-11-01
Albeit the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that controls the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.
Structured Additive Quantile Regression for Assessing the Determinants of Childhood Anemia in Rwanda
Directory of Open Access Journals (Sweden)
Faustin Habyarimana
2017-06-01
Full Text Available Childhood anemia is among the most significant health problems faced by public health departments in developing countries. This study aims at assessing the determinants and possible spatial effects associated with childhood anemia in Rwanda. The 2014/2015 Rwanda Demographic and Health Survey (RDHS data was used. The analysis was done using the structured spatial additive quantile regression model. The findings of this study revealed that the child’s age; the duration of breastfeeding; gender of the child; the nutritional status of the child (whether underweight and/or wasting; whether the child had a fever; had a cough in the two weeks prior to the survey or not; whether the child received vitamin A supplementation in the six weeks before the survey or not; the household wealth index; literacy of the mother; mother’s anemia status; mother’s age at the birth are all significant factors associated with childhood anemia in Rwanda. Furthermore, significant structured spatial location effects on childhood anemia was found.
Mulyani, Sri; Andriyana, Yudhie; Sudartianto
2017-03-01
Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.
Domestic Multinationals and Foreign-Owned Firms in Italy: Evidence from Quantile Regression
Directory of Open Access Journals (Sweden)
Grasseni, Mara
2010-06-01
Full Text Available This paper investigates the performance differences across and within foreign-owned firms and domestic multinationals in Italy. Used for the empirical analysis are non-parametric tests based on the concept of first order stochastic dominance and quantile regression technique. The firm-level analysis distinguishes between foreign-owned firms of different nationalities and domestic MNEs according to the location of their FDI, and it focuses not only on productivity but also on differences in average wages, capital intensity, and financial and non-financial indicators, namely ROS, ROI and debt leverage. Overall, the results provide evidence of remarkable heterogeneity across and within multinationals. In particular, it seems not possible to identify a clear foreign advantage at least in terms of productivity, because foreign-owned firms do not outperform domestic multinationals. Interesting results are obtained when focusing on ROS and ROI, where the profitability gaps change as one moves from the bottom to the top of the conditional distribution. Domestic multinationals investing only in developed countries present higher ROS and ROI compared with the subgroups of foreign-owned firms, but only at the lower quantiles, while at the upper quantiles the advantage seems to favour foreign firms. Finally, in regard to domestic multinationals, there is strong evidence that those active only in less developed countries persistently exhibit the worst performances
Quantile uncertainty and value-at-risk model risk.
Alexander, Carol; Sarabia, José María
2012-08-01
This article develops a methodology for quantifying model risk in quantile risk estimates. The application of quantile estimates to risk assessment has become common practice in many disciplines, including hydrology, climate change, statistical process control, insurance and actuarial science, and the uncertainty surrounding these estimates has long been recognized. Our work is particularly important in finance, where quantile estimates (called Value-at-Risk) have been the cornerstone of banking risk management since the mid 1980s. A recent amendment to the Basel II Accord recommends additional market risk capital to cover all sources of "model risk" in the estimation of these quantiles. We provide a novel and elegant framework whereby quantile estimates are adjusted for model risk, relative to a benchmark which represents the state of knowledge of the authority that is responsible for model risk. A simulation experiment in which the degree of model risk is controlled illustrates how to quantify Value-at-Risk model risk and compute the required regulatory capital add-on for banks. An empirical example based on real data shows how the methodology can be put into practice, using only two time series (daily Value-at-Risk and daily profit and loss) from a large bank. We conclude with a discussion of potential applications to nonfinancial risks.
Chris Sakellariou
2005-01-01
Singapore is among the countries who have well-developed vocational education and training programs. It follows a centralized planning model in which the needs of industry and business are closely matched to the output of the education system. This study examines the pattern of returns to formal vs. vocational education across quantiles. It is hypothesized that heterogeneity in “abilities” which contribute to higher earnings is related to schooling acquisition. It is found that marginal retur...
Distribution of Budget Shares for Food: An Application of Quantile Regression to Food Security 1
Directory of Open Access Journals (Sweden)
Charles B. Moss
2016-04-01
Full Text Available This study examines, using quantile regression, the linkage between food security and efforts to enhance smallholder coffee producer incomes in Rwanda. Even though in Rwanda smallholder coffee producer incomes have increased, inhabitants these areas still experience stunting and wasting. This study examines whether the distribution of the income elasticity for food is the same for coffee and noncoffee growing provinces. We find that that the share of expenditures on food is statistically different in coffee growing and noncoffee growing provinces. Thus, the increase in expenditure on food is smaller for coffee growing provinces than noncoffee growing provinces.
Quantile Regression for Right-Censored and Length-Biased Data
Institute of Scientific and Technical Information of China (English)
Xue-rong CHEN; Yong ZHOU
2012-01-01
Length-biased data arise in many important fields,including epidemiological cohort studies,cancer screening trials and labor economics.Analysis of such data has attracted much attention in the literature.In this paper we propose a quantile regression approach for analyzing right-censored and length-biased data.We derive an inverse probability weighted estimating equation corresponding to the quantile regression to correct the bias due to length-bias sampling and informative censoring. This method can easily handle informative censoring induced by length-biased sampling.This is an appealing feature of our proposed method since it is generally difficult to obtain unbiased estimates of risk factors in the presence of length-bias and informative censoring.We establish the consistency and asymptotic distribution of the proposed estimator using empirical process techniques.A resampling method is adopted to estimate the variance of the estimator.We conduct simulation studies to evaluate its finite sample performance and use a real data set to illustrate the application of the proposed method.
Institute of Scientific and Technical Information of China (English)
冷冬; 巴曙松
2014-01-01
Asset price bubble is the deviation of price from the fundamental values .The existence of a bubble changes investors'expectations ,boosts asset prices and causes investors to underestimate risk ,and endangers the stability of market operation .Here ,the bubble of Shanghai and Shenzhen stock market was measured ,and the impact of bubbles on market risk based on quantile regression model was studied .The results show that market risk is correlated with bubbles ;the larger the bubble ,the greater the risk ,and the greater its impacts on long‐term risk than on short‐term risk .Both short‐term bubble and long‐term bubble affect market risk :w hile in the short term ,a bubble boosts asset prices and reduces risk ;w hile in the long term ,the probability of a bubble's collapse increases ,thus increasing market risk .%资产“泡沫”是指资产的市场价格对其基础价值的偏离。泡沫的存在会改变投资者预期，助推资产价格上涨，使投资者低估风险，危及市场稳定运行。在度量沪深股票市场中的泡沫程度的基础上，应用分位数回归模型研究了泡沫对市场风险的影响。得出以下结论：泡沫与市场风险存在相关关系，泡沫越大市场风险越大，且泡沫对长期风险的影响比短期风险更大。市场风险受到短期泡沫和长期泡沫的共同影响。泡沫在短期内助推资产价格上涨，使短期内市场风险下降；而长期来看，泡沫破裂可能性增大，市场风险增加。
Tighe, Elizabeth L.; Schatschneider, Christopher
2015-01-01
The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773
Stock Market Autoregressive Dynamics: A Multinational Comparative Study with Quantile Regression
Directory of Open Access Journals (Sweden)
Lili Li
2016-01-01
Full Text Available We study the nonlinear autoregressive dynamics of stock index returns in seven major advanced economies (G7 and China. The quantile autoregression model (QAR enables us to investigate the autocorrelation across the whole spectrum of return distribution, which provides more insightful conditional information on multinational stock market dynamics than conventional time series models. The relation between index return and contemporaneous trading volume is also investigated. While prior studies have mixed results on stock market autocorrelations, we find that the dynamics is usually state dependent. The results for G7 stock markets exhibit conspicuous similarities, but they are in manifest contrast to the findings on Chinese stock markets.
Directory of Open Access Journals (Sweden)
Margherita Velucchi
2014-09-01
Full Text Available Labor productivity is very complex to analyze across time, sectors and countries. In particular, in Italy, labor productivity has shown a prolonged slowdown but sector analyses highlight the presence of specific niches that have good levels of productivity and performance. This paper investigates how firms' characteristics might have affected the dynamics of the Italian service and manufacturing firms labor productivity in recent years (1998-2007, comparing them and focusing on some relevant sectors. We use a micro level original panel from the Italian National Institute of Statistics (ISTAT and a longitudinal quantile regression approach that allow us to show that labor productivity is highly heterogeneous across sectors and that the links between labor productivity and firms' characteristics are not constant across quantiles. We show that average estimates obtained via GLS do not capture the complex dynamics and heterogeneity of the service and manufacturing firms' labor productivity. Using this approach, we show that innovativeness and human capital, in particular, have a very strong impact on fostering labor productivity of lower productive firms. From the sector analysis on four service' sectors (restaurants & hotels, trade distributors, trade shops and legal & accountants we show that heterogeneity is more intense at a sector level and we derive some common features that may be useful in terms of policy implications.
Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M
2014-01-01
This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.
Directory of Open Access Journals (Sweden)
Abobaker M. Jaber
2014-01-01
Full Text Available This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD with nonparametric methods of local linear quantile (LLQ. We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.
Simulating Quantile Models with Applications to Economics and Management
Machado, José A. F.
2010-05-01
The massive increase in the speed of computers over the past forty years changed the way that social scientists, applied economists and statisticians approach their trades and also the very nature of the problems that they could feasibly tackle. The new methods that use intensively computer power go by the names of "computer-intensive" or "simulation". My lecture will start with bird's eye view of the uses of simulation in Economics and Statistics. Then I will turn out to my own research on uses of computer- intensive methods. From a methodological point of view the question I address is how to infer marginal distributions having estimated a conditional quantile process, (Counterfactual Decomposition of Changes in Wage Distributions using Quantile Regression," Journal of Applied Econometrics 20, 2005). Illustrations will be provided of the use of the method to perform counterfactual analysis in several different areas of knowledge.
Directory of Open Access Journals (Sweden)
Abobaker M. Jaber
2014-01-01
Full Text Available Empirical mode decomposition (EMD is particularly useful in analyzing nonstationary and nonlinear time series. However, only partial data within boundaries are available because of the bounded support of the underlying time series. Consequently, the application of EMD to finite time series data results in large biases at the edges by increasing the bias and creating artificial wiggles. This study introduces a new two-stage method to automatically decrease the boundary effects present in EMD. At the first stage, local polynomial quantile regression (LLQ is applied to provide an efficient description of the corrupted and noisy data. The remaining series is assumed to be hidden in the residuals. Hence, EMD is applied to the residuals at the second stage. The final estimate is the summation of the fitting estimates from LLQ and EMD. Simulation was conducted to assess the practical performance of the proposed method. Results show that the proposed method is superior to classical EMD.
Prediction of Risk Premium Based on Quantile Regression%基于分位回归的风险保费预测
Institute of Scientific and Technical Information of China (English)
杨亮; 孟生旺
2016-01-01
风险保费预测是非寿险费率厘定的重要组成部分。在传统的分位回归厘定风险保费中，通常假设分位数水平是事先给定的，缺乏一定的客观性。为此，提出了一种应用分位回归厘定风险保费的新方法。基于破产概率确定保单组合的总风险保费，建立个体保单的分位回归模型，并与总风险保费建立等式关系，通过数值方法求解出分位数水平，实现对个体保单风险保费的预测。通过一组实际数据分析表明，该方法具有良好的预测效果。%Prediction of risk premiums is an important part in the non-life insurance premium ratemaking.During the determining the risk premium in the traditional quantile regression,quantile level is generally given in advance,which is lack of obj ectivity.Therefore,we propose a new method for determining the risk premium by applying of quantile regression.Firstly,determining the total risk premiums based on the probability of bankruptcy;secondly,to establish quantile regression model in individual policies,and establish relationships with total risk premium;finally,to solve for quantile level by numerical methods,obtain the individual risk premiums.Basing on a set of actual data,the results demonstrates that the method has high forecasting accuracy.
Binder, Martin; Coad, Alex
2010-01-01
Standard regression techniques are only able to give an incomplete picture of the relationship between subjective well-being and its determinants since the very idea of conventional estimators such as OLS is the averaging out over the whole distribution: studies based on such regression techniques thus are implicitly only interested in Average Joe's happiness. Using cross-sectional data from the British Household Panel Survey (BHPS) for the year 2006, we apply quantile regressions to analyze ...
Tighe, Elizabeth L.; Schatschneider, Christopher
2016-01-01
The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological…
Directory of Open Access Journals (Sweden)
Koyin Chang
2013-09-01
Full Text Available To understand the impact of drinking and driving laws on drinking and driving fatality rates, this study explored the different effects these laws have on areas with varying severity rates for drinking and driving. Unlike previous studies, this study employed quantile regression analysis. Empirical results showed that policies based on local conditions must be used to effectively reduce drinking and driving fatality rates; that is, different measures should be adopted to target the specific conditions in various regions. For areas with low fatality rates (low quantiles, people’s habits and attitudes toward alcohol should be emphasized instead of transportation safety laws because “preemptive regulations” are more effective. For areas with high fatality rates (or high quantiles, “ex-post regulations” are more effective, and impact these areas approximately 0.01% to 0.05% more than they do areas with low fatality rates.
DEFF Research Database (Denmark)
Nielsen, Henrik Aalborg; Madsen, Henrik; Nielsen, Torben Skov
2006-01-01
For operational planning it is important to provide information about the situation-dependent uncertainty of a wind power forecast. Factors which influence the uncertainty of a wind power forecast include the predictability of the actual meteorological situation, the level of the predicted wind...... speed (due to the non-linearity of the power curve) and the forecast horizon. With respect to the predictability of the actual meteorological situation a number of explanatory variables are considered, some inspired by the literature. The article contains an overview of related work within the field....... An existing wind power forecasting system (Zephyr/WPPT) is considered and it is shown how analysis of the forecast error can be used to build a model of the quantiles of the forecast error. Only explanatory variables or indices which are predictable are considered, whereby the model obtained can be used...
On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint.
Zhang, Chong; Liu, Yufeng; Wu, Yichao
2016-04-01
For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint.
Scale and scope economies in nursing homes: a quantile regression approach.
Christensen, Eric W
2004-04-01
Nursing homes vary widely between facilities with very few beds and facilities with several hundred beds. Previous studies, which estimate nursing home scale and scope economies, do not account for this heterogeneity and implicitly assume that all nursing homes face the same cost structure. To account for heterogeneity, this paper uses quantile regression to estimate cost functions for skilled and intermediate care nursing homes. The results show that the parameters of nursing home cost functions vary significantly by output mix and across the cost distribution. Estimates show that product-specific scale economies systematically increase across the cost distribution for both skilled and intermediate care facilities, with diseconomies of scale in the lower deciles and no significant scale economies in the higher deciles. As for ray scale economies, estimates show economies of scale in the lower deciles and diseconomies of scale or no significant scale economies at higher deciles. The estimates also show that scope economies exist in the lower cost deciles and that no scope economies exist in the higher cost deciles. Additionally, the degree of scope economies monotonically decreases across the deciles.
Dunn, Richard A; Tan, Andrew K G; Nayga, Rodolfo M
2012-01-01
Obesity prevalence is unequally distributed across gender and ethnic group in Malaysia. In this paper, we examine the role of socioeconomic inequality in explaining these disparities. The body mass index (BMI) distributions of Malays and Chinese, the two largest ethnic groups in Malaysia, are estimated through the use of quantile regression. The differences in the BMI distributions are then decomposed into two parts: attributable to differences in socioeconomic endowments and attributable to differences in responses to endowments. For both males and females, the BMI distribution of Malays is shifted toward the right of the distribution of Chinese, i.e., Malays exhibit higher obesity rates. In the lower 75% of the distribution, differences in socioeconomic endowments explain none of this difference. At the 90th percentile, differences in socioeconomic endowments account for no more than 30% of the difference in BMI between ethnic groups. Our results demonstrate that the higher levels of income and education that accrue with economic development will likely not eliminate obesity inequality. This leads us to conclude that reduction of obesity inequality, as well the overall level of obesity, requires increased efforts to alter the lifestyle behaviors of Malaysians.
Modeling Autoregressive Processes with Moving-Quantiles-Implied Nonlinearity
Directory of Open Access Journals (Sweden)
Isao Ishida
2015-01-01
Full Text Available We introduce and investigate some properties of a class of nonlinear time series models based on the moving sample quantiles in the autoregressive data generating process. We derive a test fit to detect this type of nonlinearity. Using the daily realized volatility data of Standard & Poor’s 500 (S&P 500 and several other indices, we obtained good performance using these models in an out-of-sample forecasting exercise compared with the forecasts obtained based on the usual linear heterogeneous autoregressive and other models of realized volatility.
Zhang, Qun; Zhang, Qunzhi; Sornette, Didier
2016-01-01
We augment the existing literature using the Log-Periodic Power Law Singular (LPPLS) structures in the log-price dynamics to diagnose financial bubbles by providing three main innovations. First, we introduce the quantile regression to the LPPLS detection problem. This allows us to disentangle (at least partially) the genuine LPPLS signal and the a priori unknown complicated residuals. Second, we propose to combine the many quantile regressions with a multi-scale analysis, which aggregates and consolidates the obtained ensembles of scenarios. Third, we define and implement the so-called DS LPPLS Confidence™ and Trust™ indicators that enrich considerably the diagnostic of bubbles. Using a detailed study of the "S&P 500 1987" bubble and presenting analyses of 16 historical bubbles, we show that the quantile regression of LPPLS signals contributes useful early warning signals. The comparison between the constructed signals and the price development in these 16 historical bubbles demonstrates their significant predictive ability around the real critical time when the burst/rally occurs.
Estimating earnings losses due to mental illness: a quantile regression approach.
Marcotte, Dave E; Wilcox-Gök, Virginia
2003-09-01
The ability of workers to remain productive and sustain earnings when afflicted with mental illness depends importantly on access to appropriate treatment and on flexibility and support from employers. In the United States there is substantial variation in access to health care and sick leave and other employment flexibilities across the earnings distribution. Consequently, a worker's ability to work and how much his/her earnings are impeded likely depend upon his/her position in the earnings distribution. Because of this, focusing on average earnings losses may provide insufficient information on the impact of mental illness in the labor market. In this paper, we examine the effects of mental illness on earnings by recognizing that effects could vary across the distribution of earnings. Using data from the National Comorbidity Survey, we employ a quantile regression estimator to identify the effects at key points in the earnings distribution. We find that earnings effects vary importantly across the distribution. While average effects are often not large, mental illness more commonly imposes earnings losses at the lower tail of the distribution, especially for women. In only one case do we find an illness to have negative effects across the distribution. Mental illness can have larger negative impacts on economic outcomes than previously estimated, even if those effects are not uniform. Consequently, researchers and policy makers alike should not be placated by findings that mean earnings effects are relatively small. Such estimates miss important features of how and where mental illness is associated with real economic losses for the ill.
DEFF Research Database (Denmark)
Hansen, Aslak Hedemann
2008-01-01
The development in the consumption of fruit and vegetables in the period 1999-2004 in Denmark was investigated using quantile regression and two previously overlooked problems were identified. First, the change in the ten percent quantile samples decreased. This could have been caused by changes...... for this development is probably due to low income groups becoming relatively more income constrained since the gap to the high income group have grown considerably at the lower end of the distribution. The second problem was that the education inducing gap became larger in 2004 indicating that uneducated people have...... not responded as well to the health related information flow. These results suggest that information campaigns have not been as successful as previously thought; more importantly the results indicate that information campaigns alone will do a poor job in solving the identified problems. Other instruments...
Alternative regression models to assess increase in childhood BMI
Directory of Open Access Journals (Sweden)
Mansmann Ulrich
2008-09-01
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
The Public-Private Sector Wage Gap in Zambia in the 1990s: A Quantile Regression Approach
DEFF Research Database (Denmark)
Nielsen, Helena Skyt; Rosholm, Michael
2001-01-01
of economic transition, because items as privatization and deregulation were on the political agenda. The focus is placed on the public-private sector wage gap, and the results show that this gap was relatively favorable for the low-skilled and less favorable for the high-skilled. This picture was further......We investigate the determinants of wages in Zambia and based on the quantile regression approach, we analyze how their effects differ at different points in the wage distribution and over time. We use three cross-sections of Zambian household data from the early nineties, which was a period...
The Public-Private Sector Wage Gap in Zambia in the 1990s: A Quantile Regression Approach
DEFF Research Database (Denmark)
Nielsen, Helena Skyt; Rosholm, Michael
2001-01-01
We investigate the determinants of wages in Zambia and based on the quantile regression approach, we analyze how their effects differ at different points in the wage distribution and over time. We use three cross-sections of Zambian household data from the early nineties, which was a period...... of economic transition, because items as privatization and deregulation were on the political agenda. The focus is placed on the public-private sector wage gap, and the results show that this gap was relatively favorable for the low-skilled and less favorable for the high-skilled. This picture was further...
Camacho, Oscar M; Eldridge, Alison; Proctor, Christopher J; McAdam, Kevin
2015-08-01
Approximately 100 toxicants have been identified in cigarette smoke, to which exposure has been linked to a range of serious diseases in smokers. Smoking machines have been used to quantify toxicant emissions from cigarettes for regulatory reporting. The World Health Organization Study Group on Tobacco Product Regulation has proposed a regulatory scenario to identify median values for toxicants found in commercially available products, which could be used to set mandated limits on smoke emissions. We present an alternative approach, which used quantile regression to estimate reference percentiles to help contextualise the toxicant yields of commercially available products with respect to a reference analyte, such as tar or nicotine. To illustrate this approach we examined four toxicants (acetone, N'-nitrosoanatabine, phenol and pyridine) with respect to tar, and explored International Organization for Standardization (ISO) and Health Canada Intense (HCI) regimes. We compared this approach with other methods for assessing toxicants in cigarette smoke, such as ratios to nicotine or tar, and linear regression. We concluded that the quantile regression approach effectively represented data distributions across toxicants for both ISO and HCI regimes. This method provides robust, transparent and intuitive percentile estimates in relation to any desired reference value within the data space. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Janson, Lucas; Rajaratnam, Bala
Great strides have been made in the field of reconstructing past temperatures based on models relating temperature to temperature-sensitive paleoclimate proxies. One of the goals of such reconstructions is to assess if current climate is anomalous in a millennial context. These regression based approaches model the conditional mean of the temperature distribution as a function of paleoclimate proxies (or vice versa). Some of the recent focus in the area has considered methods which help reduce the uncertainty inherent in such statistical paleoclimate reconstructions, with the ultimate goal of improving the confidence that can be attached to such endeavors. A second important scientific focus in the subject area is the area of forward models for proxies, the goal of which is to understand the way paleoclimate proxies are driven by temperature and other environmental variables. One of the primary contributions of this paper is novel statistical methodology for (1) quantile regression with autoregressive residual structure, (2) estimation of corresponding model parameters, (3) development of a rigorous framework for specifying uncertainty estimates of quantities of interest, yielding (4) statistical byproducts that address the two scientific foci discussed above. We show that by using the above statistical methodology we can demonstrably produce a more robust reconstruction than is possible by using conditional-mean-fitting methods. Our reconstruction shares some of the common features of past reconstructions, but we also gain useful insights. More importantly, we are able to demonstrate a significantly smaller uncertainty than that from previous regression methods. In addition, the quantile regression component allows us to model, in a more complete and flexible way than least squares, the conditional distribution of temperature given proxies. This relationship can be used to inform forward models relating how proxies are driven by temperature.
Pal, Debdatta; Mitra, Subrata Kumar
2016-10-01
This study used a quantile autoregressive distributed lag (QARDL) model to capture asymmetric impact of rainfall on food production in India. It was found that the coefficient corresponding to the rainfall in the QARDL increased till the 75th quantile and started decreasing thereafter, though it remained in the positive territory. Another interesting finding is that at the 90th quantile and above the coefficients of rainfall though remained positive was not statistically significant and therefore, the benefit of high rainfall on crop production was not conclusive. However, the impact of other determinants, such as fertilizer and pesticide consumption, is quite uniform over the whole range of the distribution of food grain production.
Institute of Scientific and Technical Information of China (English)
谢兰云; 王维国
2012-01-01
Our R&D investment is less and R&D intensity is low, these conditions have strictly limited our innovational capability. The paper uses stepwise least squares regression to determine the major factors based on what-if of the influence factors about Chinese R&D investment, then uses quantile regression model to study the dynamic changes of their impact on R&D investment on different quantile. We find that the influence of government R&D investment is more stable. With the increase in R&D investment, the effect of economic development is gradually reducing and the effect of industrial structure is gradually increasing. Based on these, the paper has some advices about the policy of increasing R&D investment.%针对我国R&D投入较少,R&D强度较低的现状,在对影响我国R&D投入因素进行假设分析的基础上,利用逐步回归方法对相关影响因素进行了筛选,得出经济发展水平、政府科技投入和产业结构是影响我国科技投入的主要因素,然后利用分位数回归模型对我国R&D投入在不同分位点上各影响因素的作用进行了详细的研究,研究发现政府科技投入的影响较稳定,随R&D投入的增多经济发展水平对其的影响在逐渐减少,而产业结构对其的影响则在逐渐增大,最后在此基础上提出了提高我国R&D投入的相关政策建议.
Quantile hydrologic model selection and model structure deficiency assessment: 1. Theory
Pande, S.
2013-01-01
A theory for quantile based hydrologic model selection and model structure deficiency assessment is presented. The paper demonstrates that the degree to which a model selection problem is constrained by the model structure (measured by the Lagrange multipliers of the constraints) quantifies
Quantile hydrologic model selection and model structure deficiency assessment: 1. Theory
Pande, S.
2013-01-01
A theory for quantile based hydrologic model selection and model structure deficiency assessment is presented. The paper demonstrates that the degree to which a model selection problem is constrained by the model structure (measured by the Lagrange multipliers of the constraints) quantifies structur
Feizi, Awat; Aliyari, Roqayeh; Roohafza, Hamidreza
2012-01-01
The present paper aimed at investigating the association between perceived stress and major life events stressors in Iranian general population. In a cross-sectional large-scale community-based study, 4583 people aged 19 and older, living in Isfahan, Iran, were investigated. Logistic quantile regression was used for modeling perceived stress, measured by GHQ questionnaire, as the bounded outcome (dependent), variable, and as a function of most important stressful life events, as the predictor variables, controlling for major lifestyle and sociodemographic factors. This model provides empirical evidence of the predictors' effects heterogeneity depending on individual location on the distribution of perceived stress. The results showed that among four stressful life events, family conflicts and social problems were more correlated with level of perceived stress. Higher levels of education were negatively associated with perceived stress and its coefficients monotonically decrease beyond the 30th percentile. Also, higher levels of physical activity were associated with perception of low levels of stress. The pattern of gender's coefficient over the majority of quantiles implied that females are more affected by stressors. Also high perceived stress was associated with low or middle levels of income. The results of current research suggested that in a developing society with high prevalence of stress, interventions targeted toward promoting financial and social equalities, social skills training, and healthy lifestyle may have the potential benefits for large parts of the population, most notably female and lower educated people.
Directory of Open Access Journals (Sweden)
Awat Feizi
2012-01-01
Full Text Available Objective. The present paper aimed at investigating the association between perceived stress and major life events stressors in Iranian general population. Methods. In a cross-sectional large-scale community-based study, 4583 people aged 19 and older, living in Isfahan, Iran, were investigated. Logistic quantile regression was used for modeling perceived stress, measured by GHQ questionnaire, as the bounded outcome (dependent, variable, and as a function of most important stressful life events, as the predictor variables, controlling for major lifestyle and sociodemographic factors. This model provides empirical evidence of the predictors’ effects heterogeneity depending on individual location on the distribution of perceived stress. Results. The results showed that among four stressful life events, family conflicts and social problems were more correlated with level of perceived stress. Higher levels of education were negatively associated with perceived stress and its coefficients monotonically decrease beyond the 30th percentile. Also, higher levels of physical activity were associated with perception of low levels of stress. The pattern of gender’s coefficient over the majority of quantiles implied that females are more affected by stressors. Also high perceived stress was associated with low or middle levels of income. Conclusions. The results of current research suggested that in a developing society with high prevalence of stress, interventions targeted toward promoting financial and social equalities, social skills training, and healthy lifestyle may have the potential benefits for large parts of the population, most notably female and lower educated people.
中国性别工资差异的分位数回归分析%Quantile Regression Analysis of Gender Wage Gap in China
Institute of Scientific and Technical Information of China (English)
陈建宝; 段景辉
2009-01-01
本文首先考察了中国健康和营养调查数据库(CHNS)中的有关中国城市男性和女性工资抽样调查数据(1988～2005年)的分布特征;然后,分别应用分位数回归建模和分解方法对中国性别工资状况进行了分析,以期发现在不同时期、不同分位数下,影响性别工资的关键因素、演变过程以及它们对性别工资差异的贡献大小,并提出了解决我国性别工资差异的相关政策和建议.%We firstly use some descriptive statistical methods to investigate the distributional characteristics of the related gender wage data from 1988 to 2005 in cities of China, the data set comes from the database of CHNS.Then, we employ approaches of quantile regression modeling and decomposition respectively to do a- nalysis.The research aims on finding the key influential factors of gender wageomd their evolution processes and how big of the contributions of them to gender wage gaps under different quantiles and periods.Finally, some polices and suggestions for solving gender wage gaps in China are proposed.
Amugsi, Dickson A; Dimbuene, Zacharie T; Bakibinga, Pauline; Kimani-Murage, Elizabeth W; Haregu, Tilahun Nigatu; Mberu, Blessing
2016-01-01
Objectives To (a) assess the association between dietary diversity (DD) score, socioeconomic status (SES) and maternal body mass index (BMI), and (b) the variation of the effects of DD and SES at different points of the conditional distribution of the BMI. Methods The study used Demographic and Health Surveys round 5 data sets from Ghana, Namibia and Sao Tome and Principe. The outcome variable for the analysis was maternal BMI. The DD score was computed using 24-hour dietary recall data. Quantile regression (QR) was used to examine the relationship between DD and SES, and maternal BMI, adjusting for other covariates. The QR allows the covariate effects to vary across the entire distribution of maternal BMI. Results Women who consumed an additional unit of DD achieved an increase of 0.245 in BMI for those in the 90th quantile in Ghana. The effect of household wealth increases for individuals across all quantiles of the BMI distribution and in all the 3 countries. A unit change in the household wealth score was associated with an increase of 0.038, 0.052 and 0.065 units increase in BMI for individuals in the 5th quantile in Ghana, Namibia and Sao Tome and Principe, respectively. Also, 0.237, 0.301 and 0.174 units increased for those in the 90th quantile in Ghana, Namibia and Sao Tome and Principe, respectively. Education had a significant positive effect on maternal BMI across all quantiles in Namibia and negative effect at the 5th, 10th and 90th quantiles in Sao Tome and Principe. Conclusions There is heterogeneity in the effects of DD and SES on maternal BMI. Studies focusing on the effects of diet and socioeconomic determinants on maternal BMI should examine patterns of effects at different points of the conditional distribution of the BMI and not just the average effect. PMID:27678544
Yang, D.; Shiau, J.
2013-12-01
ABSTRACT BODY: Abstract Surface water quality is an essential issue in water-supply for human uses and sustaining healthy ecosystem of rivers. However, water quality of rivers is easily influenced by anthropogenic activities such as urban development and wastewater disposal. Long-term monitoring of water quality can assess whether water quality of rivers deteriorates or not. Taiwan is a population-dense area and heavily depends on surface water for domestic, industrial, and agricultural uses. Dong-gang River is one of major resources in southern Taiwan for agricultural requirements. The water-quality data of four monitoring stations of the Dong-gang River for the period of 2000-2012 are selected for trend analysis. The parameters used to characterize water quality of rivers include biochemical oxygen demand (BOD), dissolved oxygen (DO), suspended solids (SS), and ammonia nitrogen (NH3-N). These four water-quality parameters are integrated into an index called river pollution index (RPI) to indicate the pollution level of rivers. Although widely used non-parametric Mann-Kendall test and linear regression exhibit computational efficiency to identify trends of water-quality indices, limitations of such approaches include sensitive to outliers and estimations of conditional mean only. Quantile regression, capable of identifying changes over time of any percentile values, is employed in this study to detect long-term trend of water-quality indices for the Dong-gang River located in southern Taiwan. The results show that Dong-gang River 4 stations from 2000 to 2012 monthly long-term trends in water quality.To analyze s Dong-gang River long-term water quality trends and pollution characteristics. The results showed that the bridge measuring ammonia Long-dong, BOD5 measure in that station on a downward trend, DO, and SS is on the rise, River Pollution Index (RPI) on a downward trend. The results form Chau-Jhou station also ahowed simialar trends .more and more near the
Li, T.; Sun, L.; Zou, L.
2009-01-01
This study assesses the impact of government shareholding on corporate performance using a sample of 643 non-financial companies listed on the Chinese stock exchanges. In view of the controversial empirical findings in the literature and the limitations of the least squares regressions, we adopt the
Hua, Anh N.; Keenan, Janice M.
2017-01-01
One of the most important findings to emerge from recent reading comprehension research is that there are large differences between tests in what they assess--specifically, the extent to which performance depends on word recognition versus listening comprehension skills. Because this research used ordinary least squares regression, it is not clear…
Quantile regression for the statistical analysis of immunological data with many non-detects
Eilers, P.H.C.; Roder, E.; Savelkoul, H.F.J.; Wijk, van R.G.
2012-01-01
Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techni
Quantile regression for the statistical analysis of immunological data with many non-detects
P.H.C. Eilers (Paul); E. Röder (Esther); H.F.J. Savelkoul (Huub); R. Gerth van Wijk (Roy)
2012-01-01
textabstractBackground: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced stati
Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Unitary Response Regression Models
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Chen, Shih-Neng; Tseng, Jauling
2010-01-01
Objective: To assess various marginal effects of nutrient intakes, health behaviours and nutrition knowledge on the entire distribution of body mass index (BMI) across individuals. Design: Quantitative and distributional study. Setting: Taiwan. Methods: This study applies Becker's (1965) model of health production to construct an individual's BMI…
Forecasting Value-at-Risk Using Nonlinear Regression Quantiles and the Intraday Range
C.W.S. Chen (Cathy); R. Gerlach (Richard); B.B.K. Hwang (Bruce); M.J. McAleer (Michael)
2011-01-01
textabstractValue-at-Risk (VaR) is commonly used for financial risk measurement. It has recently become even more important, especially during the 2008-09 global financial crisis. We propose some novel nonlinear threshold conditional autoregressive VaR (CAViar) models that incorporate intra-day pric
Energy Technology Data Exchange (ETDEWEB)
Henze, G. P.; Pless, S.; Petersen, A.; Long, N.; Scambos, A. T.
2014-02-01
Approaches are needed to continuously characterize the energy performance of commercial buildings to allow for (1) timely response to excess energy use by building operators; and (2) building occupants to develop energy awareness and to actively engage in reducing energy use. Energy information systems, often involving graphical dashboards, are gaining popularity in presenting energy performance metrics to occupants and operators in a (near) real-time fashion. Such an energy information system, called Building Agent, has been developed at NREL and incorporates a dashboard for public display. Each building is, by virtue of its purpose, location, and construction, unique. Thus, assessing building energy performance is possible only in a relative sense, as comparison of absolute energy use out of context is not meaningful. In some cases, performance can be judged relative to average performance of comparable buildings. However, in cases of high-performance building designs, such as NREL's Research Support Facility (RSF) discussed in this report, relative performance is meaningful only when compared to historical performance of the facility or to a theoretical maximum performance of the facility as estimated through detailed building energy modeling.
Institute of Scientific and Technical Information of China (English)
倪中新; 薛文骏
2012-01-01
本文基于2003年至2010年中国商业银行的面板数据,通过面板数据模型分位数回归来研究中国商业银行的规模、非利息收入结构、贷款质量等因素对于商业银行利润增长的影响作用。我们发现,伴随着中国商业银行利润的增长,银行员工与固定资产的利润贡献率出呈现出倒U型的形状。相比国有商业银行,股份制商业银行的员工对银行利润贡献率明显降低。从目前来看,银行非利息收入比对银行利润的推动作用需要进一步的提高,中国商业银行特别是股份制商业银行在发展过程中不应忽视对贷款质量的管理与控制。%This paper employs the method of quantile regression for Panel Data Model to investigate how banks＇ profits are influenced by their scale,income structure,credit quality,etc.,using the Panel Data from Chinese Commercial Banks over the period from 2003 to 2010.The authors find that： with the rapid increase of the banks＇ profit,the employees and fixed asset＇s profit contribution margin shows the shape of inverted U-shape.Compared with the employees＇ profit contribution in state-owned commercial banks,joint-stock commercial banks have a significantly lower profit contribution.For the time being,the non-interest business of commercial banks should be further improved to boost profit.China＇s commercial banks,especially joint-stock commercial banks,should not neglect to manage and control credit quality in their development process.
Hosseini, Reza
2010-01-01
It is widely claimed that the quantile function is equivariant under increasing transformations. We show by a counterexample that this is not true (even for strictly increasing transformations). However, we show that the quantile function is equivariant under left continuous increasing transformations. We also provide an equivariance relation for continuous decreasing transformations. In the case that the transformation is not continuous, we show that while the transformed quantile at p can b...
Shastri, Hiteshri; Ghosh, Subimal; Karmakar, Subhankar
2017-02-01
Forecasting of extreme precipitation events at a regional scale is of high importance due to their severe impacts on society. The impacts are stronger in urban regions due to high flood potential as well high population density leading to high vulnerability. Although significant scientific improvements took place in the global models for weather forecasting, they are still not adequate at a regional scale (e.g., for an urban region) with high false alarms and low detection. There has been a need to improve the weather forecast skill at a local scale with probabilistic outcome. Here we develop a methodology with quantile regression, where the reliably simulated variables from Global Forecast System are used as predictors and different quantiles of rainfall are generated corresponding to that set of predictors. We apply this method to a flood-prone coastal city of India, Mumbai, which has experienced severe floods in recent years. We find significant improvements in the forecast with high detection and skill scores. We apply the methodology to 10 ensemble members of Global Ensemble Forecast System and find a reduction in ensemble uncertainty of precipitation across realizations with respect to that of original precipitation forecasts. We validate our model for the monsoon season of 2006 and 2007, which are independent of the training/calibration data set used in the study. We find promising results and emphasize to implement such data-driven methods for a better probabilistic forecast at an urban scale primarily for an early flood warning.
A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates
Lima, Carlos H. R.; Lall, Upmanu; Troy, Tara; Devineni, Naresh
2016-10-01
We estimate local and regional Generalized Extreme Value (GEV) distribution parameters for flood frequency analysis in a multilevel, hierarchical Bayesian framework, to explicitly model and reduce uncertainties. As prior information for the model, we assume that the GEV location and scale parameters for each site come from independent log-normal distributions, whose mean parameter scales with the drainage area. From empirical and theoretical arguments, the shape parameter for each site is shrunk towards a common mean. Non-informative prior distributions are assumed for the hyperparameters and the MCMC method is used to sample from the joint posterior distribution. The model is tested using annual maximum series from 20 streamflow gauges located in an 83,000 km2 flood prone basin in Southeast Brazil. The results show a significant reduction of uncertainty estimates of flood quantile estimates over the traditional GEV model, particularly for sites with shorter records. For return periods within the range of the data (around 50 years), the Bayesian credible intervals for the flood quantiles tend to be narrower than the classical confidence limits based on the delta method. As the return period increases beyond the range of the data, the confidence limits from the delta method become unreliable and the Bayesian credible intervals provide a way to estimate satisfactory confidence bands for the flood quantiles considering parameter uncertainties and regional information. In order to evaluate the applicability of the proposed hierarchical Bayesian model for regional flood frequency analysis, we estimate flood quantiles for three randomly chosen out-of-sample sites and compare with classical estimates using the index flood method. The posterior distributions of the scaling law coefficients are used to define the predictive distributions of the GEV location and scale parameters for the out-of-sample sites given only their drainage areas and the posterior distribution of the
Hosseini, Reza
2010-01-01
It is widely claimed that the quantile function is equivariant under increasing transformations. We show by a counterexample that this is not true (even for strictly increasing transformations). However, we show that the quantile function is equivariant under left continuous increasing transformations. We also provide an equivariance relation for continuous decreasing transformations. In the case that the transformation is not continuous, we show that while the transformed quantile at p can be arbitrarily far from the quantile of the transformed at p (in terms of absolute difference), the probability mass between the two is zero. We also show by an example that weighted definition of the median is not equivariant under even strictly increasing continuous transformations.
Olsen, Margaret A; Tian, Fang; Wallace, Anna E; Nickel, Katelin B; Warren, David K; Fraser, Victoria J; Selvam, Nandini; Hamilton, Barton H
2017-02-01
To determine the impact of surgical site infections (SSIs) on health care costs following common ambulatory surgical procedures throughout the cost distribution. Data on costs of SSIs following ambulatory surgery are sparse, particularly variation beyond just mean costs. We performed a retrospective cohort study of persons undergoing cholecystectomy, breast-conserving surgery, anterior cruciate ligament reconstruction, and hernia repair from December 31, 2004 to December 31, 2010 using commercial insurer claims data. SSIs within 90 days post-procedure were identified; infections during a hospitalization or requiring surgery were considered serious. We used quantile regression, controlling for patient, operative, and postoperative factors to examine the impact of SSIs on 180-day health care costs throughout the cost distribution. The incidence of serious and nonserious SSIs was 0.8% and 0.2%, respectively, after 21,062 anterior cruciate ligament reconstruction, 0.5% and 0.3% after 57,750 cholecystectomy, 0.6% and 0.5% after 60,681 hernia, and 0.8% and 0.8% after 42,489 breast-conserving surgery procedures. Serious SSIs were associated with significantly higher costs than nonserious SSIs for all 4 procedures throughout the cost distribution. The attributable cost of serious SSIs increased for both cholecystectomy and hernia repair as the quantile of total costs increased ($38,410 for cholecystectomy with serious SSI vs no SSI at the 70th percentile of costs, up to $89,371 at the 90th percentile). SSIs, particularly serious infections resulting in hospitalization or surgical treatment, were associated with significantly increased health care costs after 4 common surgical procedures. Quantile regression illustrated the differential effect of serious SSIs on health care costs at the upper end of the cost distribution.
Quantile plots in the analysis of heteroscedastic models
National Research Council Canada - National Science Library
Pepió Viñals, Montserrat; Polo Miranda, Carlos
1992-01-01
Recent developments in quality engineering methods have led to considerable interest in the analysis of variance, buiding a dispersion model, identifying important effects from replicated experiments...
Estimating Quantile Families of Loss Distributions for Non-Life Insurance Modelling via L-Moments
Directory of Open Access Journals (Sweden)
Gareth W. Peters
2016-05-01
Full Text Available This paper discusses different classes of loss models in non-life insurance settings. It then overviews the class of Tukey transform loss models that have not yet been widely considered in non-life insurance modelling, but offer opportunities to produce flexible skewness and kurtosis features often required in loss modelling. In addition, these loss models admit explicit quantile specifications which make them directly relevant for quantile based risk measure calculations. We detail various parameterisations and sub-families of the Tukey transform based models, such as the g-and-h, g-and-k and g-and-j models, including their properties of relevance to loss modelling. One of the challenges that are amenable to practitioners when fitting such models is to perform robust estimation of the model parameters. In this paper we develop a novel, efficient, and robust procedure for estimating the parameters of this family of Tukey transform models, based on L-moments. It is shown to be more efficient than the current state of the art estimation methods for such families of loss models while being simple to implement for practical purposes.
TWO REGRESSION CREDIBILITY MODELS
Directory of Open Access Journals (Sweden)
Constanţa-Nicoleta BODEA
2010-03-01
Full Text Available In this communication we will discuss two regression credibility models from Non – Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state and the collective estimate (based on aggregate USA data. To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.
Mission Drift of Microfinance based on Quantiles Regression%基于分位数回归的小额信贷目标偏移研究
Institute of Scientific and Technical Information of China (English)
张颖慧; 聂强
2016-01-01
By use of quantiles regression method, the paper carries out quantiles regression analysis after screening and arranging MIX data, studies mission drift of microfiance in different situations and reviews the influence of effective de-mand deficiency on poor customers.The results show that mission drift of microfinance exists in middle-end market and below, which signifies in microfinance servicing for low end market; that assets scale has adverse effect on coverage depth.Therefore, social performance management of microfinance should be focused on, and special subsidy and tech-nique supports should be offered to small sized microfinance organizations servicing low-end market.%通过面板数据分位数回归分析, 本文分析了小额信贷的目标偏移问题, 从机构层面验证贫困客户有效需求不足对目标偏移的影响. 结果表明: 目标偏移现象在中端以下市场明显存在,在面向底端市场服务的小额信贷机构中表现得尤为明显; 小额信贷机构的资产规模对覆盖深度具有显著负面影响. 因此, 小额信贷社会绩效管理应集中于中端市场以下部分, 对目标市场定位为底端市场的小型小额信贷机构更应给予特别的财税支持与技术服务.
Farias, Albert J; Hansen, Ryan N; Zeliadt, Steven B; Ornelas, India J; Li, Christopher I; Thompson, Beti
2016-08-01
Adherence to adjuvant endocrine therapy (AET) for estrogen receptor-positive breast cancer remains suboptimal, which suggests that women are not getting the full benefit of the treatment to reduce breast cancer recurrence and mortality. The majority of studies on adherence to AET focus on identifying factors among those women at the highest levels of adherence and provide little insight on factors that influence medication use across the distribution of adherence. To understand how factors influence adherence among women across low and high levels of adherence. A retrospective evaluation was conducted using the Truven Health MarketScan Commercial Claims and Encounters Database from 2007-2011. Privately insured women aged 18-64 years who were recently diagnosed and treated for breast cancer and who initiated AET within 12 months of primary treatment were assessed. Adherence was measured as the proportion of days covered (PDC) over a 12-month period. Simultaneous multivariable quantile regression was used to assess the association between treatment and demographic factors, use of mail order pharmacies, medication switching, and out-of-pocket costs and adherence. The effect of each variable was examined at the 40th, 60th, 80th, and 95th quantiles. Among the 6,863 women in the cohort, mail order pharmacies had the greatest influence on adherence at the 40th quantile, associated with a 29.6% (95% CI = 22.2-37.0) higher PDC compared with retail pharmacies. Out-of-pocket cost for a 30-day supply of AET greater than $20 was associated with an 8.6% (95% CI = 2.8-14.4) lower PDC versus $0-$9.99. The main factors that influenced adherence at the 95th quantile were mail order pharmacies, associated with a 4.4% higher PDC (95% CI = 3.8-5.0) versus retail pharmacies, and switching AET medication 2 or more times, associated with a 5.6% lower PDC versus not switching (95% CI = 2.3-9.0). Factors associated with adherence differed across quantiles. Addressing the use of mail order
Directory of Open Access Journals (Sweden)
B. Thrasher
2012-09-01
Full Text Available When applying a quantile mapping-based bias correction to daily temperature extremes simulated by a global climate model (GCM, the transformed values of maximum and minimum temperatures are changed, and the diurnal temperature range (DTR can become physically unrealistic. While causes are not thoroughly explored, there is a strong relationship between GCM biases in snow albedo feedback during snowmelt and bias correction resulting in unrealistic DTR values. We propose a technique to bias correct DTR, based on comparing observations and GCM historic simulations, and combine that with either bias correcting daily maximum temperatures and calculating daily minimum temperatures or vice versa. By basing the bias correction on a base period of 1961–1980 and validating it during a test period of 1981–1999, we show that bias correcting DTR and maximum daily temperature can produce more accurate estimations of daily temperature extremes while avoiding the pathological cases of unrealistic DTR values.
Yu, Hwa-Lung; Wang, Chih-Hsin
2013-02-05
Understanding the daily changes in ambient air quality concentrations is important to the assessing human exposure and environmental health. However, the fine temporal scales (e.g., hourly) involved in this assessment often lead to high variability in air quality concentrations. This is because of the complex short-term physical and chemical mechanisms among the pollutants. Consequently, high heterogeneity is usually present in not only the averaged pollution levels, but also the intraday variance levels of the daily observations of ambient concentration across space and time. This characteristic decreases the estimation performance of common techniques. This study proposes a novel quantile-based Bayesian maximum entropy (QBME) method to account for the nonstationary and nonhomogeneous characteristics of ambient air pollution dynamics. The QBME method characterizes the spatiotemporal dependence among the ambient air quality levels based on their location-specific quantiles and accounts for spatiotemporal variations using a local weighted smoothing technique. The epistemic framework of the QBME method can allow researchers to further consider the uncertainty of space-time observations. This study presents the spatiotemporal modeling of daily CO and PM10 concentrations across Taiwan from 1998 to 2009 using the QBME method. Results show that the QBME method can effectively improve estimation accuracy in terms of lower mean absolute errors and standard deviations over space and time, especially for pollutants with strong nonhomogeneous variances across space. In addition, the epistemic framework can allow researchers to assimilate the site-specific secondary information where the observations are absent because of the common preferential sampling issues of environmental data. The proposed QBME method provides a practical and powerful framework for the spatiotemporal modeling of ambient pollutants.
Directory of Open Access Journals (Sweden)
Cüneyt AKAR
2013-01-01
Full Text Available In this study, the financial market stability is investigated for the emerging market countries of Morgan Stanley Capital International (MSCI, Europe, the Middle East and Africa index by using quantile regression based new empirical test proposed by Baur and Schulze (2009. The daily logarithmic return dataset covers the period of June 1, 2002 to February 17, 2011. The results show that Poland and Morocco exhibit financial market stability among the investigated countries.
Directory of Open Access Journals (Sweden)
Marcos Rocha
2010-01-01
Full Text Available Este trabalho investiga se existem modificações relevantes entre os quantis de rendimento nos retornos por escolaridade, ou seja, se os retornos por escolaridade mudaram ao longo do período para quantis diferentes, e entre estratos educacionais para os diversos quantis de rendimento (se os rendimentos relativos por quantil mudaram entre indivíduos com anos de escolaridade diversa. Foi feita análise diferenciada da evolução dessas categorias através da metodologia das regressões quantílicas nas equações de salário para um continuum de tempo, de 1996 a 2004, usando a base de dados da Pesquisa Nacional por Amostra de Domicílios (Pnad. O cenário geral dos resultados mostra que a análise quantílica por rendimento acentua a convergência entre os quantis de renda e estratos educacionais, assim como um padrão de comportamento de relativa constância para os quantis de renda mais altos. A análise vista por estratos educacionais enfatiza a queda relativa dos rendimentos do ensino fundamental e a relativa constância dos rendimentos do estrato da pós-graduação.This study investigates if there are relevant modifications between income quantiles in returns to schooling, that is, if the schooling returns in the different income quantiles has changed, and between schooling categories in the different income quantiles (if the relative income quantile has changed in the different schooling categories. The study of these schooling categories was performed through the use of the quantile regressions approach in the wage equation for the period 1996-2004, using data base drawn from Pesquisa Nacional por Amostra de Domicílios (Pnad. The big picture from the results has shown that the quantile analysis by income underlines the convergence between the income quantiles and schooling categories and also a relative steady pattern for the higher quantile income individuals. The analysis made by schooling categories underlines the relative loss of
基于分位数回归的金融市场稳定性检验%Test for Financial Market Stability Based on Quantile Regression Method
Institute of Scientific and Technical Information of China (English)
史金凤; 刘维奇; 杨威
2011-01-01
Based on the angles of the volatility, this paper gives a definition of financial market stability,proposes the test based on quantile regression method, and then tests the stability of Shanghai Market using the method. The empirical results show that the Shanghai stock market developed from unstable to stable, and particularly after the global financial crisis triggered by the U.S. subprime mortgage crisis, it has entered a stable state in relatively fast manner. The test method performs robust to the selections of systematic shock and periods of volatility. Meanwhile, the change of stock market stability indicate that a good range of policies for global financial crisis play a role in promoting a stable and healthy development of financial market.%本文立足于收益波动率的视角界定了金融市场稳定的内涵,提出了基于分位数回归的检验金融市场稳定的方法,并运用该方法对我国股票市场的稳定性做了实证分析.结果显示,上海股票市场从不稳定状态向稳定状态发展,特别是在美国次贷危机引发的全球金融危机之后较快地进入了稳定状态,该结论同时也通过了来自系统性冲击和波动率周期选取的稳健性检验,并且支持了我国政府应对全球性金融危机出台各项政策的积极效应和正面效应.
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Bind, Marie-Abele; Peters, Annette; Koutrakis, Petros; Coull, Brent; Vokonas, Pantel; Schwartz, Joel
2016-08-01
Previous studies have observed associations between air pollution and heart disease. Susceptibility to air pollution effects has been examined mostly with a test of effect modification, but little evidence is available whether air pollution distorts cardiovascular risk factor distribution. This paper aims to examine distributional and heterogeneous effects of air pollution on known cardiovascular biomarkers. A total of 1,112 men from the Normative Aging Study and residents of the greater Boston, Massachusetts, area with mean age of 69 years at baseline were included in this study during the period 1995-2013. We used quantile regression and random slope models to investigate distributional effects and heterogeneity in the traffic-related responses on blood pressure, heart rate variability, repolarization, lipids, and inflammation. We considered 28-day averaged exposure to particle number, PM2.5 black carbon, and PM2.5 mass concentrations (measured at a single monitor near the site of the study visits). We observed some evidence suggesting distributional effects of traffic-related pollutants on systolic blood pressure, heart rate variability, corrected QT interval, low density lipoprotein (LDL) cholesterol, triglyceride, and intercellular adhesion molecule-1 (ICAM-1). For example, among participants with LDL cholesterol below 80 mg/dL, an interquartile range increase in PM2.5 black carbon exposure was associated with a 7-mg/dL (95% CI: 5, 10) increase in LDL cholesterol, while among subjects with LDL cholesterol levels close to 160 mg/dL, the same exposure was related to a 16-mg/dL (95% CI: 13, 20) increase in LDL cholesterol. We observed similar heterogeneous associations across low versus high percentiles of the LDL distribution for PM2.5 mass and particle number. These results suggest that air pollution distorts the distribution of cardiovascular risk factors, and that, for several outcomes, effects may be greatest among individuals who are already at high risk
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
Directory of Open Access Journals (Sweden)
Ana Carolina Campana Nascimento
2012-03-01
Full Text Available O objetivo principal neste estudo foi analisar a influência de variáveis técnicas e econômicas sobre os índices de eficiência técnica de produtores de leite de Minas Gerais ao longo de pontos distintos da distribuição dos índices de eficiência utilizando-se a técnica de regressão quantílica. Os índices de eficiência técnica foram estimados com base em um modelo de fronteira estocástica utilizando-se dados de 875 produtores de leite do estado de Minas Gerais coletados no ano de 2005. Os principais resultados revelaram, na fronteira de produção, que possivelmente está havendo utilização extensiva do fator terra. De modo geral, a variável percentual de vacas em lactação foi a mais relevante na explicação da eficiência técnica em todos os quantis estudados, enquanto o percentual de mão-de-obra familiar utilizado foi importante para explicar apenas os menores níveis de eficiência. Além disso, foi encontrada diferença significativa entre os coeficientes estimados dos quantis em estudo, o que mostra que as variáveis explicativas não têm o mesmo impacto no aumento da eficiência em todos os pontos da distribuição.The objective of this study was to evaluate the influence of technical and economic variables on the indices of technical efficiency of milk from Minas Gerais throughout distinct points of distribution of the efficiency indices by the technique of quantile regression. The technical efficiency indices were estimated based on a stochastic frontier model, using data from 875 milk producers in Minas Gerais state, Brazil, collected in 2005. The main results from production frontier showed the extensive use of the land factor. Overall, the variable percentage of lactating cows was the more relevant in explaining technical efficiency in all analyzed quantiles, whereas the percentage of household labor was important to explain only the lower levels of efficiency. Moreover, significant differences between the
Evaluation of the airport capacity based on quantile regression%基于分位数回归的机场容量评估
Institute of Scientific and Technical Information of China (English)
翟文鹏; 陈梵驿; 金嗣博
2016-01-01
对北京首都国际机场航班运行的历史数据进行了统计分析,建立了进离场小时架次及时间的三维数据结构,并利用DBSCAN算法进行聚类分析;在消除“噪点”的基础上,利用改进后的分位数回归算法生成分时段的具有95％概率的机场小时运行能力包线.仿真结果表明,各个时段的机场进离场架次具有一定的时间特性和差异:8点到10点处于离港早高峰,离港小时架次远大于进港小时架次;10点到22点进离港航班较为均衡,机场全负荷工作;其他时间段航班小时架次未达到最大容量值.%3D data structure of the arrival-departure hour sorties and time were established based on the statistical analysis of the historical data of capital airport flight operation,and proceeded the cluster analysis with DBSCAN algorithm.The airport hour operation capacity envelope with 95 ％ probability was generated by using quantile regression algorithm on the basis of eliminating "noise".Simulation results show that the arrival and departure sorties of the airport has certain amount of time characteristics and differences:at 8:00-10:00 am in the morning rush,departure hour sorties are far greater than arrival hour sorties;at 10:00 am-22:00 pm,arrival and departure flights is relatively balanced and full load working at the airport;other times flight hour sorties reduced and have not reached maximum capacity.
Inferential Models for Linear Regression
Directory of Open Access Journals (Sweden)
Zuoyi Zhang
2011-09-01
Full Text Available Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.
Superquantile Regression: Theory, Algorithms, and Applications
2014-12-01
Isabel. I love having you in my arms, and although you are still too young to understand what a hug is, your warmth has given me the strength and...squares and the quantile regression models adjust to changes in the data set, denoted by the red dots. Notice that the observa- tions are moved upwards...model hardly changes. If we change this observation in red even further upwards, we would notice no more changes in the quantile regression function
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
无
2001-01-01
For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
ZHU; Lixing
2001-01-01
［1］Carroll, R. J., Ruppert, D., Transformation and Weighting in Regression, New York: Chapman and Hall, 1988.［2］Cook, R. D., Weisberg, S., Diagnostics for heteroscedasticity in regression, Biometrika, 1988, 70: 1—10.［3］Davidian, M., Carroll, R. J., Variance function estimation, J. Amer. Statist. Assoc., 1987, 82: 1079—1091.［4］Bickel, P., Using residuals robustly I: Tests for heteroscedasticity, Ann. Statist., 1978, 6: 266—291.［5］Carroll, R. J., Ruppert, D., On robust tests for heteroscedasticity, Ann. Statist., 1981, 9: 205—209.［6］Eubank, R. L., Thomas, W., Detecting heteroscedasticity in nonparametric regression, J. Roy. Statist. Soc., Ser. B, 1993, 55: 145—155.［7］Diblasi, A., Bowman, A., Testing for constant variance in a linear model, Statist. and Probab. Letters, 1997, 33: 95—103.［8］Dette, H., Munk, A., Testing heteoscedasticity in nonparametric regression, J. R. Statist. Soc. B, 1998, 60: 693—708.［9］Müller, H. G., Zhao, P. L., On a semi-parametric variance function model and a test for heteroscedasticity, Ann. Statist., 1995, 23: 946—967.［10］Stute, W., Manteiga, G., Quindimil, M. P., Bootstrap approximations in model checks for regression, J. Amer. Statist. Asso., 1998, 93: 141—149.［11］Stute, W., Thies, G., Zhu, L. X., Model checks for regression: An innovation approach, Ann. Statist., 1998, 26: 1916—1939.［12］Shorack, G. R., Wellner, J. A., Empirical Processes with Applications to Statistics, New York: Wiley, 1986.［13］Efron, B., Bootstrap methods: Another look at the jackknife, Ann. Statist., 1979, 7: 1—26.［14］Wu, C. F. J., Jackknife, bootstrap and other re-sampling methods in regression analysis, Ann. Statist., 1986, 14: 1261—1295.［15］H rdle, W., Mammen, E., Comparing non-parametric versus parametric regression fits, Ann. Statist., 1993, 21: 1926—1947.［16］Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 1988, 16: 1696—1708.［17
Institute of Scientific and Technical Information of China (English)
米白冰; 赵亚玲; 党少农; 李强; 杨睿海; 颜虹
2015-01-01
目的 应用分位数回归分析陕西省汉中市农村居民健康调查数据,探讨当地居民健康相关生命质量(HRQOL)的分布特点和影响因素,并展示分位数回归应用于HRQOL分析的价值.方法 使用横断面调查获得的2 737名被调查者的资料,采用SF-36量表评估被调查者的HRQOL现状,应用分位数回归模型分析精神健康状况(MCS)和躯体健康状况(PCS)得分,并了解HRQOL状态及其影响因素.结果 汉中市农村居民HRQOL分布情况与国内其他地区类似,但不同分位点MCS和PCS得分的影响因素及其影响程度有差异.整体而言,婚姻状况、教育程度、体力活动、既往疾病史对MCS和PCS得分存在显著影响.结论 了解汉中市农村居民HRQOL分布特征及其影响因素,可有针对性地采取措施提升当地居民的HRQOL.%Objective This study aimed to apply quantile regression to study Hanzhong rural residents health survey data,explore the local distribution characteristics of health-related quality of life (HRQOL) and influencing factors and present the value of quantile regression applying in analysis of HRQOL.Methods In this cross-sectional population-based study,we evaluated the HRQOL of 2 737 subjects through filling Short-Form Health Survey (SF-36).Quantile regression model was used to compare MCS and PCS scores and evaluate the associated factors.Results With different quantiles MCS and PCS score,the associated factors and influence degree were different.In general,the influences of marital status,educational level,physical activity,history of disease and HRQOL in the part of the percentile scores were significant.Conclusion Analysis of the distribution of HRQOL of rural residents in Hanzhong and influencing factors would benefit the improvement of HRQOL of local residents.
Directory of Open Access Journals (Sweden)
Anke L B Günther
Full Text Available BACKGROUND: Breastfeeding may lower chronic disease risk by long-term effects on hormonal status and adiposity, but the relations remain uncertain. OBJECTIVE: To prospectively investigate the association of breastfeeding with the growth hormone- (GH insulin-like growth factor- (IGF axis, insulin sensitivity, body composition and body fat distribution in younger adulthood (18-37 years. DESIGN: Data from 233 (54% female participants of a German cohort, the Dortmund Nutritional and Anthropometric Longitudinally Designed (DONALD Study, with prospective data on infant feeding were analyzed. Multivariable linear as well as quantile regression were performed with full breastfeeding (not: ≤ 2, short: 3-17, long: >17 weeks as exposure and adult IGF-I, IGF binding proteins (IGFBP -1, -2, -3, homeostasis model assessment of insulin resistance (HOMA-IR, fat mass index, fat-free mass index, and waist circumference as outcomes. RESULTS: After adjustment for early life and socio-economic factors, women who had been breastfed longer displayed higher adult IGFBP-2 (p(trend = 0.02 and lower values of HOMA-IR (p(trend = 0.004. Furthermore, in women breastfeeding duration was associated with a lower mean fat mass index (p(trend = 0.01, fat-free mass index (p(trend = 0.02 and waist circumference (p(trend = 0.004 in young adulthood. However, there was no relation to IGF-I, IGFBP-1 and IGFBP-3 (all p(trend > 0.05. Associations for IGFBP-2 and fat mass index were more pronounced at higher, for waist circumference at very low or high percentiles of the distribution. In men, there was no consistent relation of breastfeeding with any outcome. CONCLUSIONS: Our data suggest that breastfeeding may have long-term, favorable effects on extremes of adiposity and insulin metabolism in women, but not in men. In both sexes, breastfeeding does not seem to induce programming of the GH-IGF-axis.
Conditional Quantile Processes based on Series or Many Regressors
Belloni, Alexandre; Fernandez-Val, Ivan
2011-01-01
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the empirical QR coefficient process, namely we obtain uniform strong approximations to the empirical QR coefficient process by conditionally pivotal and Gaussian processes, as well as by gradient and weighted bootstrap processes. We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditiona...
Girardi, P.; Pastres, R.; Gaetan, C.; Mangin, A.; Taji, M. A.
2015-12-01
In this paper, we present the results of a classification of Adriatic waters, based on spatial time series of remotely sensed Chlorophyll type-a. The study was carried out using a clustering procedure combining quantile smoothing and an agglomerative clustering algorithms. The smoothing function includes a seasonal term, thus allowing one to classify areas according to “similar” seasonal evolution, as well as according to “similar” trends. This methodology, which is here applied for the first time to Ocean Colour data, is more robust with respect to other classical methods, as it does not require any assumption on the probability distribution of the data. This approach was applied to the classification of an eleven year long time series, from January 2002 to December 2012, of monthly values of Chlorophyll type-a concentrations covering the whole Adriatic Sea. The data set was made available by ACRI (http://hermes.acri.fr) in the framework of the Glob-Colour Project (http://www.globcolour.info). Data were obtained by calibrating Ocean Colour data provided by different satellite missions, such as MERIS, SeaWiFS and MODIS. The results clearly show the presence of North-South and West-East gradient in the level of Chlorophyll, which is consistent with literature findings. This analysis could provide a sound basis for the identification of “water bodies” and of Chlorophyll type-a thresholds which define their Good Ecological Status, in terms of trophic level, as required by the implementation of the Marine Strategy Framework Directive. The forthcoming availability of Sentinel-3 OLCI data, in continuity of the previous missions, and with perspective of more than a 15-year monitoring system, offers a real opportunity of expansion of our study as a strong support to the implementation of both the EU Marine Strategy Framework Directive and the UNEP-MAP Ecosystem Approach in the Mediterranean.
Semiparametric Regression and Model Refining
Institute of Scientific and Technical Information of China (English)
无
2002-01-01
This paper presents a semiparametric adjustment method suitable for general cases.Assuming that the regularizer matrix is positive definite,the calculation method is discussed and the corresponding formulae are presented.Finally,a simulated adjustment problem is constructed to explain the method given in this paper.The results from the semiparametric model and G-M model are compared.The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimate method.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R(2)) indicates the importance of independent variables in the outcome.
Etchevers, Anne; Le Tertre, Alain; Lucas, Jean-Paul; Bretin, Philippe; Oulhote, Youssef; Le Bot, Barbara; Glorennec, Philippe
2015-01-01
Blood lead levels (BLLs) have substantially decreased in recent decades in children in France. However, further reducing exposure is a public health goal because there is no clear toxicological threshold. The identification of the environmental determinants of BLLs as well as risk factors associated with high BLLs is important to update prevention strategies. We aimed to estimate the contribution of environmental sources of lead to different BLLs in children in France. We enrolled 484 children aged from 6months to 6years, in a nationwide cross-sectional survey in 2008-2009. We measured lead concentrations in blood and environmental samples (water, soils, household settled dusts, paints, cosmetics and traditional cookware). We performed two models: a multivariate generalized additive model on the geometric mean (GM), and a quantile regression model on the 10th, 25th, 50th, 75th and 90th quantile of BLLs. The GM of BLLs was 13.8μg/L (=1.38μg/dL) (95% confidence intervals (CI): 12.7-14.9) and the 90th quantile was 25.7μg/L (CI: 24.2-29.5). Household and common area dust, tap water, interior paint, ceramic cookware, traditional cosmetics, playground soil and dust, and environmental tobacco smoke were associated with the GM of BLLs. Household dust and tap water made the largest contributions to both the GM and the 90th quantile of BLLs. The concentration of lead in dust was positively correlated with all quantiles of BLLs even at low concentrations. Lead concentrations in tap water above 5μg/L were also positively correlated with the GM, 75th and 90th quantiles of BLLs in children drinking tap water. Preventative actions must target household settled dust and tap water to reduce the BLLs of children in France. The use of traditional cosmetics should be avoided whereas ceramic cookware should be limited to decorative purposes.
Regression Model With Elliptically Contoured Errors
Arashi, M; Tabatabaey, S M M
2012-01-01
For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.
Sign and Quantiles of the Realized Stock-Bond Correlation
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Christiansen, Charlotte
We scrutinize the monthly realized stock-bond correlation based upon high frequency returns. In particular, we use a probit model to track the dynamics of the sign of the correlation relative to its various economic forces. The sign is predictable to a large extent with bond market liquidity being...... the most important variable. Moreover, stock market volatility, inflation uncertainty, short rate volatility, and bond volatility have significant effects upon the sign. In addition, we use quantile regressions to pin down the systematic variation of the extreme tails of the realized stock-bond correlation...... of the stock-bond correlation....
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
The Transmuted Geometric-Weibull distribution: Properties, Characterizations and Regression Models
Directory of Open Access Journals (Sweden)
Zohdy M Nofal
2017-06-01
Full Text Available We propose a new lifetime model called the transmuted geometric-Weibull distribution. Some of its structural properties including ordinary and incomplete moments, quantile and generating functions, probability weighted moments, Rényi and q-entropies and order statistics are derived. The maximum likelihood method is discussed to estimate the model parameters by means of Monte Carlo simulation study. A new location-scale regression model is introduced based on the proposed distribution. The new distribution is applied to two real data sets to illustrate its flexibility. Empirical results indicate that proposed distribution can be alternative model to other lifetime models available in the literature for modeling real data in many areas.
Strupczewski, Witold G.; Bogdanowich, Ewa; Debele, Sisay
2016-04-01
Under Polish climate conditions the series of Annual Maxima (AM) flows are usually a mixture of peak flows of thaw- and rainfall- originated floods. The northern, lowland regions are dominated by snowmelt floods whilst in mountainous regions the proportion of rainfall floods is predominant. In many stations the majority of AM can be of snowmelt origin, but the greatest peak flows come from rainfall floods or vice versa. In a warming climate, precipitation is less likely to occur as snowfall. A shift from a snow- towards a rain-dominated regime results in a decreasing trend in mean and standard deviations of winter peak flows whilst rainfall floods do not exhibit any trace of non-stationarity. That is why a simple form of trends (i.e. linear trends) are more difficult to identify in AM time-series than in Seasonal Maxima (SM), usually winter season time-series. Hence it is recommended to analyse trends in SM, where a trend in standard deviation strongly influences the time -dependent upper quantiles. The uncertainty associated with the extrapolation of the trend makes it necessary to apply a relationship for trend which has time derivative tending to zero, e.g. we can assume a new climate equilibrium epoch approaching, or a time horizon is limited by the validity of the trend model. For both winter and summer SM time series, at least three distributions functions with trend model in the location, scale and shape parameters are estimated by means of the GAMLSS package using the ML-techniques. The resulting trend estimates in mean and standard deviation are mutually compared to the observed trends. Then, using AIC measures as weights, a multi-model distribution is constructed for each of two seasons separately. Further, assuming a mutual independence of the seasonal maxima, an AM model with time-dependent parameters can be obtained. The use of a multi-model approach can alleviate the effects of different and often contradictory trends obtained by using and identifying
Directory of Open Access Journals (Sweden)
F. Hoss
2014-10-01
et al., 2011; López López et al., 2014. This study adds the rise rate of the river stage in the last 24 and 48 h and the forecast error 24 and 48 h ago to the QR model. Including those four variables significantly improved the forecasts, as measured by the Brier Skill Score (BSS. Mainly, the resolution increases, as the original QR implementation already delivered high reliability. Combining the forecast with the other four variables results in much less favorable BSSs. Lastly, the forecast performance does not depend on the size of the training dataset, but on the year, the river gage, lead time and event threshold that are being forecast. We find that each event threshold requires a separate model configuration or at least calibration.
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
A new bivariate negative binomial regression model
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on Malaysian motor insurance dataset. The results indicated that BNB-1 regression has better fit than bivariate Poisson and BNB-2 models with regards to Akaike information criterion.
Quantiles, parametric-select density estimation, and bi-information parameter estimators
Parzen, E.
1982-01-01
A quantile-based approach to statistical analysis and probability modeling of data is presented which formulates statistical inference problems as functional inference problems in which the parameters to be estimated are density functions. Density estimators can be non-parametric (computed independently of model identified) or parametric-select (approximated by finite parametric models that can provide standard models whose fit can be tested). Exponential models and autoregressive models are approximating densities which can be justified as maximum entropy for respectively the entropy of a probability density and the entropy of a quantile density. Applications of these ideas are outlined to the problems of modeling: (1) univariate data; (2) bivariate data and tests for independence; and (3) two samples and likelihood ratios. It is proposed that bi-information estimation of a density function can be developed by analogy to the problem of identification of regression models.
Institute of Scientific and Technical Information of China (English)
何耀耀; 许启发; 杨善林; 余本功
2013-01-01
According to the problem of short-term load forecasting in the power system, this paper proposed a probability density forecasting method using radical basis function (RBF) neural network quantile regression based on the existed researches on combination forecasting and probability interval prediction. The probability density function of load at any period in a day was evaluated. The proposed method can obtain more useful information than point prediction and interval prediction, and can implement the whole probability distribution forecasting for future load. The practical data of a city in China show that the proposed probability density forecasting method can gain more accurate result of point prediction and obtain the forecasting results of integrated probability density function of short-term load.%针对电力系统短期负荷预测问题,在现有的组合预测和概率性区间预测的基础上,提出了基于RBF神经网络分位数回归的概率密度预测方法,得出未来一天中任意时期负荷的概率密度函数,可以得到比点预测和区间预测更多的有用信息,实现了对未来负荷完整概率分布的预测.中国某市实际数据的预测结果表明,提出的概率密度预测方法不仅能得出较为精确的点预测结果,而且能够获得短期负荷完整的概率密度函数预测结果.
Directory of Open Access Journals (Sweden)
Roberta Schein Bigio
2011-06-01
,4% consumieron la recomendación mínima de 400g/día de FLV y 22% no consumieron ningún tipo de FLV. En los modelos de regresión cuantílica, ajustados por el consumo energético, el intervalo etario y sexo, la renta domiciliar per capita y la escolaridad del jefe de familia se asociaron positivamente al consumo de FLV, mientras que el hábito de fumar se asoció negativamente. Renta se asoció significativamente a los menores percentiles de ingestión (p20 al p55; tabaquismo a los percentiles intermediarios (p45 al p75 y escolaridad del jefe de familia a los percentiles finales de consumo de FLV (p70 al p95. CONCLUSIONES: El consumo de FLV por adolescentes de Sao Paulo se mostró por debajo de las recomendaciones del Ministerio de la Salud Brasileño y es influenciado por la renta domiciliaria per capita, por la escolaridad del jefe de familia y por el hábito de fumar.OBJECTIVE: To analyze fruit and vegetable intake in adolescents and to identify associated factors. METHODS: A population-based cross-sectional study was conducted with a representative sample of 812 adolescents of both sexes in the city of São Paulo, Southeastern Brazil, in 2003. Food consumption was measured with the 24-hour dietary recall. Fruit and vegetable intake was described in percentiles. Quantile regression models were used to investigate the association between this intake and explanatory variables. RESULTS: Of all adolescents interviewed, 6.4% consumed the minimum recommendation of 400 g/day of fruits and vegetables and 22% did not consume any type of fruits and vegetables. According to quantile regression models, adjusted for energy intake, age group and sex, per capita household income and head of household's level of education were positively associated with fruit and vegetable intake, whereas smoking habit showed a negative association. Income was significantly associated with lower intake percentiles (p20 to p55; smoking, with intermediate percentiles (p45 to p75; and head of household
A Spline Regression Model for Latent Variables
Harring, Jeffrey R.
2014-01-01
Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Directory of Open Access Journals (Sweden)
Orkun ÇELİK
2014-06-01
Full Text Available The topic of wage differentials is a frequently problem in the labor markets of developing countries such as Turkey. These wage differentials can arise from the market imperfections as well as the individual quality differences. Individual wage differences are arise from the differences at the rural and urban areas employees’ access to the education, the health services and the level of development and the other reasons. Wage differences arising from the individual quality differences find a large place in The Human Capital Theory. These quality differences reflect on wages over time and lead to income differences between the individuals. The aim of this study is to analyze income differences between male and female employees for public and private sector in rural and urban areas in Turkey. Ordinary Least Squares and Quantile Regression Models results by using TURKSTAT’s Household Labor Force Survey micro data set of 2011 are comparatively discussed. Number of studies that have been made in the literature under the QR model and wage differentials in the labor market has limited. This study will contribute to the limited literature in terms of basic human capital as well as the extended models.
On accuracy of upper quantiles estimation
Directory of Open Access Journals (Sweden)
I. Markiewicz
2010-11-01
Full Text Available Flood frequency analysis (FFA entails the estimation of the upper tail of a probability density function (PDF of annual peak flows obtained from either the annual maximum series or partial duration series. In hydrological practice, the properties of various methods of upper quantiles estimation are identified with the case of known population distribution function. In reality, the assumed hypothetical model differs from the true one and one cannot assess the magnitude of error caused by model misspecification in respect to any estimated statistics. The opinion about the accuracy of the methods of upper quantiles estimation formed from the case of known population distribution function is upheld. The above-mentioned issue is the subject of the paper. The accuracy of large quantile assessments obtained from the four estimation methods is compared to two-parameter log-normal and log-Gumbel distributions and their three-parameter counterparts, i.e., three-parameter log-normal and GEV distributions. The cases of true and false hypothetical models are considered. The accuracy of flood quantile estimates depends on the sample size, the distribution type (both true and hypothetical, and strongly depends on the estimation method. In particular, the maximum likelihood method loses its advantageous properties in case of model misspecification.
On accuracy of upper quantiles estimation
Markiewicz, I.; Strupczewski, W. G.; Kochanek, K.
2010-11-01
Flood frequency analysis (FFA) entails the estimation of the upper tail of a probability density function (PDF) of annual peak flows obtained from either the annual maximum series or partial duration series. In hydrological practice, the properties of various methods of upper quantiles estimation are identified with the case of known population distribution function. In reality, the assumed hypothetical model differs from the true one and one cannot assess the magnitude of error caused by model misspecification in respect to any estimated statistics. The opinion about the accuracy of the methods of upper quantiles estimation formed from the case of known population distribution function is upheld. The above-mentioned issue is the subject of the paper. The accuracy of large quantile assessments obtained from the four estimation methods is compared to two-parameter log-normal and log-Gumbel distributions and their three-parameter counterparts, i.e., three-parameter log-normal and GEV distributions. The cases of true and false hypothetical models are considered. The accuracy of flood quantile estimates depends on the sample size, the distribution type (both true and hypothetical), and strongly depends on the estimation method. In particular, the maximum likelihood method loses its advantageous properties in case of model misspecification.
Constrained regression models for optimization and forecasting
Directory of Open Access Journals (Sweden)
P.J.S. Bruwer
2003-12-01
Full Text Available Linear regression models and the interpretation of such models are investigated. In practice problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.
Over, Thomas; Saito, Riki J.; Veilleux, Andrea; Sharpe, Jennifer B.; Soong, David T.; Ishii, Audrey
2016-06-28
This report provides two sets of equations for estimating peak discharge quantiles at annual exceedance probabilities (AEPs) of 0.50, 0.20, 0.10, 0.04, 0.02, 0.01, 0.005, and 0.002 (recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively) for watersheds in Illinois based on annual maximum peak discharge data from 117 watersheds in and near northeastern Illinois. One set of equations was developed through a temporal analysis with a two-step least squares-quantile regression technique that measures the average effect of changes in the urbanization of the watersheds used in the study. The resulting equations can be used to adjust rural peak discharge quantiles for the effect of urbanization, and in this study the equations also were used to adjust the annual maximum peak discharges from the study watersheds to 2010 urbanization conditions.The other set of equations was developed by a spatial analysis. This analysis used generalized least-squares regression to fit the peak discharge quantiles computed from the urbanization-adjusted annual maximum peak discharges from the study watersheds to drainage-basin characteristics. The peak discharge quantiles were computed by using the Expected Moments Algorithm following the removal of potentially influential low floods defined by a multiple Grubbs-Beck test. To improve the quantile estimates, generalized skew coefficients were obtained from a newly developed regional skew model in which the skew increases with the urbanized land use fraction. The drainage-basin characteristics used as explanatory variables in the spatial analysis include drainage area, the fraction of developed land, the fraction of land with poorly drained soils or likely water, and the basin slope estimated as the ratio of the basin relief to basin perimeter.This report also provides the following: (1) examples to illustrate the use of the spatial and urbanization-adjustment equations for estimating peak discharge quantiles at
High dimensional linear regression models under long memory dependence and measurement error
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the
Early Home Activities and Oral Language Skills in Middle Childhood: A Quantile Analysis.
Law, James; Rush, Robert; King, Tom; Westrupp, Elizabeth; Reilly, Sheena
2017-02-23
Oral language development is a key outcome of elementary school, and it is important to identify factors that predict it most effectively. Commonly researchers use ordinary least squares regression with conclusions restricted to average performance conditional on relevant covariates. Quantile regression offers a more sophisticated alternative. Using data of 17,687 children from the United Kingdom's Millennium Cohort Study, we compared ordinary least squares and quantile models with language development (verbal similarities) at 11 years as the outcome. Gender had more of an effect at the top of the distribution, whereas poverty, early language, and reading to the child had a greater effect at the bottom. The picture for TV watching was more mixed. The results are discussed in terms of the provision of universal and targeted interventions. © 2017 The Authors. Child Development © 2017 Society for Research in Child Development, Inc.
A Skew-Normal Mixture Regression Model
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
Modeling confounding by half-sibling regression
DEFF Research Database (Denmark)
Schölkopf, Bernhard; Hogg, David W; Wang, Dun
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...
Bayesian multimodel inference for geostatistical regression models.
Directory of Open Access Journals (Sweden)
Devin S Johnson
Full Text Available The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs. The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC. The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
Institute of Scientific and Technical Information of China (English)
程诚; 姚远
2014-01-01
21世纪以来，社会资本在反贫困治理中被各界人士寄予厚望。但它到底会扩大还是缩小收入不平等，仍无共识。运用分位数回归发现：家庭社会资本和社区社会资本对我国西部农村地区不平等的影响截然相反。前者对于特困家庭的作用更小，是扩大收入不平等程度的因素。而社区社会资本特别有利于低收入家庭，它的提高会显著降低农村的不平等程度。因此，培育农民自发组织、发展正规专业的社团组织等社会性力量，对缓解贫困、缩小收入差距、维护社会稳定等有重大现实意义。%Most people have paid close attention to the effect of social capital on the governance tactics to anti-poverty since the beginning of the tw enty-first century .But it is still a puzzle w hether social capital could enlarge or reduce income inequality in our society .Based on quantile regression ,this paper finds that family and communi-ty social capital are both important factors to determine income .But they are poles apart to income inequality in Western rural areas of China .Income inequality will increase ,while family social capital is enhanced .Inequality , however ,will depress if community social capital is advanced .This paper consists that supporting social forces , such as self-organizations and mass organizations ,could relieve poverty ,reduce income inequality in rural China . Social capital is a powerful factor to maintain social harmony and stability .
An Application on Multinomial Logistic Regression Model
Directory of Open Access Journals (Sweden)
Abdalla M El-Habil
2012-03-01
Full Text Available Normal 0 false false false EN-US X-NONE X-NONE This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by Palestinian Central Bureau of Statistics (PCBS. Segment of the population of children in the age group (10-14 years for residents in Gaza governorate, size of 66,935 had been selected, and the response variable consisted of four categories. Eighteen of explanatory variables were used for building the primary multinomial logistic regression model. Model had been tested through a set of statistical tests to ensure its appropriateness for the data. Also the model had been tested by selecting randomly of two observations of the data used to predict the position of each observation in any classified group it can be, by knowing the values of the explanatory variables used. We concluded by using the multinomial logistic regression model that we can able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and we can predict the classification of any individual case.
Regression Models for Count Data in R
Directory of Open Access Journals (Sweden)
Christian Kleiber
2008-06-01
Full Text Available The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inﬂated regression models in the functions hurdle( and zeroinfl( from the package pscl is introduced. It re-uses design and functionality of the basic R functions just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inﬂated model, are able to incorporate over-dispersion and excess zeros-two problems that typically occur in count data sets in economics and the social sciences—better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be ﬁtted, inspected and tested in practice.
Parametric Regression Models Using Reversed Hazard Rates
Directory of Open Access Journals (Sweden)
Asokan Mulayath Variyath
2014-01-01
Full Text Available Proportional hazard regression models are widely used in survival analysis to understand and exploit the relationship between survival time and covariates. For left censored survival times, reversed hazard rate functions are more appropriate. In this paper, we develop a parametric proportional hazard rates model using an inverted Weibull distribution. The estimation and construction of confidence intervals for the parameters are discussed. We assess the performance of the proposed procedure based on a large number of Monte Carlo simulations. We illustrate the proposed method using a real case example.
Bayesian model selection in Gaussian regression
Abramovich, Felix
2009-01-01
We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Institute of Scientific and Technical Information of China (English)
张良桥
2014-01-01
选择经济先发居民为研究对象，通过构建半参数及分位数回归模型，系统研究绝对收入、相对收入、期望收入等变量对幸福感的影响，我们发现：绝对收入对幸福感有显著的正相关关系；相对收入对高学历者影响不显著，但对低学历者影响显著；绝对收入对幸福感越低者的影响越显著；相对收入对中低幸福感群体影响显著，每周工作时间对低幸福感群体影响显著且强度大，但对高幸福感群体的影响力在降低。%By constituting semi-parametric regression and quantile regression model ,the author systemati-cally studied the impacts on subjective well-being by variables such as absolute income ,relative income , expected income ,etc .It is found that :remarkable positive correlation between absolute income and happi-ness ;the impact of relative income on people having received higher education is not significant ,but that on those of low education is significant ;absolute income has a significant impact on people of lower happi-ness ;relative income has a higher impact on people of medium-low level happiness ;weekly working hour has a significant and strong impact on people of low happiness ,with reduced impact level on those of high happiness .
Regression Models For Saffron Yields in Iran
S. H, Sanaeinejad; S. N, Hosseini
Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
Quantile equivalence to evaluate compliance with habitat management objectives
Cade, Brian S.; Johnson, Pamela R.
2011-01-01
Equivalence estimated with linear quantile regression was used to evaluate compliance with habitat management objectives at Arapaho National Wildlife Refuge based on monitoring data collected in upland (5,781 ha; n = 511 transects) and riparian and meadow (2,856 ha, n = 389 transects) habitats from 2005 to 2008. Quantiles were used because the management objectives specified proportions of the habitat area that needed to comply with vegetation criteria. The linear model was used to obtain estimates that were averaged across 4 y. The equivalence testing framework allowed us to interpret confidence intervals for estimated proportions with respect to intervals of vegetative criteria (equivalence regions) in either a liberal, benefit-of-doubt or conservative, fail-safe approach associated with minimizing alternative risks. Simple Boolean conditional arguments were used to combine the quantile equivalence results for individual vegetation components into a joint statement for the multivariable management objectives. For example, management objective 2A required at least 809 ha of upland habitat with a shrub composition ≥0.70 sagebrush (Artemisia spp.), 20–30% canopy cover of sagebrush ≥25 cm in height, ≥20% canopy cover of grasses, and ≥10% canopy cover of forbs on average over 4 y. Shrub composition and canopy cover of grass each were readily met on >3,000 ha under either conservative or liberal interpretations of sampling variability. However, there were only 809–1,214 ha (conservative to liberal) with ≥10% forb canopy cover and 405–1,098 ha with 20–30%canopy cover of sagebrush ≥25 cm in height. Only 91–180 ha of uplands simultaneously met criteria for all four components, primarily because canopy cover of sagebrush and forbs was inversely related when considered at the spatial scale (30 m) of a sample transect. We demonstrate how the quantile equivalence analyses also can help refine the numerical specification of habitat objectives and explore
Inferring gene regression networks with model trees
Directory of Open Access Journals (Sweden)
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Fused Adaptive Lasso for Spatial and Temporal Quantile Function Estimation
Sun, Ying
2015-09-01
Quantile functions are important in characterizing the entire probability distribution of a random variable, especially when the tail of a skewed distribution is of interest. This article introduces new quantile function estimators for spatial and temporal data with a fused adaptive Lasso penalty to accommodate the dependence in space and time. This method penalizes the difference among neighboring quantiles, hence it is desirable for applications with features ordered in time or space without replicated observations. The theoretical properties are investigated and the performances of the proposed methods are evaluated by simulations. The proposed method is applied to particulate matter (PM) data from the Community Multiscale Air Quality (CMAQ) model to characterize the upper quantiles, which are crucial for studying spatial association between PM concentrations and adverse human health effects. © 2016 American Statistical Association and the American Society for Quality.
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Smooth conditional distribution function and quantiles under random censorship.
Leconte, Eve; Poiraud-Casanova, Sandrine; Thomas-Agnan, Christine
2002-09-01
We consider a nonparametric random design regression model in which the response variable is possibly right censored. The aim of this paper is to estimate the conditional distribution function and the conditional alpha-quantile of the response variable. We restrict attention to the case where the response variable as well as the explanatory variable are unidimensional and continuous. We propose and discuss two classes of estimators which are smooth with respect to the response variable as well as to the covariate. Some simulations demonstrate that the new methods have better mean square error performances than the generalized Kaplan-Meier estimator introduced by Beran (1981) and considered in the literature by Dabrowska (1989, 1992) and Gonzalez-Manteiga and Cadarso-Suarez (1994).
Institute of Scientific and Technical Information of China (English)
刘建民; 毛军; 王蓓
2015-01-01
利用1999~2012年样本数据，采用空间聚类分析方法研究了中国省域税收收入空间聚类分布格局以及通过分位数回归模型分析了税负水平、税收结构和税收不确定性对我国居民消费水平在不同分位点上产生的区域效应。实证结果表明，在参数异质性假设条件下，税收负担挤入居民消费水平，而税收不确定因素挤出居民消费水平；商品税、所得税与财产税对居民消费水平的影响，在不同税收收入水平下，呈现出具有差异性的区域空间特征。%Using the sample data from year 1 999 to 2012,this paper employs the spatial clus-tering analysis method to study the characteristics of spatial distribution of the provincial tax rev-enue in China,and analyzes the regional effects of tax level,tax structure and tax uncertainty on the consumption level of residents through the quantile regression model.The empirical result shows that under the assumption of parameters heterogeneity the tax burden crowds in the con-sumption level of residents while the tax uncertainty crowds it out.Besides,the effects of the commodity tax,income tax and property tax on the consumption level of residents present differ-ential regional spatial characteristics at various levels of tax revenue.
Boosted Regression Tree Models to Explain Watershed ...
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Mayr, Andreas; Hothorn, Torsten; Fenske, Nora
2012-01-25
The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age. We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data. The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child. Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
Pourmokhtarian, A.; Driscoll, C. T.; Campbell, J. L.; Hayhoe, K.
2012-12-01
Dynamic hydrochemical models are useful tools to understand and predict the interactive effects of climate change, atmospheric CO2, and atmospheric deposition on the hydrology and water quality of forested watersheds. Although application of these models for climate projections necessitates the use of climatic variables simulated by atmosphere-ocean general circulation models (AOGCMs) to determine inputs to drive model projections. Due to the coarse resolution of AOGCMs, outputs need to be downscaled to bridge the gap between coarse spatial resolution and higher resolution required for hydrochemical models. This research compares two different statistical downscaling approaches; Gridded Quantile Mapping (BCSD) and Station-based Daily Asynchronous Regression, and their effects on potential biogeochemical responses of forested watershed. In this study, we used the biogeochemical model, PnET-BGC, to assess, compare and contrast the effects of these two downscaling approaches on potential future changes in temperature, precipitation, solar radiation and atmospheric CO2 and their effects in projections of pools, concentrations, and fluxes of major elements at Hubbard Brook Experimental Forest in New Hampshire, U.S. Future emissions scenarios were developed from monthly output from three AOGCMs (HadCM3, GFDL, PCM) in conjunction with potential lower and upper bounds of projected atmospheric CO2 (550 and 970 ppm by 2099, respectively). The climate projections from both downscaling approaches indicate that over the 21st century, average air temperature will increase with simultaneous increases in annual average precipitation. The modeling results from both downscaling approaches suggest that climate change is projected to cause substantial temporal shifts in hydrologic and hydrochemistry patterns. The choice of downscaling approach had a major impact on the streamflow simulations, which was directly related to the ability of the downscaling approach to mimic observed
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE
RODRIGO PINTO MOREIRA
2008-01-01
Este trabalho tem como objetivo principal adaptar o modelo STR-Tree, o qual é a combinação de um modelo Smooth Transition Regression com Classification and Regression Tree (CART), a fim de utilizá-lo em Classificação. Para isto algumas alterações foram realizadas em sua forma estrutural e na estimação. Devido ao fato de estarmos fazendo classificação de variáveis dependentes binárias, se faz necessária a utilização das técnicas empregadas em Regressão Logística, dessa forma a estimação dos pa...
Model selection in kernel ridge regression
DEFF Research Database (Denmark)
Exterkate, Peter
2013-01-01
Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \\ell_1/\\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis a vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCT 15 4. TITLE AND SUBTITLE Logistic Regression Model on Antenna Control Unit Autotracking Mode 5a. CONTRACT NUMBER 5b. GRANT...alternative-hypothesis. This paper will present an Antenna Auto- tracking model using Logistic Regression modeling. This paper presents an example of
Quantile Acoustic Vectors vs. MFCC Applied to Speaker Verification
Directory of Open Access Journals (Sweden)
Mayorga-Ortiz Pedro
2014-02-01
Full Text Available In this paper we describe speaker and command recognition related experiments, through quantile vectors and Gaussian Mixture Modelling (GMM. Over the past several years GMM and MFCC have become two of the dominant approaches for modelling speaker and speech recognition applications. However, memory and computational costs are important drawbacks, because autonomous systems suffer processing and power consumption constraints; thus, having a good trade-off between accuracy and computational requirements is mandatory. We decided to explore another approach (quantile vectors in several tasks and a comparison with MFCC was made. Quantile acoustic vectors are proposed for speaker verification and command recognition tasks and the results showed very good recognition efficiency. This method offered a good trade-off between computation times, characteristics vector complexity and overall achieved efficiency.
Quantile Acoustic Vectors vs. MFCC Applied to Speaker Verification
Directory of Open Access Journals (Sweden)
Mayorga-Ortiz Pedro
2014-02-01
Full Text Available In this paper we describe speaker and command recognition related experiments, through quantile vectors and Gaussian Mixture Modelling (GMM. Over the past several years GMM and MFCC have become two of the dominant approaches for modelling speaker and speech recognition applications. However, memory and computational costs are important drawbacks, because autonomous systems suffer processing and power consumption constraints; thus, having a good trade-off between accuracy and computational requirements is mandatory. We decided to explore another approach (quantile vectors in several tasks and a comparison with MFCC was made. Quantile acoustic vectors are proposed for speaker verification and command recognition tasks and the results showed very good recognition efficiency. This method offered a good trade-off between computation times, characteristics vector complexity and overall achieved efficiency.
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Relative risk regression models with inverse polynomials.
Ning, Yang; Woodward, Mark
2013-08-30
The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.
Model Selection in Kernel Ridge Regression
DEFF Research Database (Denmark)
Exterkate, Peter
Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels......, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated to all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based...... on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels makes them widely...
Combining logistic regression and neural networks to create predictive models.
Spackman, K. A.
1992-01-01
Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...
Directory of Open Access Journals (Sweden)
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Symbolic regression of generative network models
Menezes, Telmo
2014-01-01
Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world netwo...
Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.
1999-01-01
Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEl). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th
Corporate prediction models, ratios or regression analysis?
Bijnen, E.J.; Wijn, M.F.C.M.
1994-01-01
The models developed in the literature with respect to the prediction of a company s failure are based on ratios. It has been shown before that these models should be rejected on theoretical grounds. Our study of industrial companies in the Netherlands shows that the ratios which are used in
Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation
Kekatos, Vassilis
2011-01-01
Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they could be recovered by far fewer measurements. ...
Mixed Frequency Data Sampling Regression Models: The R Package midasr
Directory of Open Access Journals (Sweden)
Eric Ghysels
2016-08-01
Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002. In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on a information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
Impact of multicollinearity on small sample hydrologic regression models
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
ZhuZhongyi; WeiBocheng
1999-01-01
In this paper, the estimation method based on the “generalized profile likelihood” for the conditionally parametric models in the paper given by Severini and Wong (1992) is extendedto fixed design semiparametrie nonlinear regression models. For these semiparametrie nonlinear regression models,the resulting estimator of parametric component of the model is shown to beasymptotically efficient and the strong convergence rate of nonparametric component is investigated. Many results (for example Chen (1988) ,Gao & Zhao (1993), Rice (1986) et al. ) are extended to fixed design semiparametric nonlinear regression models.
Support vector regression model for complex target RCS predicting
Institute of Scientific and Technical Information of China (English)
Wang Gu; Chen Weishi; Miao Jungang
2009-01-01
The electromagnetic scattering computation has developed rapidly for many years; some computing problems for complex and coated targets cannot be solved by using the existing theory and computing models. A computing model based on data is established for making up the insufficiency of theoretic models. Based on the "support vector regression method", which is formulated on the principle of minimizing a structural risk, a data model to predicate the unknown radar cross section of some appointed targets is given. Comparison between the actual data and the results of this predicting model based on support vector regression method proved that the support vector regression method is workable and with a comparative precision.
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Nonlinear and Non Normal Regression Models in Physiological Research
1984-01-01
Applications of nonlinear and non normal regression models are in increasing order for appropriate interpretation of complex phenomenon of biomedical sciences. This paper reviews critically some applications of these models physiological research.
The use of GLS regression in regional hydrologic analyses
Griffis, V. W.; Stedinger, J. R.
2007-09-01
SummaryTo estimate flood quantiles and other statistics at ungauged sites, many organizations employ an iterative generalized least squares (GLS) regression procedure to estimate the parameters of a model of the statistic of interest as a function of basin characteristics. The GLS regression procedure accounts for differences in available record lengths and spatial correlation in concurrent events by using an estimator of the sampling covariance matrix of available flood quantiles. Previous studies by the US Geological Survey using the LP3 distribution have neglected the impact of uncertainty in the weighted skew on quantile precision. The needed relationship is developed here and its use is illustrated in a regional flood study with 162 sites from South Carolina. The performance of a pooled regression model is compared to separate models for each hydrologic region: statistical tests recommend an interesting hybrid of the two which is both surprising and hydrologically reasonable. The statistical analysis is augmented with new diagnostic metrics including a condition number to check for multicollinearity, a new pseudo- R appropriate for use with GLS regression, and two error variance ratios. GLS regression for the standard deviation demonstrates that again a hybrid model is attractive, and that GLS rather than an OLS or WLS analysis is appropriate for the development of regional standard deviation models.
Quantile Estimation in Dependent Sequences.
1981-09-01
Heidelberger and Welch (1980; 1981) we used K = 25 and d = 2 , although for extreme quantiles in highly congested queues with short run lengths, d = 3 was...YqG , i.e. if Yq(1) i "’" <q(G) are the order statistics of Yql,...,qG 23 1.L then the ngm is defiried by (assuming G is odd) (3.8) ngq(xp) = yq([. 5G ...random variables (NEAR(l) and GNEAR(1) processes; see Lawrance and Lewis (1981a) and (1981b)) and on waiting time se- quences in heavily congested
Identification of Influential Points in a Linear Regression Model
Directory of Open Access Journals (Sweden)
Jan Grosz
2011-03-01
Full Text Available The article deals with the detection and identification of influential points in the linear regression model. Three methods of detection of outliers and leverage points are described. These procedures can also be used for one-sample (independentdatasets. This paper briefly describes theoretical aspects of several robust methods as well. Robust statistics is a powerful tool to increase the reliability and accuracy of statistical modelling and data analysis. A simulation model of the simple linear regression is presented.
Quantile treatment effects of job loss on health.
Schiele, Valentin; Schmitz, Hendrik
2016-09-01
Studies on health effects of job loss mostly estimate mean effects. We argue that the effects might differ over the distribution of the health status and use quantile regression methods to provide a more complete picture. To take the potential endogeneity of job loss into account, we estimate quantile treatment effects where we rely on job loss due to plant closures. We find that the effect of job loss indeed varies across the mental and physical health distribution. Job loss due to plant closures affects physical health adversely for individuals in the middle and lower part of the health distribution while those in best physical condition do not seem to be affected. The results for mental health, though less distinct, point in the same direction. We find no effects on BMI.
On multivariate quantiles under partial ordering
Belloni, Alexandre
2009-01-01
This paper focuses on generalizing quantiles from the ordering point of view. We propose the concept of {\\it partial quantiles} based on a given partial order. We establish that partial quantiles are equivariant under partial-order-preserving transformations of the data, display a concentration of measure phenomenon, generalize the concept of efficient frontier, and can measure dispersion from the partial order perspective. We also study several statistical aspects of partial quantiles. We provide estimators, associated rates of convergence, and asymptotic distributions that hold uniformly over a continuum of quantile indices. Furthermore, we provide procedures that can restore monotonicity properties that might have been disturbed by estimation error, and establish computational complexity bounds. Finally, we illustrate the concepts by discussing several theoretical examples and simulations. Empirical applications to compare intake nutrients within diets and to evaluate the performance of investment funds ar...
Quantile forecast discrimination ability and value
Bouallegue, Zied Ben; Friederichs, Petra
2015-01-01
While probabilistic forecast verification for categorical forecasts is well established, some of the existing concepts and methods have not found their equivalent for the case of continuous variables. New tools dedicated to the assessment of forecast discrimination ability and forecast value are introduced here, based on quantile forecasts being the base product for the continuous case (hence in a nonparametric framework). The relative user characteristic (RUC) curve and the quantile value plot allow analysing the performance of a forecast for a specific user in a decision-making framework. The RUC curve is designed as a user-based discrimination tool and the quantile value plot translates forecast discrimination ability in terms of economic value. The relationship between the overall value of a quantile forecast and the respective quantile skill score is also discussed. The application of these new verification approaches and tools is illustrated based on synthetic datasets, as well as for the case of global...
Adaptive Regression and Classification Models with Applications in Insurance
Directory of Open Access Journals (Sweden)
Jekabsons Gints
2014-07-01
Full Text Available Nowadays, in the insurance industry the use of predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on the ability to perform such tasks as credibility estimation, determination of insurance premiums, estimation of probability of claim, detecting insurance fraud, managing insurance risk. This paper discusses regression and classification modeling for such types of prediction problems using the method of Adaptive Basis Function Construction
Geometric Properties of AR（q） Nonlinear Regression Models
Institute of Scientific and Technical Information of China (English)
LIUYing-ar; WEIBo-cheng
2004-01-01
This paper is devoted to a study of geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for regression parameter space and autoregression parameter space respectively based on the weighted inner product by fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extended the work of Bates & Watts(1980,1988)[1.2] and Seber & Wild (1989)[3].
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Wavelet regression model in forecasting crude oil price
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Institute of Scientific and Technical Information of China (English)
蒲勇健; 李萍
2013-01-01
针对城乡居民收入差距以及区域间农民收入差距不断扩大这一现象，采用分位数回归模型，利用2000-2009年数据对农村居民收入不同组成成分的影响因素进行了分析和评价。结果表明：对农民纯收入影响显著的是：农业技术水平、人均社会投资、人均耕地面积，他们通过分别影响经营性收入和工资性收入最终影响农民纯收入水平；虽然农村固定投资对人均纯收入的影响程度逐渐降低，但人均固定投资对农村居民高收入群体的促进作用高于低收入群体，且主要作用于促进工资性收入增长；耕地面积对工资性低收入群体的抑制作用最强，对工资性高收入群体的抑制作用最弱；城镇化对工资性收入高水平群体的影响明显减弱，对于工资性低收入群体的促进作用则有加强趋势但其对经营性收入却有负面作用。%Given the urban-rural income gap and income gap between farmers in different regions is expanding, this article, by using quantile regression model, investigates and evaluates the factors influencing the different composition of the income of rural resi-dents during the period of 2000-2009. It concluded that: Agricultural technology level, per social capita investment and per arable land area exerted notable impact on the farmers' net income. They affect wages income and operating income respectively, then ulti-mately affect the level of farmers' net income. Although the influence on the per net income by the rural fixed investment gradually weakened, per capita fixed investment, mainly improving wages income, promoted more for the high-income groups than for the low-income groups in rural area. Arable land puts the strongest inhibitory effect on the wages of low-income groups, the weakest on that of high-income groups. The positive impact of urbanization, playing a negative role in operating income for rural residents, on wages in
Institute of Scientific and Technical Information of China (English)
刘玉萍; 郭郡郡; 刘成玉
2012-01-01
Based on provincial panel data from 1995 to 2009 in China,using panel quantile regression model estimate the impact of demographic factors on China’s carbon dioxide emissions,the results showed that： Population size and urbanization are the main demographic factors which impact China’s carbon dioxide emissions,but the marginal effect of population size is large in the developed provinces,while the marginal effect of urbanization is larger in the less developed provinces;Smaller family trend impacts on carbon dioxide emissions differently by province.The effect is positive in some provinces and is not significant in the other provinces.Age structure is not yet a major demographic factors who led to change in carbon dioxide emissions;Comprehensive comparison,the impact of economic development on carbon dioxide emissions is greater than the impact of demographic factors on it,the marginal effect of industry structure on carbon dioxide emissions is less than that of population size and urbanization,but the relationship between technological progress and carbon dioxide emissions are blurred.%基于1995～2009年中国省际面板数据,利用面板分位数回归模型估计人口因素对我国CO2排放量的影响,结果显示：人口数量和人口城市化率是影响我国CO2排放的主要人口因素,但从影响大小上看,人口数量变化对发达省份CO2排放的影响大于欠发达省份,而人口城市化率则对欠发达省份的CO2排放具有更大的影响;家庭小型化对CO2排放的影响因省而异,对不同省份,要么没有明显的影响,要么可能导致CO2排放量增加;年龄结构目前还不是导致我国CO2排放量变化的主要人口因素;综合比较而言,经济发展水平对CO2排放的影响大于人口各因素,产业结构对CO2排放的影响小于人口数量和人口城市化率,而技术进步与CO2排放的关系则显得模糊。
Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression
Directory of Open Access Journals (Sweden)
Han Lu
2013-01-01
Full Text Available Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we started a new way to use these two kinds of models. Through logistic regression filters the variables with a high degree of correlation, artificial intelligence models reduce complexity and accelerate convergence, while these models hybridizing logistic regression have better explanations in statistically significance, thus improve the effect of artificial intelligence models. With experiments on German data set, we find an interesting phenomenon defined as ‘Dimensional interference’ with support vector machine and from cross validation it can be seen that the new method gives a lot of help with credit scoring.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Institute of Scientific and Technical Information of China (English)
王芳; 周兴
2012-01-01
文章利用非条件分位数回归的分解方法对2011年城市外来劳动力群体的性别工资差距进行了分析。研究发现：（1）城市外来劳动力群体中,性别间受教育程度的差异已经变得不太显著,但在职业方面针对女性外来劳动力的＂粘性地板＂现象仍然十分突出。（2）人力资本禀赋和就业职业特征对男性与女性外来劳动力工资的影响程度有所不同,不同收入分位数上的影响系数也有明显的差异。（3）性别歧视是造成外来劳动力性别间收入差距的主要原因,而男性与女性劳动者工作经验的差距以及工作经验收益率的差异已经取代教育因素成为了性别收入差距中的突出问题。%Based on the micro income data of migrate labors in the urban China,This paper use unconditional quantiles regression to analysis the Gender wage gap of migrate labors at different quantiles and decompose the gender wage gap.The results show that：（1） Since the labor market segmentation and gender discrimination,female migrate labors are segregated from the state sectors.（2） The gender wage gap is caused by differences in personal characteristics and gender discrimination,but gender discrimination is the main reason for the gender wage gap.（3）The gender gap of work experience has replaced the education to become the main factor cause the gender discrimination and gender wage gap.
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Quantiles of the Realized Stock-Bond Correlation and Links to the Macroeconomy
DEFF Research Database (Denmark)
Aslanidis, Nektarios; Christiansen, Charlotte
2014-01-01
This paper adopts quantile regressions to scrutinize the realized stock–bond correlation based upon high frequency returns. The paper provides in-sample and out-of-sample analysis and considers factors constructed from a large number of macro-finance predictors well-known from the return predicta......This paper adopts quantile regressions to scrutinize the realized stock–bond correlation based upon high frequency returns. The paper provides in-sample and out-of-sample analysis and considers factors constructed from a large number of macro-finance predictors well-known from the return...
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Optimization of Regression Models of Experimental Data Using Confirmation Points
Ulbrich, N.
2010-01-01
A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression model independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
Quantile forecast discrimination ability and value
DEFF Research Database (Denmark)
Ben Bouallègue, Zied; Pinson, Pierre; Friederichs, Petra
2015-01-01
While probabilistic forecast verification for categorical forecasts is well established, some of the existing concepts and methods have not found their equivalent for the case of continuous variables. New tools dedicated to the assessment of forecast discrimination ability and forecast value......-based discrimination tool and the quantile value plot translates forecast discrimination ability in terms of economic value. The relationship between the overall value of a quantile forecast and the respective quantile skill score is also discussed. The application of these new verification approaches and tools...
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Directory of Open Access Journals (Sweden)
Pedro Henrique Melo Albuquerque
Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
DEFF Research Database (Denmark)
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...
Systematic evaluation of land use regression models for NO₂
Wang, M.|info:eu-repo/dai/nl/345480279; Beelen, R.M.J.|info:eu-repo/dai/nl/30483100X; Eeftens, M.R.|info:eu-repo/dai/nl/315028300; Meliefste, C.; Hoek, G.|info:eu-repo/dai/nl/069553475; Brunekreef, B.|info:eu-repo/dai/nl/067548180
2012-01-01
Land use regression (LUR) models have become popular to explain the spatial variation of air pollution concentrations. Independent evaluation is important. We developed LUR models for nitrogen dioxide (NO(2)) using measurements conducted at 144 sampling sites in The Netherlands. Sites were randomly
FUNCTIONAL-COEFFICIENT REGRESSION MODEL AND ITS ESTIMATION
Institute of Scientific and Technical Information of China (English)
无
2001-01-01
In this paper,a class of functional-coefficient regression models is proposed and an estimation procedure based on the locally weighted least equares is suggested. This class of models,with the proposed estimation method,is a powerful means for exploratory data analysis.
Fitting Additive Binomial Regression Models with the R Package blm
Directory of Open Access Journals (Sweden)
Stephanie Kovalchik
2013-09-01
Full Text Available The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit model, which allows some covariates to have additive effects and other covariates to have logisitc effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm packages methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model
Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.
2017-03-01
This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses binary logistic regression method with time series data of normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of training step is to identify the best parameter of the regression model using gradient descent algorithm. The best fit of the model can be utilized to classify sugarcane and non-sugarcane area. The experiment shows high accuracy and successfully maps the sugarcane plantation area which obtained best result of Cohen’s Kappa value 0.7833 (strong) with 89.167% accuracy.
Institute of Scientific and Technical Information of China (English)
王林辉; 赵景
2015-01-01
At present,the decline in labor income share becomes universal phenomenon, numerous lit⁃eratures aiming to explain it are confined to the perspectives of the structural transformation of the economy and trade and technology, with limited universal adaptability and explanatory ability, and are ignoring biased technology progress influence on labor income share.This paper uses the three equation standardization system method to measure biased technology progress, and analyzes the in⁃come distribution effects of technological progress through quantile regression by using regional panel data.The results show that: the trend of regional technology progress bias converges to national trend, appears capital biased technical progress on the average, which means the technical progress is more advantageous to improve the marginal output of capital. Results on panel quantile regression show that: biased technical progress has obvious depressant effect on labor income share, the more techni⁃cal progress biased towards the capital, the more capital income share can be promoted, and labor in⁃come status can be worse. At the same time, in the different quantiles of labor income share, the in⁃come distribution effects of technical progress are different, before reaching the 50% quantiles, the lower the labor income share, the stronger the inhibiting effect of technical progress bias on labor in⁃come share, and with the rise of labor incomes, the inhibiting effect will continue to weaken. Mean⁃while, influenced by technical progress itself and factor endowments structure and regional economic environment, income distribution effect of biased technical progress also appears to regional differences.%劳动收入占比下降成为世界范围内的普遍现象，大量文献从经济结构转型、国际贸易和技术进步视角解释，其普适性和解释力度有限，且忽视技术进步方向变化对劳动收入分配的影响。文章采用三方程标准化
Quantiles for Finite Mixtures of Normal Distributions
Rahman, Mezbahur; Rahman, Rumanur; Pearson, Larry M.
2006-01-01
Quantiles for finite mixtures of normal distributions are computed. The difference between a linear combination of independent normal random variables and a linear combination of independent normal densities is emphasized. (Contains 3 tables and 1 figure.)
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Modelling multimodal photometric redshift regression with noisy observations
Kügler, S D
2016-01-01
In this work, we are trying to extent the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach allows therefore to approach the multimodal regression problem in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally and lastly, the proposed algorithm allows in principle to make predictions outside the training values which makes it a fascinating opportunity for the detection of high-redshifted quasars.
Robust Bayesian Regularized Estimation Based on t Regression Model
Directory of Open Access Journals (Sweden)
Zean Li
2015-01-01
Full Text Available The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters for different regression coefficients. We also consider the Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.
A Multi-objective Procedure for Efficient Regression Modeling
Sinha, Ankur; Kuosmanen, Timo
2012-01-01
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge on the relative importance of the variables. To provide a robust method for model selection, this paper introduces a technique called the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) which provides the user with an efficient set of regression models for a given data-set. The algorithm considers the regression problem as a two objective task, where the purpose is to choose those models over the other which have less number of regression coefficients and better goodness of fit. In MOGA-VS, the model selection procedure is implemented in two steps. First, we generate the frontier of all efficient or non-dominated regression m...
Gizaw, Mesgana Seyoum; Gan, Thian Yew
2016-07-01
Regional Flood Frequency Analysis (RFFA) is a statistical method widely used to estimate flood quantiles of catchments with limited streamflow data. In addition, to estimate the flood quantile of ungauged sites, there could be only a limited number of stations with complete dataset are available from hydrologically similar, surrounding catchments. Besides traditional regression based RFFA methods, recent applications of machine learning algorithms such as the artificial neural network (ANN) have shown encouraging results in regional flood quantile estimations. Another novel machine learning technique that is becoming widely applicable in the hydrologic community is the Support Vector Regression (SVR). In this study, an RFFA model based on SVR was developed to estimate regional flood quantiles for two study areas, one with 26 catchments located in southeastern British Columbia (BC) and another with 23 catchments located in southern Ontario (ON), Canada. The SVR-RFFA model for both study sites was developed from 13 sets of physiographic and climatic predictors for the historical period. The Ef (Nash Sutcliffe coefficient) and R2 of the SVR-RFFA model was about 0.7 when estimating flood quantiles of 10, 25, 50 and 100 year return periods which indicate satisfactory model performance in both study areas. In addition, the SVR-RFFA model also performed well based on other goodness-of-fit statistics such as BIAS (mean bias) and BIASr (relative BIAS). If the amount of data available for training RFFA models is limited, the SVR-RFFA model was found to perform better than an ANN based RFFA model, and with significantly lower median CV (coefficient of variation) of the estimated flood quantiles. The SVR-RFFA model was then used to project changes in flood quantiles over the two study areas under the impact of climate change using the RCP4.5 and RCP8.5 climate projections of five Coupled Model Intercomparison Project (CMIP5) GCMs (Global Climate Models) for the 2041
Analyzing industrial energy use through ordinary least squares regression models
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and
Applications of some discrete regression models for count data
Directory of Open Access Journals (Sweden)
B. M. Golam Kibria
2006-01-01
Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
DEFF Research Database (Denmark)
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Trimmed Likelihood-based Estimation in Binary Regression Models
Cizek, P.
2005-01-01
The binary-choice regression models such as probit and logit are typically estimated by the maximum likelihood method.To improve its robustness, various M-estimation based procedures were proposed, which however require bias corrections to achieve consistency and their resistance to outliers is rela
PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA
Institute of Scientific and Technical Information of China (English)
QianWeimin; LiYumei
2005-01-01
The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.
Change-point estimation for censored regression model
Institute of Scientific and Technical Information of China (English)
Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO
2007-01-01
In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.
Improved Methodology for Parameter Inference in Nonlinear, Hydrologic Regression Models
Bates, Bryson C.
1992-01-01
A new method is developed for the construction of reliable marginal confidence intervals and joint confidence regions for the parameters of nonlinear, hydrologic regression models. A parameter power transformation is combined with measures of the asymptotic bias and asymptotic skewness of maximum likelihood estimators to determine the transformation constants which cause the bias or skewness to vanish. These optimized constants are used to construct confidence intervals and regions for the transformed model parameters using linear regression theory. The resulting confidence intervals and regions can be easily mapped into the original parameter space to give close approximations to likelihood method confidence intervals and regions for the model parameters. Unlike many other approaches to parameter transformation, the procedure does not use a grid search to find the optimal transformation constants. An example involving the fitting of the Michaelis-Menten model to velocity-discharge data from an Australian gauging station is used to illustrate the usefulness of the methodology.
Girinoto, Sadik, Kusman; Indahwati
2017-03-01
The National Socio-Economic Survey samples are designed to produce estimates of parameters of planned domains (provinces and districts). The estimation of unplanned domains (sub-districts and villages) has its limitation to obtain reliable direct estimates. One of the possible solutions to overcome this problem is employing small area estimation techniques. The popular choice of small area estimation is based on linear mixed models. However, such models need strong distributional assumptions and do not easy allow for outlier-robust estimation. As an alternative approach for this purpose, M-quantile regression approach to small area estimation based on modeling specific M-quantile coefficients of conditional distribution of study variable given auxiliary covariates. It obtained outlier-robust estimation from influence function of M-estimator type and also no need strong distributional assumptions. In this paper, the aim of study is to estimate the poverty indicator at sub-district level in Bogor District-West Java using M-quantile models for small area estimation. Using data taken from National Socioeconomic Survey and Villages Potential Statistics, the results provide a detailed description of pattern of incidence and intensity of poverty within Bogor district. We also compare the results with direct estimates. The results showed the framework may be preferable when direct estimate having no incidence of poverty at all in the small area.
On modified skew logistic regression model and its applications
Directory of Open Access Journals (Sweden)
C. Satheesh Kumar
2015-12-01
Full Text Available Here we consider a modiﬁed form of the logistic regression model useful for situations where the dependent variable is dichotomous in nature and the explanatory variables exhibit asymmetric and multimodal behaviour. The proposed model has been ﬁtted to some real life data set by using method of maximum likelihood estimation and illustrated its usefulness in certain medical applications.
Improved Testing and Specifivations of Smooth Transition Regression Models
Escribano, Álvaro; Jordá, Óscar
1997-01-01
This paper extends previous work in Escribano and Jordá (1997)and introduces new LM specification procedures to choose between Logistic and Exponential Smooth Transition Regression (STR)Models. These procedures are simpler, consistent and more powerful than those previously available in the literature. An analysis of the properties of Taylor approximations around the transition function of STR models permits one to understand why these procedures work better and it suggests ways to improve te...
Support vector regression-based internal model control
Institute of Scientific and Technical Information of China (English)
HUANG Yan-wei; PENG Tie-gen
2007-01-01
This paper proposes a design of internal model control systems for process with delay by using support vector regression (SVR). The proposed system fully uses the excellent nonlinear estimation performance of SVR with the structural risk minimization principle. Closed-system stability and steady error are analyzed for the existence of modeling errors. The simulations show that the proposed control systems have the better control performance than that by neural networks in the cases of the training samples with small size and noises.
CONSERVATIVE ESTIMATING FUNCTIONIN THE NONLINEAR REGRESSION MODEL WITHAGGREGATED DATA
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
The purpose of this paper is to study the theory of conservative estimating functions in nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is projection of the true score function onto a class of estimation functions. By constructing, the potential function for the projected score with aggregated data is obtained, which have some properties of log-likelihood function.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
Institute of Scientific and Technical Information of China (English)
尚君; 陈艺源; 马捷; 任燕燕
2013-01-01
This article uses the related time series data from 1985 to 2011 to make the empirical analyses on the different influence degree of per capita GDP, per capita education expenditure and the disposable income of residents on life insurance demand and property insurance demand under different demand level using the co-integration analysis and quantile regression method. The results show that:all the three factors have significant positive effect on the life insurance demand and property insurance demand, But on the whole, the influence degree of various factors on the life insurance demand are greater than the property insurance demand.And, under different levels of insurance demand, the influence degree of all influencing factors presents difference for both life insurance demand and property insurance demand.%本文采用1985-2011年的相关时间序列数据，运用协整分析和分位数回归方法，将保险需求细分为人身保险需求和财产保险需求，分别分析了在不同需求水平下人均GDP、人均教育支出和居民可支配收入对保险需求的影响程度，结果表明：三个影响因素对人身保险需求和财产保险需求均存在显著的正向作用，但整体而言，各个因素对人身保险需求的影响程度均大于对财产保险需求的影响程度；在不同的保险需求水平下，各个影响因素的影响程度存在差异。
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robust estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is non-gaussian Ornstein-Uhlenbeck process with the L\\'evy process subordinator, which is used to model the financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Directory of Open Access Journals (Sweden)
Siana Halim
2007-01-01
Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.
Illustrating Bayesian evaluation of informative hypotheses for regression models
Directory of Open Access Journals (Sweden)
Anouck eKluytmans
2012-01-01
Full Text Available In the present paper we illustrate the Bayesian evaluation of informative hypotheses for regression models. This approach allows psychologists to more directly test their theories than they would using conventional statis- tical analyses. Throughout this paper, both real-world data and simulated datasets will be introduced and evaluated to investigate the pragmatical as well as the theoretical qualities of the approach. We will pave the way from forming informative hypotheses in the context of regression models to interpreting the Bayes factors that express the support for the hypotheses being evaluated. In doing so, the present approach goes beyond p-values and uninformative null hypothesis testing, moving on to informative testing and quantification of model support in a way that is accessible to everyday psychologists.
Technical Note: The normal quantile transformation and its application in a flood forecasting system
Directory of Open Access Journals (Sweden)
K. Bogner
2012-04-01
Full Text Available The Normal Quantile Transform (NQT has been used in many hydrological and meteorological applications in order to make the Cumulated Distribution Function (CDF of the observed, simulated and forecast river discharge, water level or precipitation data Gaussian. It is also the heart of the meta-Gaussian model for assessing the total predictive uncertainty of the Hydrological Uncertainty Processor (HUP developed by Krzysztofowicz. In the field of geo-statistics this transformation is better known as the Normal-Score Transform. In this paper some possible problems caused by small sample sizes when applying the NQT in flood forecasting systems will be discussed and a novel way to solve the problem will be outlined by combining extreme value analysis and non-parametric regression methods. The method will be illustrated by examples of hydrological stream-flow forecasts.
Batch Mode Active Learning for Regression With Expected Model Change.
Cai, Wenbin; Zhang, Muhan; Zhang, Ya
2016-04-20
While active learning (AL) has been widely studied for classification problems, limited efforts have been done on AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for the linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch mode AL to simultaneously choose a set of k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.
Hierarchical Neural Regression Models for Customer Churn Prediction
Directory of Open Access Journals (Sweden)
Golshan Mohammadi
2013-01-01
Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.
Regression Model to Predict Global Solar Irradiance in Malaysia
Directory of Open Access Journals (Sweden)
Hairuniza Ahmed Kutty
2015-01-01
Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
Phone Duration Modeling of Affective Speech Using Support Vector Regression
Directory of Open Access Journals (Sweden)
Alexandros Lazaridis
2012-07-01
Full Text Available In speech synthesis accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing emotional speech with natural sounding. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as the decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all the four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE throughout all emotional categories.
Data correction for seven activity trackers based on regression models.
Andalibi, Vafa; Honko, Harri; Christophe, Francois; Viik, Jari
2015-08-01
Using an activity tracker for measuring activity-related parameters, e.g. steps and energy expenditure (EE), can be very helpful in assisting a person's fitness improvement. Unlike the measuring of number of steps, an accurate EE estimation requires additional personal information as well as accurate velocity of movement, which is hard to achieve due to inaccuracy of sensors. In this paper, we have evaluated regression-based models to improve the precision for both steps and EE estimation. For this purpose, data of seven activity trackers and two reference devices was collected from 20 young adult volunteers wearing all devices at once in three different tests, namely 60-minute office work, 6-hour overall activity and 60-minute walking. Reference data is used to create regression models for each device and relative percentage errors of adjusted values are then statistically compared to that of original values. The effectiveness of regression models are determined based on the result of a statistical test. During a walking period, EE measurement was improved in all devices. The step measurement was also improved in five of them. The results show that improvement of EE estimation is possible only with low-cost implementation of fitting model over the collected data e.g. in the app or in corresponding service back-end.
Forecasting relativistic electron flux using dynamic multiple regression models
Directory of Open Access Journals (Sweden)
H.-L. Wei
2011-02-01
Full Text Available The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.
Resampling procedures to validate dendro-auxometric regression models
Directory of Open Access Journals (Sweden)
2009-03-01
Full Text Available Regression analysis has a large use in several sectors of forest research. The validation of a dendro-auxometric model is a basic step in the building of the model itself. The more a model resists to attempts of demonstrating its groundlessness, the more its reliability increases. In the last decades many new theories, that quite utilizes the calculation speed of the calculators, have been formulated. Here we show the results obtained by the application of a bootsprap resampling procedure as a validation tool.
Fuzzy and Regression Modelling of Hard Milling Process
Directory of Open Access Journals (Sweden)
A. Tamilarasan
2014-04-01
Full Text Available The present study highlights the application of box-behnken design coupled with fuzzy and regression modeling approach for making expert system in hard milling process to improve the process performance with systematic reduction of production cost. The important input fields of work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key index of machining outputs. The results indicate that the fuzzy logic and regression modeling technique can be effectively used for the prediction of desired responses with less average error variation. Predicted results were verified by experiments and shown the good potential characteristics of the developed system for automated machining environment.
Regression Cloud Models and Their Applications in Energy Consumption of Data Center
Directory of Open Access Journals (Sweden)
Yanshuang Zhou
2015-01-01
Full Text Available As cloud data center consumes more and more energy, both researchers and engineers aim to minimize energy consumption while keeping its services available. A good energy model can reflect the relationships between running tasks and the energy consumed by hardware and can be further used to schedule tasks for saving energy. In this paper, we analyzed linear and nonlinear regression energy model based on performance counters and system utilization and proposed a support vector regression energy model. For performance counters, we gave a general linear regression framework and compared three linear regression models. For system utilization, we compared our support vector regression model with linear regression and three nonlinear regression models. The experiments show that linear regression model is good enough to model performance counters, nonlinear regression is better than linear regression model for modeling system utilization, and support vector regression model is better than polynomial and exponential regression models.
Pressure Points in Reading Comprehension: A Quantile Multiple Regression Analysis
Logan, Jessica
2017-01-01
The goal of this study was to examine how selected pressure points or areas of vulnerability are related to individual differences in reading comprehension and whether the importance of these pressure points varies as a function of the level of children's reading comprehension. A sample of 245 third-grade children were given an assessment battery…
Probabilistic load forecasting via Quantile Regression Averaging on sister forecasts
Bidong Liu; Jakub Nowotarski; Tao Hong; Rafal Weron
2015-01-01
Majority of the load forecasting literature has been on point forecasting, which provides the expected value for each step throughout the forecast horizon. In the smart grid era, the electricity demand is more active and less predictable than ever before. As a result, probabilistic load forecasting, which provides additional information on the variability and uncertainty of future load values, is becoming of great importance to power systems planning and operations. This paper proposes a prac...
Determination of accident-prone road sections using quantile regression
Directory of Open Access Journals (Sweden)
Thomas Edison Guerrero-Barbosa
2016-01-01
Full Text Available La identificación acertada de sitios peligrosos a accidentalidad permite a las entidades gubernamentales, encargadas de realizar mejoras a la seguridad vial, una adecuada destinación de las inversiones a los tramos viales verdaderamente críticos; dada esta necesidad inmediata, la presente investigación se enfoca en determinar los tramos propensos a accidentes y la posterior elaboración de un ranking de peligrosidad efectuado a los tramos críticos encontrados en el perímetro urbano de Ocaña (Colombia, utilizando Regresión Cuantil (RC. A partir del modelo estimado correspondiente al cuantil 95 fue posible establecer relaciones de causalidad entre características como longitud del tramo vial, ancho de calzada, número de carriles, número de intersecciones, tránsito promedio diario y velocidad media con la frecuencia de accidentes, determinándose un total de 7 tramos críticos a los cuales se les estableció un ranking de peligrosidad.
Central limit theorem of linear regression model under right censorship
Institute of Scientific and Technical Information of China (English)
HE; Shuyuan(何书元); HUANG; Xiang(Heung; Wong)(黄香)
2003-01-01
In this paper, the estimation of joint distribution F(y,z) of (Y, Z) and the estimation in thelinear regression model Y = b′Z + ε for complete data are extended to that of the right censored data. Theregression parameter estimates of b and the variance of ε are weighted least square estimates with randomweights. The central limit theorems of the estimators are obtained under very weak conditions and the derivedasymptotic variance has a very simple form.
APPLYING LOGISTIC REGRESSION MODEL TO THE EXAMINATION RESULTS DATA
Directory of Open Access Journals (Sweden)
Goutam Saha
2011-01-01
Full Text Available The binary logistic regression model is used to analyze the school examination results(scores of 1002 students. The analysis is performed on the basis of the independent variables viz.gender, medium of instruction, type of schools, category of schools, board of examinations andlocation of schools, where scores or marks are assumed to be dependent variables. The odds ratioanalysis compares the scores obtained in two examinations viz. matriculation and highersecondary.
Predicting and Modelling of Survival Data when Cox's Regression Model does not hold
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...
GAUSSIAN COPULA MARGINAL REGRESSION FOR MODELING EXTREME DATA WITH APPLICATION
Directory of Open Access Journals (Sweden)
Sutikno
2014-01-01
Full Text Available Regression is commonly used to determine the relationship between the response variable and the predictor variable, where the parameters are estimated by Ordinary Least Square (OLS. This method can be used with an assumption that residuals are normally distributed (0, σ^{2}. However, the assumption of normality of the data is often violated due to extreme observations, which are often found in the climate data. Modeling of rice harvested area with rainfall predictor variables allows extreme observations. Therefore, another approximation is necessary to be applied in order to overcome the presence of extreme observations. The method used to solve this problem is a Gaussian Copula Marginal Regression (GCMR, the regression-based Copula. As a case study, the method is applied to model rice harvested area of rice production centers in East Java, Indonesia, covering District: Banyuwangi, Lamongan, Bojonegoro, Ngawi and Jember. Copula is chosen because this method is not strict against the assumption distribution, especially the normal distribution. Moreover, this method can describe dependency on extreme point clearly. The GCMR performance will be compared with OLS and Generalized Linear Models (GLM. The identification result of the dependencies structure between the Rice Harvest per period (RH and monthly rainfall showed a dependency in all areas of research. It is shown that the real test copula type mostly follows the Gumbel distribution. While the comparison of the model goodness for rice harvested area in the modeling showed that the method used to model the exact GCMR in five districts RH1 and RH2 in Jember district since its lowest AICc. Looking at the data distribution pattern of response variables, it can be concluded that the GCMR good for modeling the response variable that is not normally distributed and tend to have a large skew.
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
DEFF Research Database (Denmark)
Klein, John P.; Andersen, Per Kragh
2005-01-01
Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models......Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models...
K factor estimation in distribution transformers using linear regression models
Directory of Open Access Journals (Sweden)
Juan Miguel Astorga Gómez
2016-06-01
Full Text Available Background: Due to massive incorporation of electronic equipment to distribution systems, distribution transformers are subject to operation conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor, from total current harmonic content obtained with low-cost equipment.Method: Two distribution transformers that feed different loads are studied variables, current total harmonic distortion factor K are recorded, and the regression model that best fits the data field is determined. To select the regression model the coefficient of determination R2 and the Akaike Information Criterion (AIC are used. With the selected model, the K-factor is estimated to actual operating conditions.Results: Once determined the model it was found that for both agricultural cargo and industrial mining, present harmonic content (THDi exceeds the values that these transformers can drive (average of 12.54% and minimum 8,90% in the case of agriculture and average value of 18.53% and a minimum of 6.80%, for industrial mining case.Conclusions: When estimating the K factor using polynomial models it was determined that studied transformers can not withstand the current total harmonic distortion of their current loads. The appropriate K factor for studied transformer should be 4; this allows transformers support the current total harmonic distortion of their respective loads.
Institute of Scientific and Technical Information of China (English)
杨嘉樑; 黄洪亮; 宋利明; 饶欣; 吴越; 齐广瑞
2014-01-01
We developed an“Integrated Habitat Index (IHI)”model based on the quantile regression method using sur-vey data collected at 43 sites in waters near the Cook Islands from September, 2012 through November, 2012. The model variables included vertical profile data for temperature, salinity, chlorophyll-a, horizontal current, vertical current and catch per unit effort (CPUE) of albacore tuna(Thunnus alalunga), and the interactions among these variables. Mod-els were developed for five 40 m water strata between 40 m and 240 m and the entire water column to predict the spatial distribution of albacore tuna. The environmental variables measured at modeling sites were used as inputs to the IHI models to predict the IHI value of the 5 strata and the entire water column. We tested for a significant difference be-tween the observed CPUE and predicted CPUE within the 5 water strata and the entire water column using a Wilcoxon test. The Spearman correlation coefficients were assumed to indicate the predictive power of the IHI model. The trend line of the arithmetic average about the predicted IHI for the 5 strata was compared with the CPUEs at the specific depth stratum. The environmental variables at validation sites were used to validate the model’s predictive power. These data were input into the CPUE models to predict the CPUE of the 5 water strata and the entire water column. We used a Wilcoxon test to compare between the predicted and observed CPUEs within the 5 water strata and the entire water column to validate the IHI results. The CPUE for albacore tuna in the 5 strata and the entire water column exhibited a skewed normal distribution with a longer left tail. There was no significant difference between the nominal CPUEs and predictive CPUEs of albacore in the 5 water strata and the entire water column at the modelling sites or the validation sites. The IHI models had good predictive power, and were able to accurately predict the distribution of albacore tuna in
Extended cox regression model: The choice of timefunction
Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu
2017-07-01
Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Directory of Open Access Journals (Sweden)
Dana D. Marković
2014-01-01
Full Text Available Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart’s percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method.
Model and Variable Selection Procedures for Semiparametric Time Series Regression
Directory of Open Access Journals (Sweden)
Risa Kato
2009-01-01
Full Text Available Semiparametric regression models are very useful for time series analysis. They facilitate the detection of features resulting from external interventions. The complexity of semiparametric models poses new challenges for issues of nonparametric and parametric inference and model selection that frequently arise from time series data analysis. In this paper, we propose penalized least squares estimators which can simultaneously select significant variables and estimate unknown parameters. An innovative class of variable selection procedure is proposed to select significant variables and basis functions in a semiparametric model. The asymptotic normality of the resulting estimators is established. Information criteria for model selection are also proposed. We illustrate the effectiveness of the proposed procedures with numerical simulations.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
Modeling the number of car theft using Poisson regression
Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura
2016-10-01
Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Estimation of Conditional Quantile using Neural Networks
DEFF Research Database (Denmark)
Kulczycki, P.; Schiøler, Henrik
1999-01-01
The problem of estimating conditional quantiles using neural networks is investigated here. A basic structure is developed using the methodology of kernel estimation, and a theory guaranteeing con-sistency on a mild set of assumptions is provided. The constructed structure constitutes a basis...... for the design of a variety of different neural networks, some of which are considered in detail. The task of estimating conditional quantiles is related to Bayes point estimation whereby a broad range of applications within engineering, economics and management can be suggested. Numerical results illustrating...... the capabilities of the elaborated neural network are also given....
Estimation of Conditional Quantile using Neural Networks
DEFF Research Database (Denmark)
Kulczycki, P.; Schiøler, Henrik
1999-01-01
The problem of estimating conditional quantiles using neural networks is investigated here. A basic structure is developed using the methodology of kernel estimation, and a theory guaranteeing con-sistency on a mild set of assumptions is provided. The constructed structure constitutes a basis...... for the design of a variety of different neural networks, some of which are considered in detail. The task of estimating conditional quantiles is related to Bayes point estimation whereby a broad range of applications within engineering, economics and management can be suggested. Numerical results illustrating...... the capabilities of the elaborated neural network are also given....
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Directory of Open Access Journals (Sweden)
Fadhilah Abdrazak
2014-05-01
Full Text Available Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors. If these effects can be estimated and removed, the behavior of the series could be better viewed. Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model consists of either peak or average is developed. The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.
Bayesian Lasso Quantile Regression for Panel Data Models%面板数据的贝叶斯Lasso分位回归方法
Institute of Scientific and Technical Information of China (English)
李翰芳; 罗幼喜; 田茂再
2013-01-01
本文讨论了含有随机效应的面板数据模型,通过引入条件Laplace先验,构造了一种新的贝叶斯Lasso分位回归法.与一般贝叶斯分位回归法不同,该方法能够更大程度地将模型中非重要的解释变量系数压缩至0,从而在估计系数的同时也起到变量选择的作用.利用积分恒等式,本文构造了一种易于实施的参数估计切片Gibbs抽样算法.模拟结果显示,模型合有较多变量时,新方法排除“噪声”变量的能力明显高于现有文献中的其他方法.本文最后对我国各地区多个宏观经济指标的面板数据进行建模分析,演示了新方法估计参数与挑选变量的能力.
Two-step variable selection in quantile regression models%分位数回归模型中的两步变量选择
Institute of Scientific and Technical Information of China (English)
樊亚莉
2015-01-01
对于高维分位数回归模型提出了一种两步变量选择方法,这里协变量的维数pn远远大于样本量n.在第一步中,使用e1惩罚,并且证明第一步由LASSO惩罚所得到的惩罚估计量能够把模型从超高维降到同真实模型同阶的维数,并且所选模型能够覆盖真实模型.第二步对第一步所得模型使用自适应的LASSO惩罚来剔除冗余变量.在一些正则性条件下,证明了此方法具有变量选择的相合性.还进行了数值模拟和实际数据分析,用来表明此方法在有限样本下的表现.
Modeling of the Monthly Rainfall-Runoff Process Through Regressions
Directory of Open Access Journals (Sweden)
Campos-Aranda Daniel Francisco
2014-10-01
Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.
Mixed-model Regression for Variable-star Photometry
Dose, Eric
2016-05-01
Mixed-model regression, a recent advance from social-science statistics, applies directly to reducing one night's photometric raw data, especially for variable stars in fields with multiple comparison stars. One regression model per filter/passband yields any or all of: transform values, extinction values, nightly zero-points, rapid zero-point fluctuations ("cirrus effect"), ensemble comparisons, vignette and gradient removal arising from incomplete flat-correction, check-star and target-star magnitudes, and specific indications of unusually large catalog magnitude errors. When images from several different fields of view are included, the models improve without complicating the calculations. The mixed-model approach is generally robust to outliers and missing data points, and it directly yields 14 diagnostic plots, used to monitor data set quality and/or residual systematic errors - these diagnostic plots may in fact turn out to be the prime advantage of this approach. Also presented is initial work on a split-annulus approach to sky background estimation, intended to address the sensitivity of photometric observations to noise within the sky-background annulus.
Genetic evaluation of European quails by random regression models
Directory of Open Access Journals (Sweden)
Flaviana Miranda Gonçalves
2012-09-01
Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
A hybrid neural network model for noisy data regression.
Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M
2004-04-01
A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
Gupta, Cherry; Cobre, Juliana; Polpo, Adriano; Sinha, Debjayoti
2016-09-01
Existing cure-rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. This paper proposes a novel class of cure-rate model, the transform-both-sides cure-rate model (TBSCRM), that can be used to make inferences about both the cure-rate and the survival quantiles. We develop the Bayesian inference about the covariate effects on the cure-rate as well as on the survival quantiles via Markov Chain Monte Carlo (MCMC) tools. We also show that the TBSCRM-based Bayesian method outperforms existing cure-rate models based methods in our simulation studies and in application to the breast cancer survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database.
Multivariate parametric random effect regression models for fecundability studies.
Ecochard, R; Clayton, D G
2000-12-01
Delay until conception is generally described by a mixture of geometric distributions. Weinberg and Gladen (1986, Biometrics 42, 547-560) proposed a regression generalization of the beta-geometric mixture model where covariates effects were expressed in terms of contrasts of marginal hazards. Scheike and Jensen (1997, Biometrics 53, 318-329) developed a frailty model for discrete event times data based on discrete-time analogues of Hougaard's results (1984, Biometrika 71, 75-83). This paper is on a generalization to a three-parameter family distribution and an extension to multivariate cases. The model allows the introduction of explanatory variables, including time-dependent variables at the subject-specific level, together with a choice from a flexible family of random effect distributions. This makes it possible, in the context of medically assisted conception, to include data sources with multiple pregnancies (or attempts at pregnancy) per couple.
Institute of Scientific and Technical Information of China (English)
徐盈之; 郭进; 周秀丽
2016-01-01
以碳税的征收如何最大限度地兼顾区域经济协调发展为出发点，通过分位数回归方法，考察了开征碳税对全国及东中西和东北四大区域经济协调发展的影响效应。结果表明：就全国而言，征收碳税将显著地降低区域经济协调发展水平，且其影响效应在条件分布的不同位置存在明显的差异性。扩大人力资本规模、促进区域贸易往来能够削弱开征碳税对区域经济协调发展的负面影响；开征碳税对东部和东北地区的经济协调发展存在负面影响，且在东北地区的负面影响最为明显，而对中部和西部地区的作用效果则恰好相反。据此，从实施阶段性、差异化碳税税率等角度提出对策建议。%The effect of introducing carbon tax on the coordinated development of national and regional economy was explored by applying the quantile regression approach for the purpose of maximizing its positive impact.The results indicated that:as to China in the whole,introducing carbon tax has a significantly negative effect on promoting the level of coordinated development of regional economy,but enhancing human capital scale and reinforcing regional trading could alleviate the negative effect;as to the four regions,introducing carbon tax has a significantly negative effect on promoting the level of coordinated development of regional economy in the eastern and northeastern regions,and the negative effect in the northeastern region is much more serious,whereas introducing carbon tax shows a positive effect in the central and western regions.Finally,some relevant policies and suggestions like carrying out periodical and differentiated carbon tax were proposed based on the results.
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2017-07-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM ( R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Empirical likelihood ratio tests for multivariate regression models
Institute of Scientific and Technical Information of China (English)
WU Jianhong; ZHU Lixing
2007-01-01
This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method has been well known to be a powerful tool for constructing test and confidence region. For model checking, however, the naive empirical likelihood (EL) based tests are not of Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by the advantages of both the EL and score test method, the EL-based score tests share many desirable features as follows: They are self-scale invariant and can detect the alternatives that converge to the null at rate n-1/2, the possibly fastest rate for lack-of-fit testing; they involve weight functions, which provides us with the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application for a real dataset is analyzed.
Approximation by randomly weighting method in censored regression model
Institute of Scientific and Technical Information of China (English)
无
2009-01-01
Censored regression ("Tobit") models have been in common use, and their linear hypothesis testings have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model. Simulation studies illustrate that the per-formance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.
Approximation by randomly weighting method in censored regression model
Institute of Scientific and Technical Information of China (English)
WANG ZhanFeng; WU YaoHua; ZHAO LinCheng
2009-01-01
Censored regression ("Tobit") models have been in common use,and their linear hypothesis testings have been widely studied.However,the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters.In this paper,we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic.It is shown that,under both the null and local alternative hypotheses,conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic.Therefore,the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters.At the same time,we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model.Simulation studies illustrate that the performance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.
Remodeling and Estimation for Sparse Partially Linear Regression Models
Directory of Open Access Journals (Sweden)
Yunhui Zeng
2013-01-01
Full Text Available When the dimension of covariates in the regression model is high, one usually uses a submodel as a working model that contains significant variables. But it may be highly biased and the resulting estimator of the parameter of interest may be very poor when the coefficients of removed variables are not exactly zero. In this paper, based on the selected submodel, we introduce a two-stage remodeling method to get the consistent estimator for the parameter of interest. More precisely, in the first stage, by a multistep adjustment, we reconstruct an unbiased model based on the correlation information between the covariates; in the second stage, we further reduce the adjusted model by a semiparametric variable selection method and get a new estimator of the parameter of interest simultaneously. Its convergence rate and asymptotic normality are also obtained. The simulation results further illustrate that the new estimator outperforms those obtained by the submodel and the full model in the sense of mean square errors of point estimation and mean square prediction errors of model prediction.
Information Criteria for Deciding between Normal Regression Models
Maier, Robert S
2013-01-01
Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through being used by astrophysicists to compare cosmological models; e.g., predictions of the distance-redshift relation. The AICc is shown to have been misapplied, being applicable only if error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of `Akaike weights' for deciding between such models. Additionally, the...
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-10-01
The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran’s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. Results: of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran’s libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Conclusions: Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries. PMID:26622203
A Gompertz regression model for fern spores germination
Directory of Open Access Journals (Sweden)
Gabriel y Galán, Jose María
2015-06-01
Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval arithm
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-02-16
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
A nonlinear regression model-based predictive control algorithm.
Dubay, R; Abu-Ayyad, M; Hernandez, J M
2009-04-01
This paper presents a unique approach for designing a nonlinear regression model-based predictive controller (NRPC) for single-input-single-output (SISO) and multi-input-multi-output (MIMO) processes that are common in industrial applications. The innovation of this strategy is that the controller structure allows nonlinear open-loop modeling to be conducted while closed-loop control is executed every sampling instant. Consequently, the system matrix is regenerated every sampling instant using a continuous function providing a more accurate prediction of the plant. Computer simulations are carried out on nonlinear plants, demonstrating that the new approach is easily implemented and provides tight control. Also, the proposed algorithm is implemented on two real time SISO applications; a DC motor, a plastic injection molding machine and a nonlinear MIMO thermal system comprising three temperature zones to be controlled with interacting effects. The experimental closed-loop responses of the proposed algorithm were compared to a multi-model dynamic matrix controller (MPC) with improved results for various set point trajectories. Good disturbance rejection was attained, resulting in improved tracking of multi-set point profiles in comparison to multi-model MPC.
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Institute of Scientific and Technical Information of China (English)
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly,a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed,which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized like-lihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regu-larization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
Projection-type estimation for varying coefficient regression models
Lee, Young K; Park, Byeong U; 10.3150/10-BEJ331
2012-01-01
In this paper we introduce new estimators of the coefficient functions in the varying coefficient regression model. The proposed estimators are obtained by projecting the vector of the full-dimensional kernel-weighted local polynomial estimators of the coefficient functions onto a Hilbert space with a suitable norm. We provide a backfitting algorithm to compute the estimators. We show that the algorithm converges at a geometric rate under weak conditions. We derive the asymptotic distributions of the estimators and show that the estimators have the oracle properties. This is done for the general order of local polynomial fitting and for the estimation of the derivatives of the coefficient functions, as well as the coefficient functions themselves. The estimators turn out to have several theoretical and numerical advantages over the marginal integration estimators studied by Yang, Park, Xue and H\\"{a}rdle [J. Amer. Statist. Assoc. 101 (2006) 1212--1227].
The R Package threg to Implement Threshold Regression Models
Directory of Open Access Journals (Sweden)
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Directory of Open Access Journals (Sweden)
Adam J. Branscum
2013-01-01
Full Text Available The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions rather than scalar parameters (e.g., means or proportions. For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straight forward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Jaber Almedeij
2012-01-01
Full Text Available Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Air Pollution Analysis using Ontologies and Regression Models
Directory of Open Access Journals (Sweden)
Parul Choudhary
2016-07-01
Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co -operation and multi -disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.
Empirical Quantile CLTs for Time Dependent Data
Kuelbs, James
2011-01-01
We establish empirical quantile process CLTs based on $n$ independent copies of a stochastic process $\\{X_t: t \\in E\\}$ that are uniform in $t \\in E$ and quantile levels $\\alpha \\in I$, where $I$ is a closed sub-interval of $(0,1)$. Typically $E=[0,T]$, or a finite product of such intervals. Also included are CLT's for the empirical process based on $\\{I_{X_t \\le y} - \\rm {Pr}(X_t \\le y): t \\in E, y \\in R \\}$ that are uniform in $t \\in E, y \\in R$. The process $\\{X_t: t \\in E\\}$ may be chosen from a broad collection of Gaussian processes, compound Poisson processes, stationary independent increment stable processes, and martingales.
Limit theorems for functions of marginal quantiles
Babu, G Jogesh; Choi, Kwok Pui; Mangalam, Vasudevan; 10.3150/10-BEJ287
2011-01-01
Multivariate distributions are explored using the joint distributions of marginal sample quantiles. Limit theory for the mean of a function of order statistics is presented. The results include a multivariate central limit theorem and a strong law of large numbers. A result similar to Bahadur's representation of quantiles is established for the mean of a function of the marginal quantiles. In particular, it is shown that \\[\\sqrt{n}\\Biggl(\\frac{1}{n}\\sum_{i=1}^n\\phi\\bigl(X_{n:i}^{(1)},...,X_{n:i}^{(d)}\\bigr)-\\bar{\\gamma}\\Biggr)=\\frac{1}{\\sqrt{n}}\\sum_{i=1}^nZ_{n,i}+\\mathrm{o}_P(1)\\] as $n\\rightarrow\\infty$, where $\\bar{\\gamma}$ is a constant and $Z_{n,i}$ are i.i.d. random variables for each $n$. This leads to the central limit theorem. Weak convergence to a Gaussian process using equicontinuity of functions is indicated. The results are established under very general conditions. These conditions are shown to be satisfied in many commonly occurring situations.
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
Calibrating regionally downscaled precipitation over Norway through quantile-based approaches
Bolin, David; Frigessi, Arnoldo; Guttorp, Peter; Haug, Ola; Orskaug, Elisabeth; Scheel, Ida; Wallin, Jonas
2016-06-01
Dynamical downscaling of earth system models is intended to produce high-resolution climate information at regional to local scales. Current models, while adequate for describing temperature distributions at relatively small scales, struggle when it comes to describing precipitation distributions. In order to better match the distribution of observed precipitation over Norway, we consider approaches to statistical adjustment of the output from a regional climate model when forced with ERA-40 reanalysis boundary conditions. As a second step, we try to correct downscalings of historical climate model runs using these transformations built from downscaled ERA-40 data. Unless such calibrations are successful, it is difficult to argue that scenario-based downscaled climate projections are realistic and useful for decision makers. We study both full quantile calibrations and several different methods that correct individual quantiles separately using random field models. Results based on cross-validation show that while a full quantile calibration is not very effective in this case, one can correct individual quantiles satisfactorily if the spatial structure in the data are accounted for. Interestingly, different methods are favoured depending on whether ERA-40 data or historical climate model runs are adjusted.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Correlation-regression model for physico-chemical quality of ...
African Journals Online (AJOL)
abusaad
Key words: Groundwater, water quality, bore well, water supply, correlation, regression. INTRODUCTION ..... interpreting groundwater quality data and relating them to specific hydro ..... Regional trends in nitrate content of Texas groundwater.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...
Defining Sample Quantiles by the True Rank Probability
Directory of Open Access Journals (Sweden)
Lasse Makkonen
2014-01-01
Full Text Available Many definitions exist for sample quantiles and are included in statistical software. The need to adopt a standard definition of sample quantiles has been recognized and different definitions have been compared in terms of satisfying some desirable properties, but no consensus has been found. We outline here that comparisons of the sample quantile definitions are irrelevant because the probabilities associated with order-ranked sample values are known exactly. Accordingly, the standard definition for sample quantiles should be based on the true rank probabilities. We show that this allows more accurate inference of the tails of the distribution, and thus improves estimation of the probability of extreme events.
Regression model for tuning the PID controller with fractional order time delay system
S.P. Agnihotri; Laxman Madhavrao Waghmare
2014-01-01
In this paper a regression model based for tuning proportional integral derivative (PID) controller with fractional order time delay system is proposed. The novelty of this paper is that tuning parameters of the fractional order time delay system are optimally predicted using the regression model. In the proposed method, the output parameters of the fractional order system are used to derive the regression function. Here, the regression model depends on the weights of the exponential function...
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
Gu, Fei; Preacher, Kristopher J; Wu, Wei; Yung, Yiu-Fai
2014-01-01
Although the state space approach for estimating multilevel regression models has been well established for decades in the time series literature, it does not receive much attention from educational and psychological researchers. In this article, we (a) introduce the state space approach for estimating multilevel regression models and (b) extend the state space approach for estimating multilevel factor models. A brief outline of the state space formulation is provided and then state space forms for univariate and multivariate multilevel regression models, and a multilevel confirmatory factor model, are illustrated. The utility of the state space approach is demonstrated with either a simulated or real example for each multilevel model. It is concluded that the results from the state space approach are essentially identical to those from specialized multilevel regression modeling and structural equation modeling software. More importantly, the state space approach offers researchers a computationally more efficient alternative to fit multilevel regression models with a large number of Level 1 units within each Level 2 unit or a large number of observations on each subject in a longitudinal study.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Linear regression model selection using p-values when the model dimension grows
Pokarowski, Piotr; Teisseyre, Paweł
2012-01-01
We consider a new criterion-based approach to model selection in linear regression. Properties of selection criteria based on p-values of a likelihood ratio statistic are studied for families of linear regression models. We prove that such procedures are consistent i.e. the minimal true model is chosen with probability tending to 1 even when the number of models under consideration slowly increases with a sample size. The simulation study indicates that introduced methods perform promisingly when compared with Akaike and Bayesian Information Criteria.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather
Sampling errors of quantile estimations from finite samples of data
Roy, Philippe; Gachon, Philippe
2016-01-01
Empirical relationships are derived for the expected sampling error of quantile estimations using Monte Carlo experiments for two frequency distributions frequently encountered in climate sciences. The relationships found are expressed as a scaling factor times the standard error of the mean; these give a quick tool to estimate the uncertainty of quantiles for a given finite sample size.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
The Applicability of Confidence Intervals of Quantiles for the Generalized Logistic Distribution
Shin, H.; Heo, J.; Kim, T.; Jung, Y.
2007-12-01
The generalized logistic (GL) distribution has been widely used for frequency analysis. However, there is a little study related to the confidence intervals that indicate the prediction accuracy of distribution for the GL distribution. In this paper, the estimation of the confidence intervals of quantiles for the GL distribution is presented based on the method of moments (MOM), maximum likelihood (ML), and probability weighted moments (PWM) and the asymptotic variances of each quantile estimator are derived as functions of the sample sizes, return periods, and parameters. Monte Carlo simulation experiments are also performed to verify the applicability of the derived confidence intervals of quantile. As the results, the relative bias (RBIAS) and relative root mean square error (RRMSE) of the confidence intervals generally increase as return period increases and reverse as sample size increases. And PWM for estimating the confidence intervals performs better than the other methods in terms of RRMSE when the data is almost symmetric while ML shows the smallest RBIAS and RRMSE when the data is more skewed and sample size is moderately large. The GL model was applied to fit the distribution of annual maximum rainfall data. The results show that there are little differences in the estimated quantiles between ML and PWM while distinct differences in MOM.
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the
Directory of Open Access Journals (Sweden)
Simone Becker Lopes
2014-04-01
Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included. This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
First Look at Photometric Reduction via Mixed-Model Regression (Poster abstract)
Dose, E.
2016-12-01
(Abstract only) Mixed-model regression is proposed as a new approach to photometric reduction, especially for variable-star photometry in several filters. Mixed-model regression adds to normal multivariate regression certain "random effects": categorical-variable terms that model and extract specific systematic errors such as image-to-image zero-point fluctuations (cirrus effect) or even errors in comp-star catalog magnitudes.
Introduction to mixed modelling beyond regression and analysis of variance
Galwey, N W
2007-01-01
Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
DEFF Research Database (Denmark)
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employ MARS models which have the advantage of being a powerful method...
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Kleibergen, F.
2003-01-01
We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower dimensional set that represents the nested model. The invariant expression of the
Kleibergen, F.R.
2004-01-01
We obtain the prior and posterior probability of a nested regression model as the Hausdorff-integral of the prior and posterior on the parameters of an encompassing linear regression model over a lower-dimensional set that represents the nested model. The Hausdorff-integral is invariant and
A note on the maximum likelihood estimator in the gamma regression model
Directory of Open Access Journals (Sweden)
Jerzy P. Rydlewski
2009-01-01
Full Text Available This paper considers a nonlinear regression model, in which the dependent variable has the gamma distribution. A model is considered in which the shape parameter of the random variable is the sum of continuous and algebraically independent functions. The paper proves that there is exactly one maximum likelihood estimator for the gamma regression model.
Genetic parameters for various random regression models to describe the weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2002-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Genetic parameters for different random regression models to describe weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2001-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Modeling by regression for laser cutting of quartz crystal
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
Presents the theoretical models built by analysis of the mechanism of laser cutting of quartz crystal and re gression of test results for the laser cutting of quartz crystal, and comparative analysis of calculation errors for these models, and concludes with test results that these models comprehensively reflect the physical features of laser cutting of quartz crystal and satisfy the industrial production requirements, and they can be used to select right parameters for improvement of productivity and quality and saving of energy.
Regime variance testing - a quantile approach
gajda, Janusz; Wyłomańska, Agnieszka
2012-01-01
This paper is devoted to testing time series that exhibit behavior related to two or more regimes with different statistical properties. Motivation of our study are two real data sets from plasma physics with observable two-regimes structure. In this paper we develop estimation procedure for critical point of division the structure change of a time series. Moreover we propose three tests for recognition such specific behavior. The presented methodology is based on the empirical second moment and its main advantage is lack of the distribution assumption. Moreover, the examined statistical properties we express in the language of empirical quantiles of the squared data therefore the methodology is an extension of the approach known from the literature. The theoretical results we confirm by simulations and analysis of real data of turbulent laboratory plasma.
Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City
Directory of Open Access Journals (Sweden)
Amiruddin Ismail
2011-01-01
Full Text Available Transport modes are very important to Libyan’s Tripoli residents for their daily trips. However, the total number of own car and private transport namely taxi and micro buses on the road increases and causes many problems such as traffic congestion, accidents, air and noise pollution. These problems then causes other related phenomena to the travel activities such as delay in trips, stress and frustration to motorists which may affect their productivity and efficiency to both workers and students. Delay may also increase travel cost as well inefficiency in trips making if compare to other public transport users in some Arabs cities. Switching to public transport (PT modes alternatives such as buses, light rail transit and underground train could improve travel time and travel costs. A transport study has been carried out at Tripoli City Authority areas among own car users who live in areas with inadequate of private transport and poor public transportation services. Analyses about relation between factors such as travel time, travel cost, trip purpose and parking cost have been made to answer research questions. Logistic regression technique has been used to analyse these factors that influence users to switch their trips mode to public transport alternatives.
Teacher training through the Regression Model in foreign language education
Directory of Open Access Journals (Sweden)
Jesús García Laborda
2011-01-01
Full Text Available In the last few years, Spain has seen dramatic changes in its educational system. Many of them have been rejected by most teachers after their implementation (LOGSE while others have found potential drawbacks even before starting operating (LOCE, LOE. To face these changes, schools need well qualified instructors. Given this need, and also considering that, although all the schools want the best teachers but, as teachers’ salaries are regulated by the state, few schools can actually offer incentives to their teachers and consequently schools never have the instructors they wish. Apart from this, state schools have a fixed salary for their teachers and private institutions offer no additional bonuses for things like additional training or diplomas (for example, masters or post-degree courses and, therefore, teachers are rarely interested in pursuing any further studies in methodology or any other related fields such as education or applied linguistics. Although many teachers acknowledge their love to teaching, the current situation in schools (school violence, bad salaries, depression, social desprestige, legal changes and so has made the teaching job one of the most complicated and undevoted in Spain. It is not unusual to have a couple of instructors ill due to depression and other psychological sicknesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one’s experience both as a teacher as well as a learner.
Misspecified poisson regression models for large-scale registry data
DEFF Research Database (Denmark)
Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
Liu Jixue; Chen Xiru
2005-01-01
Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
A Negative Binomial Regression Model for Accuracy Tests
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
A generalized exponential time series regression model for electricity prices
DEFF Research Database (Denmark)
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort...... the estimation of the dynamic structure of the series we consider an iterative estimation strategy which, conditional on a set of parameter estimates, clears the spikes using a data cleaning algorithm, and reestimates the parameters using the cleaned data so as to robustify the estimates. Conditional...... on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...
Estimating Flood Quantiles on the Basis of Multi-Event Rainfall Simulation – Case Study
Directory of Open Access Journals (Sweden)
Jarosińska Elżbieta
2015-12-01
Full Text Available This paper presents an approach to estimating the probability distribution of annual discharges Q based on rainfall-runoff modelling using multiple rainfall events. The approach is based on the prior knowledge about the probability distribution of annual maximum daily totals of rainfall P in a natural catchment, random disaggregation of the totals into hourly values, and rainfall-runoff modelling. The presented Multi-Event Simulation of Extreme Flood method (MESEF combines design event method based on single-rainfall event modelling, and continuous simulation method used for estimating the maximum discharges of a given exceedance probability using rainfall-runoff models. In the paper, the flood quantiles were estimated using the MESEF method, and then compared to the flood quantiles estimated using classical statistical method based on observed data.
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Directory of Open Access Journals (Sweden)
Miriam Andrejiová
2013-12-01
Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
Moment-bases estimation of smooth transition regression models with endogenous variables
W.D. Areosa (Waldyr Dutra); M.J. McAleer (Michael); M.C. Medeiros (Marcelo)
2008-01-01
textabstractNonlinear regression models have been widely used in practice for a variety of time series and cross-section datasets. For purposes of analyzing univariate and multivariate time series data, in particular, Smooth Transition Regression (STR) models have been shown to be very useful for re
Covariance Functions and Random Regression Models in the ...
African Journals Online (AJOL)
ARC-IRENE
modelled to account for heterogeneity of variance by AY. ... Results suggest that selection for CW could be effective and that RRM could be .... permanent environmental effects; and εij is the temporary environmental effect or measurement error. .... (1999), however, obtained correlations that were variable as low as 0.23 ...
Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.
Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan
2016-11-01
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat ( L.) and maize ( L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that accounts for small, more complex marker main effects and marker-specific interaction effects.
Directory of Open Access Journals (Sweden)
Ramnath Vishal
2017-01-01
Full Text Available Traditionally in the field of pressure metrology uncertainty quantification was performed with the use of the Guide to the Uncertainty in Measurement (GUM; however, with the introduction of the GUM Supplement 1 (GS1 the use of Monte Carlo simulations has become an accepted practice for uncertainty analysis in metrology for mathematical models in which the underlying assumptions of the GUM are not valid. Consequently the use of quantile functions was developed as a means to easily summarize and report on uncertainty numerical results that were based on Monte Carlo simulations. In this paper, we considered the case of a piston–cylinder operated pressure balance where the effective area is modelled in terms of a combination of explicit/implicit and linear/non-linear models, and how quantile functions may be applied to analyse results and compare uncertainties from a mixture of GUM and GS1 methodologies.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru....... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers......This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its......, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well...
Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model
Institute of Scientific and Technical Information of China (English)
Jixue LIU
2006-01-01
Though EV model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use the ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of LS estimate in EV model.In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.
Institute of Scientific and Technical Information of China (English)
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Directory of Open Access Journals (Sweden)
Ivanka Jerić
2011-11-01
Full Text Available Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, up to some extent, overfitting problems caused by a small training data, we propose to use consensus of six regression models for prediction of biological activity of virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met with known antitumor activity were used to train regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, the linear and nonlinear (with polynomial and Gaussian kernel support vector machine. Regression models were applied on a virtual library of 429 compounds that resulted in six lists with candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for an antiproliferative activity. Some of prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to high complexity of prediction based on the regression models trained on a small data sample.
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
National Research Council Canada - National Science Library
Liu, Fengyun; Matsuno, Shuji; Malekian, Reza; Yu, Jin; Li, Zhixiong
2016-01-01
.... The above theoretical model is empirically evidenced with VAR (Vector Auto Regression) methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue...
Reduction of the curvature of a class of nonlinear regression models
Institute of Scientific and Technical Information of China (English)
吴翊; 易东云
2000-01-01
It is proved that the curvature of nonlinear model can be reduced to zero by increasing measured data for a class of nonlinear regression models. The result is important to actual problem and has obtained satisfying effect on data fusing.
Multivariable Linear Regression Model for Promotional Forecasting:The Coca Cola - Morrisons Case
Zheng, Yiwei/Y
2009-01-01
This paper describes a promotional forecasting model, built by linear regression module in Microsoft Excel. It intends to provide quick and reliable forecasts with a moderate credit and to assist the CPFR between the Coca Cola Enterprises (CCE) and the Morrisons. The model is derived from previous researches and literature review on CPFR, promotion, forecasting and modelling. It is designed as a multivariable linear regression model, which involves several promotional mix as variables includi...
Comparative analysis of regression and artificial neural network models for wind speed prediction
Bilgili, Mehmet; Sahin, Besir
2010-11-01
In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.
Prediction of the result in race walking using regularized regression models
Directory of Open Access Journals (Sweden)
Krzysztof Przednowek
2013-04-01
Full Text Available The following paper presents the use of regularized linear models as tools to optimize training process. The models were calculated by using data collected from race-walkers' training events. The models used predict the outcomes over a 3 km race and following a prescribed training plan. The material included a total of 122 training patterns made by 21 players. The methods of analysis include: classical model of OLS regression, ridge regression, LASSO regression and elastic net regression. In order to compare and choose the best method a cross-validation of the extit{leave-one-out} was used. All models were calculated using R language with additional packages. The best model was determined by the LASSO method which generates an error of about 26 seconds. The method has simplified the structure of the model by eliminating 5 out of 18 predictors.
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
STATISTICAL INFERENCES FOR VARYING-COEFFICINT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
Institute of Scientific and Technical Information of China (English)
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fired by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed and a formal testing approach which is essential to exploring spatial non-stationarity in geography science is suggested.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Aboveground biomass and carbon stocks modelling using non-linear regression model
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in the carbon estimation for the tropical forest due to the variation biodiversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world with the vast diversity of tree with layered canopies. With the usage of optical sensor integrate with empirical models is a common way to assess the AGB. Using the regression, the linkage between remote sensing and a biophysical parameter of the forest may be made. Therefore, this paper exemplifies the accuracy of non-linear regression equation of quadratic function to estimate the AGB and carbon stocks for the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameter field plots with the remotely-sensed data using nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p < 0.01), the coefficient of correlation (r) is 0.671. The study concluded that the integration of Worldview-3 imagery with the canopy height model (CHM) raster based LiDAR were useful in order to quantify the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Empirical Likelihood Method for Quantiles with Response Data Missing at Random
Institute of Scientific and Technical Information of China (English)
Xia-yan LI; Jun-qing YUAN
2012-01-01
Empirical likelihood is a nonparametric method for constructing confidence intervals and tests,notably in enabling the shape of a confidence region determined by the sample data.This paper presents a new version of the empirical likelihood method for quantiles under kernel regression imputation to adapt missing response data.It eliminates the need to solve nonlinear equations,and it is easy to apply.We also consider exponential empirical likelihood as an alternative method. Numerical results are presented to compare our method with others.
Measuring risk of crude oil at extreme quantiles
Directory of Open Access Journals (Sweden)
Saša Žiković
2011-06-01
Full Text Available The purpose of this paper is to investigate the performance of VaR models at measuring risk for WTI oil one-month futures returns. Risk models, ranging from industry standards such as RiskMetrics and historical simulation to conditional extreme value model, are used to calculate commodity market risk at extreme quantiles: 0.95, 0.99, 0.995 and 0.999 for both long and short trading positions. Our results show that out of the tested fat tailed distributions, generalised Pareto distribution provides the best fit to both tails of oil returns although tails differ significantly, with the right tail having a higher tail index, indicative of more extreme events. The main conclusion is that, in the analysed period, only extreme value theory based models provide a reasonable degree of safety while widespread VaR models do not provide adequate risk coverage and their performance is especially weak for short position in oil.
Combining an additive and tree-based regression model simultaneously: STIMA
Dusseldorp, E.; Conversano, C.; Os, B.J. van
2010-01-01
Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
CONFIDENCE REGIONS IN TERMS OF STATISTICAL CURVATURE FOR AR(q) NONLINEAR REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
刘应安; 韦博成
2004-01-01
This paper constructs a set of confidence regions of parameters in terms of statistical curvatures for AR(q) nonlinear regression models. The geometric frameworks are proposed for the model. Then several confidence regions for parameters and parameter subsets in terms of statistical curvatures are given based on the likelihood ratio statistics and score statistics. Several previous results, such as [1] and [2] are extended to AR(q)nonlinear regression models.
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
Martino, K G; Marks, B P
2007-12-01
Two different microbial modeling procedures were compared and validated against independent data for Listeria monocytogenes growth. The most generally used method is two consecutive regressions: growth parameters are estimated from a primary regression of microbial counts, and a secondary regression relates the growth parameters to experimental conditions. A global regression is an alternative method in which the primary and secondary models are combined, giving a direct relationship between experimental factors and microbial counts. The Gompertz equation was the primary model, and a response surface model was the secondary model. Independent data from meat and poultry products were used to validate the modeling procedures. The global regression yielded the lower standard errors of calibration, 0.95 log CFU/ml for aerobic and 1.21 log CFU/ml for anaerobic conditions. The two-step procedure yielded errors of 1.35 log CFU/ml for aerobic and 1.62 log CFU/ ml for anaerobic conditions. For food products, the global regression was more robust than the two-step procedure for 65% of the cases studied. The robustness index for the global regression ranged from 0.27 (performed better than expected) to 2.60. For the two-step method, the robustness index ranged from 0.42 to 3.88. The predictions were overestimated (fail safe) in more than 50% of the cases using the global regression and in more than 70% of the cases using the two-step regression. Overall, the global regression performed better than the two-step procedure for this specific application.
Antretter, Elfi; Dunkel, Dirk; Osvath, Peter; Voros, Viktor; Fekete, Sandor; Haring, Christian
2006-06-01
The prospective investigation of repetitive nonfatal suicidal behavior is associated with two methodological problems. Due to the commonly used definitions of nonfatal suicidal behavior, clinical samples usually consist of patients with a considerable between-person variability. Second, repeated nonfatal suicidal episodes of the same subjects are likely to be correlated. We examined three regression techniques to comparatively evaluate their efficiency in addressing the given methodological problems. Repeated episodes of nonfatal suicidal behavior were assessed in two independent patient samples during a 2-year follow-up period. The first regression design modeled repetitive nonfatal suicidal behavior as a summary measure. The second regression model treated repeated episodes of the same subject as independent events. The third regression model represented a hierarchical linear model. The estimated mean effects of the first model were likely to be nonrepresentative for a considerable part of the study subjects. The second regression design overemphasized the impact of the predictor variables. The hierarchical linear model most appropriately accounted for the heterogeneity of the samples and the correlated data structure. The nonhierarchical regression designs did not provide appropriate statistical models for the prospective investigation of repetitive nonfatal suicidal behavior. Multilevel modeling provides a convenient alternative.
Regional flow duration curves: Geostatistical techniques versus multivariate regression
Pugliese, Alessio; Farmer, William H.; Castellarin, Attilio; Archfield, Stacey A.; Vogel, Richard M.
2016-10-01
A period-of-record flow duration curve (FDC) represents the relationship between the magnitude and frequency of daily streamflows. Prediction of FDCs is of great importance for locations characterized by sparse or missing streamflow observations. We present a detailed comparison of two methods which are capable of predicting an FDC at ungauged basins: (1) an adaptation of the geostatistical method, Top-kriging, employing a linear weighted average of dimensionless empirical FDCs, standardised with a reference streamflow value; and (2) regional multiple linear regression of streamflow quantiles, perhaps the most common method for the prediction of FDCs at ungauged sites. In particular, Top-kriging relies on a metric for expressing the similarity between catchments computed as the negative deviation of the FDC from a reference streamflow value, which we termed total negative deviation (TND). Comparisons of these two methods are made in 182 largely unregulated river catchments in the southeastern U.S. using a three-fold cross-validation algorithm. Our results reveal that the two methods perform similarly throughout flow-regimes, with average Nash-Sutcliffe Efficiencies 0.566 and 0.662, (0.883 and 0.829 on log-transformed quantiles) for the geostatistical and the linear regression models, respectively. The differences between the reproduction of FDC's occurred mostly for low flows with exceedance probability (i.e. duration) above 0.98.
Regional flow duration curves: Geostatistical techniques versus multivariate regression
Pugliese, Alessio; Farmer, William H.; Castellarin, Attilio; Archfield, Stacey A.; Vogel, Richard M.
2016-01-01
A period-of-record flow duration curve (FDC) represents the relationship between the magnitude and frequency of daily streamflows. Prediction of FDCs is of great importance for locations characterized by sparse or missing streamflow observations. We present a detailed comparison of two methods which are capable of predicting an FDC at ungauged basins: (1) an adaptation of the geostatistical method, Top-kriging, employing a linear weighted average of dimensionless empirical FDCs, standardised with a reference streamflow value; and (2) regional multiple linear regression of streamflow quantiles, perhaps the most common method for the prediction of FDCs at ungauged sites. In particular, Top-kriging relies on a metric for expressing the similarity between catchments computed as the negative deviation of the FDC from a reference streamflow value, which we termed total negative deviation (TND). Comparisons of these two methods are made in 182 largely unregulated river catchments in the southeastern U.S. using a three-fold cross-validation algorithm. Our results reveal that the two methods perform similarly throughout flow-regimes, with average Nash-Sutcliffe Efficiencies 0.566 and 0.662, (0.883 and 0.829 on log-transformed quantiles) for the geostatistical and the linear regression models, respectively. The differences between the reproduction of FDC's occurred mostly for low flows with exceedance probability (i.e. duration) above 0.98.
Regression modeling of streamflow, baseflow, and runoff using geographic information systems.
Zhu, Yuanhong; Day, Rick L
2009-02-01
Regression models for predicting total streamflow (TSF), baseflow (TBF), and storm runoff (TRO) are needed for water resource planning and management. This study used 54 streams with >20 years of streamflow gaging station records during the period October 1971 to September 2001 in Pennsylvania and partitioned TSF into TBF and TRO. TBF was considered a surrogate of groundwater recharge for basins. Regression models for predicting basin-wide TSF, TBF, and TRO were developed under three scenarios that varied in regression variables used for model development. Regression variables representing basin geomorphological, geological, soil, and climatic characteristics were estimated using geographic information systems. All regression models for TSF, TBF, and TRO had R(2) values >0.94 and reasonable prediction errors. The two best TSF models developed under scenarios 1 and 2 had similar absolute prediction errors. The same was true for the two best TBF models. Therefore, any one of the two best TSF and TBF models could be used for respective flow prediction depending on variable availability. The TRO model developed under scenario 1 had smaller absolute prediction errors than that developed under scenario 2. Simplified Area-alone models developed under scenario 3 might be used when variables for using best models are not available, but had lower R(2) values and higher or more variable prediction errors than the best models.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for
Comparison of land-use regression models between Great Britain and the Netherlands.
Vienneau, D.; de Hoogh, K.; Beelen, R.M.J.; Fischer, P.; Hoek, G.; Briggs, D.
2010-01-01
Land-use regression models have increasingly been applied for air pollution mapping at typically the city level. Though models generally predict spatial variability well, the structure of models differs widely between studies. The observed differences in the models may be due to artefacts of data an
U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...
Rank Set Sampling in Improving the Estimates of Simple Regression Model
Directory of Open Access Journals (Sweden)
M Iqbal Jeelani
2015-04-01
Full Text Available In this paper Rank set sampling (RSS is introduced with a view of increasing the efficiency of estimates of Simple regression model. Regression model is considered with respect to samples taken from sampling techniques like Simple random sampling (SRS, Systematic sampling (SYS and Rank set sampling (RSS. It is found that R2 and Adj R2 obtained from regression model based on Rank set sample is higher than rest of two sampling schemes. Similarly Root mean square error, p-values, coefficient of variation are much lower in Rank set based regression model, also under validation technique (Jackknifing there is consistency in the measure of R2, Adj R2 and RMSE in case of RSS as compared to SRS and SYS. Results are supported with an empirical study involving a real data set generated of Pinus Wallichiana taken from block Langate of district Kupwara.
Institute of Scientific and Technical Information of China (English)
Tao Hu; Heng-jian Cui; Xing-wei Tong
2009-01-01
This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The semiparametric varying-coefficient partially linear regression model which is a gen-eralization of the partially linear regression model and varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A Sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estima-tor for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.
Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Drzewiecki, Wojciech
2016-12-01
In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data.
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milking by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait Random Regression Models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. Fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability in milk flow estimated by the most parsimonious model was of moderate to high magnitude.
Modelling QTL effect on BTA06 using random regression test day models.
Suchocki, T; Szyda, J; Zhang, Q
2013-02-01
In statistical models, a quantitative trait locus (QTL) effect has been incorporated either as a fixed or as a random term, but, up to now, it has been mainly considered as a time-independent variable. However, for traits recorded repeatedly, it is very interesting to investigate the variation of QTL over time. The major goal of this study was to estimate the position and effect of QTL for milk, fat, protein yields and for somatic cell score based on test day records, while testing whether the effects are constant or variable throughout lactation. The analysed data consisted of 23 paternal half-sib families (716 daughters of 23 sires) of Chinese Holstein-Friesian cattle genotyped at 14 microsatellites located in the area of the casein loci on BTA6. A sequence of three models was used: (i) a lactation model, (ii) a random regression model with a QTL constant in time and (iii) a random regression model with a QTL variable in time. The results showed that, for each production trait, at least one significant QTL exists. For milk and protein yields, the QTL effect was variable in time, while for fat yield, each of the three models resulted in a significant QTL effect. When a QTL is incorporated into a model as a constant over time, its effect is averaged over lactation stages and may, thereby, be difficult or even impossible to be detected. Our results showed that, in such a situation, only a longitudinal model is able to identify loci significantly influencing trait variation.
The empirical likelihood goodness-of-fit test for regression model
Institute of Scientific and Technical Information of China (English)
Li-xing ZHU; Yong-song QIN; Wang-li XU
2007-01-01
Goodness-of-fit test for regression modes has received much attention in literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models including classical parametric and autoregressive (AR) time series models are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.
On asymptotics of t-type regression estimation in multiple linear model
Institute of Scientific and Technical Information of China (English)
无
2004-01-01
We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution.The marginal likelihood can also be applied to the de-correlated response when the withinsubject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
Developing and testing a global-scale regression model to quantify mean annual streamflow
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
A hybrid model using logistic regression and wavelet transformation to detect traffic incidents
Directory of Open Access Journals (Sweden)
Shaurya Agarwal
2016-07-01
Full Text Available This research paper investigates a hybrid model using logistic regression with a wavelet-based feature extraction for detecting traffic incidents. A logistic regression model is suitable when the outcome can take only a limited number of values. For traffic incident detection, the outcome is limited to only two values, the presence or absence of an incident. The logistic regression model used in this study is a generalized linear model (GLM with a binomial response and a logit link function. This paper presents a framework to use logistic regression and wavelet-based feature extraction for traffic incident detection. It investigates the effect of preprocessing data on the performance of incident detection models. Results of this study indicate that logistic regression along with wavelet based feature extraction can be used effectively for incident detection by balancing the incident detection rate and the false alarm rate according to need. Logistic regression on raw data resulted in a maximum detection rate of 95.4% at the cost of 14.5% false alarm rate. Whereas the hybrid model achieved a maximum detection rate of 98.78% at the expense of 6.5% false alarm rate. Results indicate that the proposed approach is practical and efficient; with future improvements in the proposed technique, it will make an effective tool for traffic incident detection.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study.
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Directory of Open Access Journals (Sweden)
Cristinel CONSTANTIN
2015-12-01
Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations
Regression-based air temperature spatial prediction models: an example from Poland
Directory of Open Access Journals (Sweden)
Mariusz Szymanowski
2013-10-01
Full Text Available A Geographically Weighted Regression ? Kriging (GWRK algorithm, based on the local Geographically Weighted Regression (GWR, is applied for spatial prediction of air temperature in Poland. Hengl's decision tree for selecting a suitable prediction model is extended for varying spatial relationships between the air temperature and environmental predictors with an assumption of existing environmental dependence of analyzed temperature variables. The procedure includes the potential choice of a local GWR instead of the global Multiple Linear Regression (MLR method for modeling the deterministic part of spatial variation, which is usual in the standard regression (residual kriging model (MLRK. The analysis encompassed: testing for environmental correlation, selecting an appropriate regression model, testing for spatial autocorrelation of the residual component, and validating the prediction accuracy. The proposed approach was performed for 69 air temperature cases, with time aggregation ranging from daily to annual average air temperatures. The results show that, irrespective of the level of data aggregation, the spatial distribution of temperature is better fitted by local models, and hence is the reason for choosing a GWR instead of the MLR for all variables analyzed. Additionally, in most cases (78% there is spatial autocorrelation in the residuals of the deterministic part, which suggests that the GWR model should be extended by ordinary kriging of residuals to the GWRK form. The decision tree used in this paper can be considered as universal as it encompasses either spatially varying relationships of modeled and explanatory variables or random process that can be modeled by a stochastic extension of the regression model (residual kriging. Moreover, for all cases analyzed, the selection of a method based on the local regression model (GWRK or GWR does not depend on the data aggregation level, showing the potential versatility of the technique.
Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.
Kumar, Gaurav; Bajaj, Rakesh Kumar
2014-01-01
In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept on triangular intuitionistic fuzzy number for its one-to-one correspondence with its position and entropies. Using the concept of fuzzy entropy the estimators of the intuitionistic fuzzy regression coefficients have been estimated in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.
Directory of Open Access Journals (Sweden)
Ahmad A. Saifan
2016-04-01
Full Text Available Regression testing is a safeguarding procedure to validate and verify adapted software, and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in regression test selection domain. The approach is based on meta-models (test models and structured models to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results using the re-test to all approaches. The results have shown that our approach reduces the size of test suite without negative impact on the effectiveness of the fault detection.
Hartmann, Armin; Van Der Kooij, Anita J; Zeeck, Almut
2009-07-01
In explorative regression studies, linear models are often applied without questioning the linearity of the relations between the predictor variables and the dependent variable, or linear relations are taken as an approximation. In this study, the method of regression with optimal scaling transformations is demonstrated. This method does not require predefined nonlinear functions and results in easy-to-interpret transformations that will show the form of the relations. The method is illustrated using data from a German multicenter project on the indication criteria for inpatient or day clinic psychotherapy treatment. The indication criteria to include in the regression model were selected with the Lasso, which is a tool for predictor selection that overcomes the disadvantages of stepwise regression methods. The resulting prediction model indicates that treatment status is (approximately) linearly related to some criteria and nonlinearly related to others.
Modeling personalized head-related impulse response using support vector regression
Institute of Scientific and Technical Information of China (English)
HUANG Qing-hua; FANG Yong
2009-01-01
A new customization approach based on support vector regression (SVR) is proposed to obtain individual headrelated impulse response (HRIR) without complex measurement and special equipment. Principal component analysis (PCA) is first applied to obtain a few principal components and corresponding weight vectors correlated with individual anthropometric parameters. Then the weight vectors act as output of the nonlinear regression model. Some measured anthropometric parameters are selected as input of the model according to the correlation coefficients between the parameters and the weight vectors. After the regression model is learned from the training data, the individual HRIR can be predicted based on the measured anthropometric parameters. Compared with a back-propagation neural network (BPNN) for nonlinear regression,better generalization and prediction performance for small training samples can be obtained using the proposed PCA-SVR algorithm.
RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-01-01
Full Text Available Modeling rainfall-runoff relationships in a watershed have an important role in water resources engineering. Researchers have used numerical models for modeling rainfall-runoff process in the watershed because of non-linear nature of rainfall-runoff relationship, vast data requirement and physical models hardness. The main object of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models including ANN and ANFIS were used to model the rainfall-runoff process and the best model was chosen. Also, by using SPSS software, the regression equations were developed and then the best equation was selected from regression analysis. The obtained results from the numerical and regression modeling were compared each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also stated that the Turkey river flow rate had a logical relationship with one and two days ago flow rate and one, two and three days ago rainfall values.
RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS
Directory of Open Access Journals (Sweden)
J. Behmanesh
2015-03-01
Full Text Available Modeling rainfall-runoff relationships in a watershed have an important role in water resources engineering. Researchers have used numerical models for modeling rainfall-runoff process in the watershed because of non-linear nature of rainfall-runoff relationship, vast data requirement and physical models hardness. The main object of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models including ANN and ANFIS were used to model the rainfall-runoff process and the best model was chosen. Also, by using SPSS software, the regression equations were developed and then the best equation was selected from regression analysis. The obtained results from the numerical and regression modeling were compared each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also stated that the Turkey river flow rate had a logical relationship with one and two days ago flow rate and one, two and three days ago rainfall values.
Electricity demand loads modeling using AutoRegressive Moving Average (ARMA) models
Energy Technology Data Exchange (ETDEWEB)
Pappas, S.S. [Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83 200 Samos (Greece); Ekonomou, L.; Chatzarakis, G.E. [Department of Electrical Engineering Educators, ASPETE - School of Pedagogical and Technological Education, N. Heraklion, 141 21 Athens (Greece); Karamousantas, D.C. [Technological Educational Institute of Kalamata, Antikalamos, 24100 Kalamata (Greece); Katsikas, S.K. [Department of Technology Education and Digital Systems, University of Piraeus, 150 Androutsou Srt., 18 532 Piraeus (Greece); Liatsis, P. [Division of Electrical Electronic and Information Engineering, School of Engineering and Mathematical Sciences, Information and Biomedical Engineering Centre, City University, Northampton Square, London EC1V 0HB (United Kingdom)
2008-09-15
This study addresses the problem of modeling the electricity demand loads in Greece. The provided actual load data is deseasonilized and an AutoRegressive Moving Average (ARMA) model is fitted on the data off-line, using the Akaike Corrected Information Criterion (AICC). The developed model fits the data in a successful manner. Difficulties occur when the provided data includes noise or errors and also when an on-line/adaptive modeling is required. In both cases and under the assumption that the provided data can be represented by an ARMA model, simultaneous order and parameter estimation of ARMA models under the presence of noise are performed. The produced results indicate that the proposed method, which is based on the multi-model partitioning theory, tackles successfully the studied problem. For validation purposes the produced results are compared with three other established order selection criteria, namely AICC, Akaike's Information Criterion (AIC) and Schwarz's Bayesian Information Criterion (BIC). The developed model could be useful in the studies that concern electricity consumption and electricity prices forecasts. (author)
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2014-05-01
Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities.
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.
Ulrich, David; Parkhouse, Bonnie L.
1982-01-01
An alumni-based model is proposed as an alternative to sports management curriculum design procedures. The model relies on the assessment of curriculum by sport management alumni and uses performance ratings of employers and measures of satisfaction by alumni in a regression model to identify curriculum leading to increased work performance and…
Menendez, P.; Eilers, P.; Tikunov, Y.M.; Bovy, A.G.; Eeuwijk, van F.
2012-01-01
The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtai
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Directory of Open Access Journals (Sweden)
Parameshwar V. Pandit
2012-06-01
Full Text Available Purpose: To analysis the dependence of oral health diseases i.e. dental caries and periodontal disease on considering the number of risk factors through the applications of logistic regression model. Method: The cross sectional study involves a systematic random sample of 1760 permanent dentition aged between 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by multiple logistic regression model using SPSS statistical software. Results: The factors like frequency of brushing, timings of cleaning teeth and type of toothpastes are significant persistent predictors of dental caries and periodontal disease. The log likelihood value of full model is –1013.1364 and Akaike’s Information Criterion (AIC is 1.1752 as compared to reduced regression model are -1019.8106 and 1.1748 respectively for dental caries. But, the log likelihood value of full model is –1085.7876 and AIC is 1.2577 followed by reduced regression model are -1019.8106 and 1.1748 respectively for periodontal disease. The area under Receiver Operating Characteristic (ROC curve for the dental caries is 0.7509 (full model and 0.7447 (reduced model; the ROC for the periodontal disease is 0.6128 (full model and 0.5821 (reduced model. Conclusions: The frequency of brushing, timings of cleaning teeth and type of toothpastes are main signifi cant risk factors of dental caries and periodontal disease. The fitting performance of reduced logistic regression model is slightly a better fit as compared to full logistic regression model in identifying the these risk factors for both dichotomous dental caries and periodontal disease.
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Directory of Open Access Journals (Sweden)
S. Goyal
2012-03-01
Full Text Available This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 g.C. Linear Layer and Generalized Regression models were developed with input parameters: Soluble nitrogen, pH, Standard plate count, Yeast & mould count, Spores, and sensory score as output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash - Sutcliffo Coefficient were used in order to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 g.C.
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Directory of Open Access Journals (Sweden)
Daniel Rece
2009-09-01
Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
The problem often encounters in logistic regression modeling are multicollinearity problems. Data that have multicollinearity between explanatory variables with the result in the estimation of parameters to be bias. Besides, the multicollinearity will result in error in the classification. In general, to overcome multicollinearity in regression used stepwise regression. They are also another method to overcome multicollinearity which involves all variable for prediction. That is Principal Component Analysis (PCA). However, classical PCA in only for numeric data. Its data are categorical, one method to solve the problems is Categorical Principal Component Analysis (CATPCA). Data were used in this research were a part of data Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristic of women of using the contraceptive methods. Classification results evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity, shows the opposite where CATPCA method (99.79%) is better than logistic regression method (92.43%) and stepwise (92.05%). Therefore in this study focuses on major class classification (using a contraceptive method), then the selected model is CATPCA because it can raise the level of the major class model accuracy.
Inferring river bathymetry via Image-to-Depth Quantile Transformation (IDQT)
Legleiter, Carl
2016-01-01
Conventional, regression-based methods of inferring depth from passive optical image data undermine the advantages of remote sensing for characterizing river systems. This study introduces and evaluates a more flexible framework, Image-to-Depth Quantile Transformation (IDQT), that involves linking the frequency distribution of pixel values to that of depth. In addition, a new image processing workflow involving deep water correction and Minimum Noise Fraction (MNF) transformation can reduce a hyperspectral data set to a single variable related to depth and thus suitable for input to IDQT. Applied to a gravel bed river, IDQT avoided negative depth estimates along channel margins and underpredictions of pool depth. Depth retrieval accuracy (R25 0.79) and precision (0.27 m) were comparable to an established band ratio-based method, although a small shallow bias (0.04 m) was observed. Several ways of specifying distributions of pixel values and depths were evaluated but had negligible impact on the resulting depth estimates, implying that IDQT was robust to these implementation details. In essence, IDQT uses frequency distributions of pixel values and depths to achieve an aspatial calibration; the image itself provides information on the spatial distribution of depths. The approach thus reduces sensitivity to misalignment between field and image data sets and allows greater flexibility in the timing of field data collection relative to image acquisition, a significant advantage in dynamic channels. IDQT also creates new possibilities for depth retrieval in the absence of field data if a model could be used to predict the distribution of depths within a reach.
A Stochastic Restricted Principal Components Regression Estimator in the Linear Model
Directory of Open Access Journals (Sweden)
Daojiang He
2014-01-01
Full Text Available We propose a new estimator to combat the multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator is constructed by combining the ordinary mixed estimator (OME and the principal components regression (PCR estimator, which is called the stochastic restricted principal components (SRPC regression estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
Estimasi Model Seemingly Unrelated Regression (SUR dengan Metode Generalized Least Square (GLS
Directory of Open Access Journals (Sweden)
Ade Widyaningsih
2014-06-01
Full Text Available Regression analysis is a statistical tool that is used to determine the relationship between two or more quantitative variables so that one variable can be predicted from the other variables. A method that can used to obtain a good estimation in the regression analysis is ordinary least squares method. The least squares method is used to estimate the parameters of one or more regression but relationships among the errors in the response of other estimators are not allowed. One way to overcome this problem is Seemingly Unrelated Regression model (SUR in which parameters are estimated using Generalized Least Square (GLS. In this study, the author applies SUR model using GLS method on world gasoline demand data. The author obtains that SUR using GLS is better than OLS because SUR produce smaller errors than the OLS.
Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis
Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia
2015-03-01
The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
Estimating conditional quantiles with the help of the pinball loss
Steinwart, Ingo; 10.3150/10-BEJ267
2011-01-01
The so-called pinball loss for estimating conditional quantiles is a well-known tool in both statistics and machine learning. So far, however, only little work has been done to quantify the efficiency of this tool for nonparametric approaches. We fill this gap by establishing inequalities that describe how close approximate pinball risk minimizers are to the corresponding conditional quantile. These inequalities, which hold under mild assumptions on the data-generating distribution, are then used to establish so-called variance bounds, which recently turned out to play an important role in the statistical analysis of (regularized) empirical risk minimization approaches. Finally, we use both types of inequalities to establish an oracle inequality for support vector machines that use the pinball loss. The resulting learning rates are min--max optimal under some standard regularity assumptions on the conditional quantile.
Simulation and Estimation of Extreme Quantiles and Extreme Probabilities
Energy Technology Data Exchange (ETDEWEB)
Guyader, Arnaud, E-mail: arnaud.guyader@uhb.fr [Universite Rennes 2 (France); Hengartner, Nicolas [Los Alamos National Laboratory, Information Sciences Group (United States); Matzner-Lober, Eric [Universite Rennes 2 (France)
2011-10-15
Let X be a random vector with distribution {mu} on Double-Struck-Capital-R {sup d} and {Phi} be a mapping from Double-Struck-Capital-R {sup d} to Double-Struck-Capital-R . That mapping acts as a black box, e.g., the result from some computer experiments for which no analytical expression is available. This paper presents an efficient algorithm to estimate a tail probability given a quantile or a quantile given a tail probability. The algorithm improves upon existing multilevel splitting methods and can be analyzed using Poisson process tools that lead to exact description of the distribution of the estimated probabilities and quantiles. The performance of the algorithm is demonstrated in a problem related to digital watermarking.
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne; Do, Duy Ngoc; KADARMIDEEN, Haja N.; Jensen, Just
2014-01-01
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Based on covariance functions, residual feed intake (RFI) was defined and derived as the conditional genetic variance in feed intake given mid-test breeding value for BW and rate of gain. The heritabili...
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Energy Technology Data Exchange (ETDEWEB)
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimation of higher heating value of coals are developed using proximate analysis data obtained generally from the low rank coal samples as-received basis. In this modeling study, three main model structures depended on the number of proximate analysis parameters, which are named the independent variables, such as moisture, ash, volatile matter and fixed carbon, are firstly categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using multiple nonlinear regression method. Based on the results of nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving highest correlation for three main structures are selected. Although the selected all three models predicts HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that that the developed model in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is effective tool for HHV estimation of low rank coals. (author)