Sample records for adaptive regression modeling

  1. Multiple Regressive Model Adaptive Control

    Garipov, Emil; Stoilkov, Teodor; Kalaykov, Ivan


    The essence of the ideas applied to this text consists in the development of the strategy for control of the arbitrary in complexity continuous plant by means of a set of discrete timeinvariant linear controllers. Their number and tuned parameters correspond to the number and parameters of the linear time-invariant regressive models in the model bank, which approximate the complex plant dynamics in different operating points. Described strategy is known as Multiple Regressive Model Adaptive C...

  2. Adaptive nonparametric instrumental regression by model selection

    Johannes, Jan


    We consider the problem of estimating the structural function in nonparametric instrumental regression, where in the presence of an instrument W a response Y is modeled in dependence of an endogenous explanatory variable Z. The proposed estimator is based on dimension reduction and additional thresholding. The minimax optimal rate of convergence of the estimator is derived assuming that the structural function belongs to some ellipsoids which are in a certain sense linked to the conditional expectation operator of Z given W. We illustrate these results by considering classical smoothness assumptions. However, the proposed estimator requires an optimal choice of a dimension parameter depending on certain characteristics of the unknown structural function and the conditional expectation operator of Z given W, which are not known in practice. The main issue addressed in our work is a fully adaptive choice of this dimension parameter using a model selection approach under the restriction that the conditional expe...

  3. Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks

    Kanevski, Mikhail


    The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press

  4. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll


    This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...

  5. Adaptive Metric Kernel Regression

    Goutte, Cyril; Larsen, Jan


    Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression by...... minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  6. Adaptive metric kernel regression

    Goutte, Cyril; Larsen, Jan


    Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the...

  7. Time-adaptive quantile regression

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik


    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with...... wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....

  8. Adaptive functional linear regression

    Comte, Fabienne


    We consider the estimation of the slope function in functional linear regression, where scalar responses are modeled in dependence of random functions. Cardot and Johannes [2010] have shown that a thresholded projection estimator can attain up to a constant minimax-rates of convergence in a general framework which allows to cover the prediction problem with respect to the mean squared prediction error as well as the estimation of the slope function and its derivatives. This estimation procedure, however, requires an optimal choice of a tuning parameter with regard to certain characteristics of the slope function and the covariance operator associated with the functional regressor. As this information is usually inaccessible in practice, we investigate a fully data-driven choice of the tuning parameter which combines model selection and Lepski's method. It is inspired by the recent work of Goldenshluger and Lepski [2011]. The tuning parameter is selected as minimizer of a stochastic penalized contrast function...

  9. Adaptive robust polynomial regression for power curve modeling with application to wind power forecasting

    Xu, Man; Pinson, Pierre; Lu, Zongxiang;


    Wind farm power curve modeling, which characterizes the relationship between meteorological variables and power production, is a crucial procedure for wind power forecasting. In many cases, power curve modeling is more impacted by the limited quality of input data rather than the stochastic nature...... of the energy conversion process. Such nature may be due the varying wind conditions, aging and state of the turbines, etc. And, an equivalent steady-state power curve, estimated under normal operating conditions with the intention to filter abnormal data, is not sufficient to solve the problem...... because of the lack of time adaptivity. In this paper, a refined local polynomial regression algorithm is proposed to yield an adaptive robust model of the time-varying scattered power curve for forecasting applications. The time adaptivity of the algorithm is considered with a new data-driven bandwidth...

  10. A projection-based adaptive-to-model test for regressions

    Tan, Falong; Zhu, Xuehu; Zhu, Lixing


    A longstanding problem of existing empirical process-based tests for regressions is that when the number of covariates is greater than one, they either have no tractable limiting null distributions or are not omnibus. To attack this problem, we in this paper propose a projection-based adaptive-to-model approach. When the hypothetical model is parametric single-index, the method can fully utilize the dimension reduction model structure under the null hypothesis as if the covariate were one-dim...

  11. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza


    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  12. Air quality modeling in the Oviedo urban area (NW Spain) by using multivariate adaptive regression splines.

    Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E


    The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, the experimental data set made up of nitrogen oxides (NO x ), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, conclusions of this research work are exposed. PMID:25414030

  13. Adaptive multitrack reconstruction for particle trajectories based on fuzzy c-regression models

    Niu, Li-Bo; Li, Yu-Lan; Huang, Meng; Fu, Jian-Qiang; He, Bin; Li, Yuan-Jing


    In this paper, an approach to straight and circle track reconstruction is presented, which is suitable for particle trajectories in an homogenous magnetic field (or 0 T) or Cherenkov rings. The method is based on fuzzy c-regression models, where the number of the models stands for the track number. The approximate number of tracks and a rough evaluation of the track parameters given by Hough transform are used to initiate the fuzzy c-regression models. The technique effectively represents a merger between track candidates finding and parameters fitting. The performance of this approach is tested by some simulated data under various scenarios. Results show that this technique is robust and could provide very accurate results efficiently. Supported by National Natural Science Foundation of China (11275109)

  14. Adaptive Local Linear Quantile Regression

    Yu-nan Su; Mao-zai Tian


    In this paper we propose a new method of local linear adaptive smoothing for nonparametric conditional quantile regression. Some theoretical properties of the procedure are investigated. Then we demonstrate the performance of the method on a simulated example and compare it with other methods. The simulation results demonstrate a reasonable performance of our method proposed especially in situations when the underlying image is piecewise linear or can be approximated by such images. Generally speaking, our method outperforms most other existing methods in the sense of the mean square estimation (MSE) and mean absolute estimation (MAE) criteria. The procedure is very stable with respect to increasing noise level and the algorithm can be easily applied to higher dimensional situations.

  15. Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

    Yang, Jianhong, E-mail: [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Yi, Cancan; Xu, Jinwu [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Ma, Xianghong [School of Engineering and Applied Science, Aston University, Birmingham B4 7ET (United Kingdom)


    A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution.

  16. Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

    A new LIBS quantitative analysis method based on analytical line adaptive selection and Relevance Vector Machine (RVM) regression model is proposed. First, a scheme of adaptively selecting analytical line is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentration analysis results will be given with a form of confidence interval of probabilistic distribution, which is helpful for evaluating the uncertainness contained in the measured spectra. Chromium concentration analysis experiments of 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experiment results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution

  17. Recursive Gaussian Process Regression Model for Adaptive Quality Monitoring in Batch Processes

    Le Zhou


    Full Text Available In chemical batch processes with slow responses and a long duration, it is time-consuming and expensive to obtain sufficient normal data for statistical analysis. With the persistent accumulation of the newly evolving data, the modelling becomes adequate gradually and the subsequent batches will change slightly owing to the slow time-varying behavior. To efficiently make use of the small amount of initial data and the newly evolving data sets, an adaptive monitoring scheme based on the recursive Gaussian process (RGP model is designed in this paper. Based on the initial data, a Gaussian process model and the corresponding SPE statistic are constructed at first. When the new batches of data are included, a strategy based on the RGP model is used to choose the proper data for model updating. The performance of the proposed method is finally demonstrated by a penicillin fermentation batch process and the result indicates that the proposed monitoring scheme is effective for adaptive modelling and online monitoring.

  18. A Rapid Model Adaptation Technique for Emotional Speech Recognition with Style Estimation Based on Multiple-Regression HMM

    Ijima, Yusuke; Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao

    In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.

  19. Survival Analysis with Multivariate adaptive Regression Splines

    Kriner, Monika


    Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear effects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...

  20. Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree

    Kisi, Ozgur


    Pan evaporation (Ep) modeling is an important issue in reservoir management, regional water resources planning and evaluation of drinking-water supplies. The main purpose of this study is to investigate the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) in modeling Ep. The first part of the study focused on testing the ability of the LSSVM, MARS and M5Tree models in estimating the Ep data of Mersin and Antalya stations located in Mediterranean Region of Turkey by using cross-validation method. The LSSVM models outperformed the MARS and M5Tree models in estimating Ep of Mersin and Antalya stations with local input and output data. The average root mean square error (RMSE) of the M5Tree and MARS models was decreased by 24-32.1% and 10.8-18.9% using LSSVM models for the Mersin and Antalya stations, respectively. The ability of three different methods was examined in estimation of Ep using input air temperature, solar radiation, relative humidity and wind speed data from nearby station in the second part of the study (cross-station application without local input data). The results showed that the MARS models provided better accuracy than the LSSVM and M5Tree models with respect to RMSE, mean absolute error (MAE) and determination coefficient (R2) criteria. The average RMSE accuracy of the LSSVM and M5Tree was increased by 3.7% and 16.5% using MARS. In the case of without local input data, the average RMSE accuracy of the LSSVM and M5Tree was respectively increased by 11.4% and 18.4% using MARS. In the third part of the study, the ability of the applied models was examined in Ep estimation using input and output data of nearby station. The results reported that the MARS models performed better than the other models with respect to RMSE, MAE and R2 criteria. The average RMSE of the LSSVM and M5Tree was respectively decreased by 54% and 3.4% using MARS. The overall results indicated that

  1. Flexible survival regression modelling

    Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben


    time-varying effects. The introduced models are all applied to data on breast cancer from the Norwegian cancer registry, and these analyses clearly reveal the shortcomings of Cox's regression model and the need for other supplementary analyses with models such as those we present here.......Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time...

  2. A New Predictive Model of Centerline Segregation in Continuous Cast Steel Slabs by Using Multivariate Adaptive Regression Splines Approach

    Paulino José García Nieto


    Full Text Available The aim of this study was to obtain a predictive model able to perform an early detection of central segregation severity in continuous cast steel slabs. Segregation in steel cast products is an internal defect that can be very harmful when slabs are rolled in heavy plate mills. In this research work, the central segregation was studied with success using the data mining methodology based on multivariate adaptive regression splines (MARS technique. For this purpose, the most important physical-chemical parameters are considered. The results of the present study are two-fold. In the first place, the significance of each physical-chemical variable on the segregation is presented through the model. Second, a model for forecasting segregation is obtained. Regression with optimal hyperparameters was performed and coefficients of determination equal to 0.93 for continuity factor estimation and 0.95 for average width were obtained when the MARS technique was applied to the experimental dataset, respectively. The agreement between experimental data and the model confirmed the good performance of the latter.

  3. Adaptive methods for flood forecasting using linear regression models in the upper basin of Senegal River

    In flood forecasting modelling, large basins are often considered as hydrological systems with multiple inputs and one output. Inputs are hydrological variables such rainfall, runoff and physical characteristics of basin; output is runoff. Relating inputs to output can be achieved using deterministic, conceptual, or stochastic models. Rainfall runoff models generally lack of accuracy. Physical hydrological processes based models, either deterministic or conceptual are highly data requirement demanding and by the way very complex. Stochastic multiple input-output models, using only historical chronicles of hydrological variables particularly runoff are by the way very popular among the hydrologists for large river basin flood forecasting. Application is made on the Senegal River upstream of Bakel, where the River is formed by the main branch, Bafing, and two tributaries, Bakoye and Faleme; Bafing being regulated by Manantaly Dam. A three inputs and one output model has been used for flood forecasting on Bakel. Influence of the lead forecasting, and of the three inputs taken separately, then associated two by two, and altogether has been verified using a dimensionless variance as criterion of quality. Inadequacies occur generally between model output and observations; to put model in better compliance with current observations, we have compared four parameter updating procedure, recursive least squares, Kalman filtering, stochastic gradient method, iterative method, and an AR errors forecasting model. A combination of these model updating have been used in real time flood forecasting.(Author)

  4. Adaptive regression modeling of biomarkers of potential harm in a population of U.S. adult cigarette smokers and nonsmokers

    Mendes Paul E


    Full Text Available Abstract Background This article describes the data mining analysis of a clinical exposure study of 3585 adult smokers and 1077 nonsmokers. The analysis focused on developing models for four biomarkers of potential harm (BOPH: white blood cell count (WBC, 24 h urine 8-epi-prostaglandin F2α (EPI8, 24 h urine 11-dehydro-thromboxane B2 (DEH11, and high-density lipoprotein cholesterol (HDL. Methods Random Forest was used for initial variable selection and Multivariate Adaptive Regression Spline was used for developing the final statistical models Results The analysis resulted in the generation of models that predict each of the BOPH as function of selected variables from the smokers and nonsmokers. The statistically significant variables in the models were: platelet count, hemoglobin, C-reactive protein, triglycerides, race and biomarkers of exposure to cigarette smoke for WBC (R-squared = 0.29; creatinine clearance, liver enzymes, weight, vitamin use and biomarkers of exposure for EPI8 (R-squared = 0.41; creatinine clearance, urine creatinine excretion, liver enzymes, use of Non-steroidal antiinflammatory drugs, vitamins and biomarkers of exposure for DEH11 (R-squared = 0.29; and triglycerides, weight, age, sex, alcohol consumption and biomarkers of exposure for HDL (R-squared = 0.39. Conclusions Levels of WBC, EPI8, DEH11 and HDL were statistically associated with biomarkers of exposure to cigarette smoking and demographics and life style factors. All of the predictors togather explain 29%-41% of the variability in the BOPH.

  5. On the interest of combining an analog model to a regression model for the adaptation of the downscaling link. Application to probabilistic prediction of precipitation over France.

    Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine


    Scenarios of surface weather required for the impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriated. Outputs of these models have to be post-processed, which is often carried out thanks to Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected thanks to an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably allows increasing the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the


    Constanţa-Nicoleta BODEA


    Full Text Available In this communication we will discuss two regression credibility models from Non – Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state and the collective estimate (based on aggregate USA data. To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.

  7. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad


    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  8. A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

    Paulino José García Nieto


    Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.

  9. Adaptive Rank Penalized Estimators in Multivariate Regression

    Bunea, Florentina; Wegkamp, Marten


    We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stays bounded, and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...

  10. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution

    Kisi, Ozgur; Parmar, Kulwinder Singh


    This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.

  11. Forecasting with Dynamic Regression Models

    Pankratz, Alan


    One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.

  12. Survival Data and Regression Models

    Grégoire, G.


    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  13. Prediction of longitudinal dispersion coefficient using multivariate adaptive regression splines

    Amir Hamzeh Haghiabi


    In this paper, multivariate adaptive regression splines (MARS) was developed as a novel soft-computingtechnique for predicting longitudinal dispersion coefficient (DL) in rivers. As mentioned in the literature,experimental dataset related to DL was collected and used for preparing MARS model. Results of MARSmodel were compared with multi-layer neural network model and empirical formulas. To define the mosteffective parameters on DL, the Gamma test was used. Performance of MARS model was assessed bycalculation of standard error indices. Error indices showed that MARS model has suitable performanceand is more accurate compared to multi-layer neural network model and empirical formulas. Results ofthe Gamma test and MARS model showed that flow depth (H) and ratio of the mean velocity to shearvelocity (u/u^∗) were the most effective parameters on the DL.

  14. Fuzzy linear regression forecasting models

    吴冲; 惠晓峰; 朱洪文


    The fuzzy linear regression forecasting model is deduced from the symmetric triangular fuzzy number.With the help of the degree of fitting and the measure of fuzziness, the determination of symmetric triangularfuzzy numbers is changed into a problem of solving linear programming.

  15. Heteroscedasticity checks for regression models


    For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.

  16. Heteroscedastic transformation cure regression models.

    Chen, Chyong-Mei; Chen, Chen-Hsin


    Cure models have been applied to analyze clinical trials with cures and age-at-onset studies with nonsusceptibility. Lu and Ying (On semiparametric transformation cure model. Biometrika 2004; 91:331?-343. DOI: 10.1093/biomet/91.2.331) developed a general class of semiparametric transformation cure models, which assumes that the failure times of uncured subjects, after an unknown monotone transformation, follow a regression model with homoscedastic residuals. However, it cannot deal with frequently encountered heteroscedasticity, which may result from dispersed ranges of failure time span among uncured subjects' strata. To tackle the phenomenon, this article presents semiparametric heteroscedastic transformation cure models. The cure status and the failure time of an uncured subject are fitted by a logistic regression model and a heteroscedastic transformation model, respectively. Unlike the approach of Lu and Ying, we derive score equations from the full likelihood for estimating the regression parameters in the proposed model. The similar martingale difference function to their proposal is used to estimate the infinite-dimensional transformation function. Our proposed estimating approach is intuitively applicable and can be conveniently extended to other complicated models when the maximization of the likelihood may be too tedious to be implemented. We conduct simulation studies to validate large-sample properties of the proposed estimators and to compare with the approach of Lu and Ying via the relative efficiency. The estimating method and the two relevant goodness-of-fit graphical procedures are illustrated by using breast cancer data and melanoma data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26887342

  17. Heteroscedasticity checks for regression models

    ZHU; Lixing


    [1]Carroll, R. J., Ruppert, D., Transformation and Weighting in Regression, New York: Chapman and Hall, 1988.[2]Cook, R. D., Weisberg, S., Diagnostics for heteroscedasticity in regression, Biometrika, 1988, 70: 1—10.[3]Davidian, M., Carroll, R. J., Variance function estimation, J. Amer. Statist. Assoc., 1987, 82: 1079—1091.[4]Bickel, P., Using residuals robustly I: Tests for heteroscedasticity, Ann. Statist., 1978, 6: 266—291.[5]Carroll, R. J., Ruppert, D., On robust tests for heteroscedasticity, Ann. Statist., 1981, 9: 205—209.[6]Eubank, R. L., Thomas, W., Detecting heteroscedasticity in nonparametric regression, J. Roy. Statist. Soc., Ser. B, 1993, 55: 145—155.[7]Diblasi, A., Bowman, A., Testing for constant variance in a linear model, Statist. and Probab. Letters, 1997, 33: 95—103.[8]Dette, H., Munk, A., Testing heteoscedasticity in nonparametric regression, J. R. Statist. Soc. B, 1998, 60: 693—708.[9]Müller, H. G., Zhao, P. L., On a semi-parametric variance function model and a test for heteroscedasticity, Ann. Statist., 1995, 23: 946—967.[10]Stute, W., Manteiga, G., Quindimil, M. P., Bootstrap approximations in model checks for regression, J. Amer. Statist. Asso., 1998, 93: 141—149.[11]Stute, W., Thies, G., Zhu, L. X., Model checks for regression: An innovation approach, Ann. Statist., 1998, 26: 1916—1939.[12]Shorack, G. R., Wellner, J. A., Empirical Processes with Applications to Statistics, New York: Wiley, 1986.[13]Efron, B., Bootstrap methods: Another look at the jackknife, Ann. Statist., 1979, 7: 1—26.[14]Wu, C. F. J., Jackknife, bootstrap and other re-sampling methods in regression analysis, Ann. Statist., 1986, 14: 1261—1295.[15]H rdle, W., Mammen, E., Comparing non-parametric versus parametric regression fits, Ann. Statist., 1993, 21: 1926—1947.[16]Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 1988, 16: 1696—1708.[17


    ZHAO Lincheng; FANG Yixin


    Rao and Zhao (1992) used random weighting method to derive the approximate distribution of the M-estimator in linear regression model. In this paper we extend the result to the censored regression model (or censored "Tobit" model).

  19. Regression Models for Market-Shares

    Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue


    On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the...... interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....

  20. The Infinite Hierarchical Factor Regression Model

    Rai, Piyush


    We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.

  1. Regression modeling methods, theory, and computation with SAS

    Panik, Michael


    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  2. Modelling of filariasis in East Java with Poisson regression and generalized Poisson regression models



    Poisson regression has been used if the response variable is count data that based on the Poisson distribution. The Poisson distribution assumed equal dispersion. In fact, a situation where count data are over dispersion or under dispersion so that Poisson regression inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently, giving misleading inference about the regression parameters. This paper suggests the generalized Poisson regression model to handling over dispersion and under dispersion on the Poisson regression model. The Poisson regression model and generalized Poisson regression model will be applied the number of filariasis cases in East Java. Based regression Poisson model the factors influence of filariasis are the percentage of families who don't behave clean and healthy living and the percentage of families who don't have a healthy house. The Poisson regression model occurs over dispersion so that we using generalized Poisson regression. The best generalized Poisson regression model showing the factor influence of filariasis is percentage of families who don't have healthy house. Interpretation of result the model is each additional 1 percentage of families who don't have healthy house will add 1 people filariasis patient.

  3. Applied Regression Modeling A Business Approach

    Pardoe, Iain


    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  4. Speaker adaptation of HMMs using evolutionary strategy-based linear regression

    Selouani, Sid-Ahmed; O'Shaughnessy, Douglas


    A new framework for speaker adaptation of continuous-density hidden Markov models (HMMs) is introduced. It aims to improve the robustness of speech recognizers by adapting HMM parameters to new conditions (e.g., from new speakers). It describes an optimization technique using an evolutionary strategy for linear regression-based spectral transformation. In classical iterative maximum likelihood linear regression (MLLR), a global transform matrix is estimated to make a general model better match particular target conditions. To permit adaptation on a small amount of data, a regression tree classification is performed. However, an important drawback of MLLR is that the number of regression classes is fixed. The new approach allows the degree of freedom of the global transform to be implicitly variable, as the evolutionary optimization permits the survival of only active classes. The fitness function is evaluated by the phoneme correctness through the evolution steps. The implementation requirements such as chromosome representation, selection function, genetic operators, and evaluation function have been chosen in order to lend more reliability to the global transformation matrix. Triphone experiments used the TIMIT and ARPA-RM1 databases. For new speakers, the new technique achieves 8 percent fewer word errors than the basic MLLR method.

  5. Nonparametric and semiparametric dynamic additive regression models

    Scheike, Thomas Harder; Martinussen, Torben

    Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...

  6. A simple bivariate count data regression model

    Shiferaw Gurmu; John Elder


    This paper develops a simple bivariate count data regression model in which dependence between count variables is introduced by means of stochastically related unobserved heterogeneity components. Unlike existing commonly used bivariate models, we obtain a computationally simple closed form of the model with an unrestricted correlation pattern.

  7. A new bivariate negative binomial regression model

    Faroughi, Pouya; Ismail, Noriszura


    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on Malaysian motor insurance dataset. The results indicated that BNB-1 regression has better fit than bivariate Poisson and BNB-2 models with regards to Akaike information criterion.

  8. An Application on Multinomial Logistic Regression Model

    Abdalla M El-Habil


    This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children...

  9. Empirical Bayes Estimation in Regression Model

    Li-chun Wang


    This paper considers the empirical Bayes (EB) estimation problem for the parameterβ of the linear regression model y = Xβ + ε with ε~ N(0, σ2I) givenβ. Based on Pitman closeness (PC) criterion and mean square error matrix (MSEM) criterion, we prove the superiority of the EB estimator over the ordinary least square estimator (OLSE).

  10. Bootstrap inference longitudinal semiparametric regression model

    Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman


    Semiparametric regression contains two components, i.e. parametric and nonparametric component. Semiparametric regression model is represented by yt i=μ (x˜'ti,zt i)+εt i where μ (x˜'ti,zt i)=x˜'tiβ ˜+g (zt i) and yti is response variable. It is assumed to have a linear relationship with the predictor variables x˜'ti=(x1 i 1,x2 i 2,…,xT i r) . Random error εti, i = 1, …, n, t = 1, …, T is normally distributed with zero mean and variance σ2 and g(zti) is a nonparametric component. The results of this study showed that the PLS approach on longitudinal semiparametric regression models obtain estimators β˜^t=[X'H(λ)X]-1X'H(λ )y ˜ and g˜^λ(z )=M (λ )y ˜ . The result also show that bootstrap was valid on longitudinal semiparametric regression model with g^λ(b )(z ) as nonparametric component estimator.

  11. Validation of a heteroscedastic hazards regression model.

    Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin


    A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. It by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial. PMID:11878222

  12. Multiple Imputations for LInear Regression Models

    Brownstone, David


    Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubin’s results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency  multiple imputations inference~ when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welsh’s theorems to give conditions where multiple imputations yield consistent inferences for bo...

  13. Identification of regression models - application in traffic

    Dohnal, Pavel

    Ljubljana : Jozef Stefan Institute, 2005, s. 1-5. [International PhD Workshop on Systems and Control a Young Generation Viewpoint /6./. Izola (SI), 04.10.2005-08.10.2005] R&D Projects: GA MŠk(CZ) 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : regression model * model order * intensity of traffic flow * prediction Subject RIV: BC - Control Systems Theory

  14. Modeling and simulation of output power of a high-power He-SrBr2 laser by using multivariate adaptive regression splines

    Iliev, I. P.; Gocheva-Ilieva, S. G.


    Due to advancement in computing technology researchers are focusing onto novel predictive models and techniques for improving the design and performance of the devices. This study examines a recently developed high-powered SrBr2 laser excited in a nanosecond pulse longitudinal He-SrBr2 discharge. Based on the accumulated experiment data, a new approach is proposed for determining the relationship between laser output power and basic laser characteristics: geometric design, supplied electric power, helium pressure, etc. Piece-wise linear and nonlinear statistical models have been built with the help of the flexible predictive MARS technique. It is shown that the best nonlinear MARS model containing second degree terms provides the best description of the examined data, demonstrating a coefficient of determination of over 99%. The resulting theoretical models are used to estimate and predict the experiment as well as to analyze the local behavior of the relationships between laser output power and input laser characteristics. The second order model is applied to the design and optimization of the laser in order to enhance laser generation.

  15. Estimation of a semiparametric contaminated regression model

    Vandekerkhove, Pierre


    We consider in this paper a contamined regression model where the distribution of the contaminating component is known when the Eu- clidean parameters of the regression model, the noise distribution, the contamination ratio and the distribution of the design data are un- known. Our model is said to be semiparametric in the sense that the probability density function (pdf) of the noise involved in the regression model is not supposed to belong to a parametric density family. When the pdf's of the noise and the contaminating phenomenon are supposed to be symmetric about zero, we propose an estimator of the various (Eu- clidean and functionnal) parameters of the model, and prove under mild conditions its convergence. We prove in particular that, under technical conditions all satisfied in the Gaussian case, the Euclidean part of the model is estimated at the rate $o_{a.s}(n-1/4+\\gamma), $\\gamma> 0$. We recall that, as it is pointed out in Bordes and Vandekerkhove (2010), this result cannot be ignored to go furth...

  16. Modeling oil production based on symbolic regression

    Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak

  17. Efficient robust nonparametric estimation in a semimartingale regression model

    Konev, Victor


    The paper considers the problem of robust estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is non-gaussian Ornstein-Uhlenbeck process with the L\\'evy process subordinator, which is used to model the financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.

  18. Predictive densities for day-ahead electricity prices using time-adaptive quantile regression

    Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik;


    A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model is...

  19. An Application on Multinomial Logistic Regression Model

    Abdalla M El-Habil


    Full Text Available Normal 0 false false false EN-US X-NONE X-NONE This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by Palestinian Central Bureau of Statistics (PCBS. Segment of the population of children in the age group (10-14 years for residents in Gaza governorate, size of 66,935 had been selected, and the response variable consisted of four categories. Eighteen of explanatory variables were used for building the primary multinomial logistic regression model. Model had been tested through a set of statistical tests to ensure its appropriateness for the data. Also the model had been tested by selecting randomly of two observations of the data used to predict the position of each observation in any classified group it can be, by knowing the values of the explanatory variables used. We concluded by using the multinomial logistic regression model that we can able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and we can predict the classification of any individual case.

  20. General regression and representation model for classification.

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  1. Bayesian Inference of a Multivariate Regression Model

    Marick S. Sinay


    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  2. An operational GLS model for hydrologic regression

    Tasker, Gary D.; Stedinger, J.R.


    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.

  3. Regression models for expected length of stay.

    Grand, Mia Klinten; Putter, Hein


    In multi-state models, the expected length of stay (ELOS) in a state is not a straightforward object to relate to covariates, and the traditional approach has instead been to construct regression models for the transition intensities and calculate ELOS from these. The disadvantage of this approach is that the effect of covariates on the intensities is not easily translated into the effect on ELOS, and it typically relies on the Markov assumption. We propose to use pseudo-observations to construct regression models for ELOS, thereby allowing a direct interpretation of covariate effects while at the same time avoiding the Markov assumption. For this approach, all we need is a non-parametric consistent estimator for ELOS. For every subject (and for every state of interest), a pseudo-observation is constructed, and they are then used as outcome variables in the regression model. We furthermore show how to construct longitudinal (pseudo-) data when combining the concept of pseudo-observations with landmarking. In doing so, covariates are allowed to be time-varying, and we can investigate potential time-varying effects of the covariates. The models can be fitted using generalized estimating equations, and dependence between observations on the same subject is handled by applying the sandwich estimator. The method is illustrated using data from the US Health and Retirement Study where the impact of socio-economic factors on ELOS in health and disability is explored. Finally, we investigate the performance of our approach under different degrees of left-truncation, non-Markovianity, and right-censoring by means of simulation. PMID:26497637

  4. An Adaptive Support Vector Regression Machine for the State Prognosis of Mechanical Systems

    Qing Zhang


    Full Text Available Due to the unsteady state evolution of mechanical systems, the time series of state indicators exhibits volatile behavior and staged characteristics. To model hidden trends and predict deterioration failure utilizing volatile state indicators, an adaptive support vector regression (ASVR machine is proposed. In ASVR, the width of an error-insensitive tube, which is a constant in the traditional support vector regression, is set as a variable determined by the transient distribution boundary of local regions in the training time series. Thus, the localized regions are obtained using a sliding time window, and their boundaries are defined by a robust measure known as the truncated range. Utilizing an adaptive error-insensitive tube, a stabilized tolerance level for noise is achieved, whether the time series occurs in low-volatility regions or in high-volatility regions. The proposed method is evaluated by vibrational data measured on descaling pumps. The results show that ASVR is capable of capturing the local trends of the volatile time series of state indicators and is superior to the standard support vector regression for state prediction.

  5. Hierarchical linear regression models for conditional quantiles

    TIAN; Maozai


    The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models,but it cannot deal effectively with the data with a hierarchical structure.In practice,the existence of such data hierarchies is neither accidental nor ignorable,it is a common phenomenon.To ignore this hierarchical data structure risks overlooking the importance of group effects,and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid.On the other hand,the hierarchical models take a hierarchical data structure into account and have also many applications in statistics,ranging from overdispersion to constructing min-max estimators.However,the hierarchical models are virtually the mean regression,therefore,they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates.Furthermore,the estimated coefficient vector (marginal effects)is sensitive to an outlier observation on the dependent variable.In this article,a new approach,which is based on the Gauss-Seidel iteration and taking a full advantage of the quantile regression and hierarchical models,is developed.On the theoretical front,we also consider the asymptotic properties of the new method,obtaining the simple conditions for an n1/2-convergence and an asymptotic normality.We also illustrate the use of the technique with the real educational data which is hierarchical and how the results can be explained.

  6. Quantile regression modeling for Malaysian automobile insurance premium data

    Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz


    Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.

  7. Regression models for convex ROC curves.

    Lloyd, C J


    The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often comes in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter delta and a shape parameter mu and are guaranteed to be convex provided delta > 1. Both the position along the ROC curve and the quality parameter delta are modeled linearly with covariates at the level of the individual. The shape parameter mu enters the model through the link functions log(p mu) - log(1 - p mu) of a binomial regression and is estimated either by search or from an appropriate constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest. PMID:10985227

  8. Regression Models For Saffron Yields in Iran

    S. H, Sanaeinejad; S. N, Hosseini

    Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.

  9. An Additive-Multiplicative Cox-Aalen Regression Model

    Scheike, Thomas H.; Zhang, Mei-Jie


    Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...

  10. Inferring gene regression networks with model trees

    Aguilar-Ruiz Jesus S


    Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear

  11. Multiple Linear Regression Models in Outlier Detection

    S.M.A.Khaleelur Rahman


    Full Text Available Identifying anomalous values in the real-world database is important both for improving the quality of original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step of the regression diagnostics. In this paper, our aim is to detect the points which are very different from the others points. They do not seem to belong to a particular population and behave differently. If these influential points are to be removed it will lead to a different model. Distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach in automating the process of detecting and isolating outliers. Impact of anomalous values on the dataset has been established by using two indicators DFFITS and Cook’sD. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.

  12. Entrepreneurial intention modeling using hierarchical multiple regression

    Marina Jeger


    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  13. An adaptive online learning approach for Support Vector Regression: Online-SVR-FID

    Liu, Jie; Zio, Enrico


    Support Vector Regression (SVR) is a popular supervised data-driven approach for building empirical models from available data. Like all data-driven methods, under non-stationary environmental and operational conditions it needs to be provided with adaptive learning capabilities, which might become computationally burdensome with large datasets cumulating dynamically. In this paper, a cost-efficient online adaptive learning approach is proposed for SVR by combining Feature Vector Selection (FVS) and Incremental and Decremental Learning. The proposed approach adaptively modifies the model only when different pattern drifts are detected according to proposed criteria. Two tolerance parameters are introduced in the approach to control the computational complexity, reduce the influence of the intrinsic noise in the data and avoid the overfitting problem of SVR. Comparisons of the prediction results is made with other online learning approaches e.g. NORMA, SOGA, KRLS, Incremental Learning, on several artificial datasets and a real case study concerning time series prediction based on data recorded on a component of a nuclear power generation system. The performance indicators MSE and MARE computed on the test dataset demonstrate the efficiency of the proposed online learning method.

  14. Hierarchical sparsity priors for regression models

    Griffin, Jim E.; Brown, Philip J


    We focus on the increasingly important area of sparse regression problems where there are many variables and the effects of a large subset of these are negligible. This paper describes the construction of hierarchical prior distributions when the effects are considered related. These priors allow dependence between the regression coefficients and encourage related shrinkage towards zero of different regression coefficients. The properties of these priors are discussed and applications to line...

  15. Some Priors for Sparse Regression Modelling

    Griffin, Jim E.; Brown, Philip J


    A wide range of methods, Bayesian and others, tackle regression when there are many variables. In the Bayesian context, the prior is constructed to reflect ideas of variable selection and to encourage appropriate shrinkage. The prior needs to be reasonably robust to different signal to noise structures. Two simple evergreen prior constructions stem from ridge regression on the one hand and g-priors on the other. We seek to embed recent ideas about sparsity of the regression coefficients and r...

  16. Cox Proportional Hazard with Multivariate Adaptive Regression Splines to Analyze the Product Sales Time in E-Commerce

    Irwansyah, Edy


    Cox Proportional Hazard (Cox PH) model is a survival analysis method to perform model of relationship between independent variable and dependent variable which shown by time until an event occurs. This method compute residuals, martingale or deviance, which can used to diagnostic the lack of fit of a model and PH assumption. The alternative method if these not satisfied is Multivariate Adaptive Regression Splines (MARS) approach. This method use to perform the analysis of product selling time...

  17. Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

    Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.


    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  18. Synthesis analysis of regression models with a continuous outcome

    Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin


    To estimate the multivariate regression model from multiple individual studies, it would be challenging to obtain results if the input from individual studies only provide univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method...

  19. Model performance analysis and model validation in logistic regression

    Rosa Arboretti Giancristofaro


    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  20. Predicting respiratory tumor motion with multi-dimensional adaptive filters and support vector regression

    Intra-fraction tumor tracking methods can improve radiation delivery during radiotherapy sessions. Image acquisition for tumor tracking and subsequent adjustment of the treatment beam with gating or beam tracking introduces time latency and necessitates predicting the future position of the tumor. This study evaluates the use of multi-dimensional linear adaptive filters and support vector regression to predict the motion of lung tumors tracked at 30 Hz. We expand on the prior work of other groups who have looked at adaptive filters by using a general framework of a multiple-input single-output (MISO) adaptive system that uses multiple correlated signals to predict the motion of a tumor. We compare the performance of these two novel methods to conventional methods like linear regression and single-input, single-output adaptive filters. At 400 ms latency the average root-mean-square-errors (RMSEs) for the 14 treatment sessions studied using no prediction, linear regression, single-output adaptive filter, MISO and support vector regression are 2.58, 1.60, 1.58, 1.71 and 1.26 mm, respectively. At 1 s, the RMSEs are 4.40, 2.61, 3.34, 2.66 and 1.93 mm, respectively. We find that support vector regression most accurately predicts the future tumor position of the methods studied and can provide a RMSE of less than 2 mm at 1 s latency. Also, a multi-dimensional adaptive filter framework provides improved performance over single-dimension adaptive filters. Work is underway to combine these two frameworks to improve performance.

  1. Bayesian Model Averaging in the Instrumental Variable Regression Model

    Gary Koop; Robert Leon Gonzalez; Rodney Strachan


    This paper considers the instrumental variable regression model when there is uncertainly about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainly can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very fl...

  2. Electricity prices forecasting by automatic dynamic harmonic regression models

    The changes experienced by electricity markets in recent years have created the necessity for more accurate forecast tools of electricity prices, both for producers and consumers. Many methodologies have been applied to this aim, but in the view of the authors, state space models are not yet fully exploited. The present paper proposes a univariate dynamic harmonic regression model set up in a state space framework for forecasting prices in these markets. The advantages of the approach are threefold. Firstly, a fast automatic identification and estimation procedure is proposed based on the frequency domain. Secondly, the recursive algorithms applied offer adaptive predictions that compare favourably with respect to other techniques. Finally, since the method is based on unobserved components models, explicit information about trend, seasonal and irregular behaviours of the series can be extracted. This information is of great value to the electricity companies' managers in order to improve their strategies, i.e. it provides management innovations. The good forecast performance and the rapid adaptability of the model to changes in the data are illustrated with actual prices taken from the PJM interconnection in the US and for the Spanish market for the year 2002

  3. Model Selection in Kernel Ridge Regression

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels...

  4. Regression and regression analysis time series prediction modeling on climate data of quetta, pakistan

    Various statistical techniques was used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rand correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on a five-year data of rainfall and humidity, respectively which showed that the variances in rainfall data were not homogenous while in case of humidity, were homogenous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  5. Prediction of Rotor Spun Yarn Strength Using Adaptive Neuro-fuzzy Inference System and Linear Multiple Regression Methods

    NURWAHA Deogratias; WANG Xin-hou


    This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy system inference (ANFIS) and Multiple Linear Regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models and the count-strength-product (CSP) was the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has a better predictive power in comparison with linear multipleregression model. The impact of each fiber property is also illustrated.

  6. Real-time detection of generic objects using objectness estimation and locally adaptive regression kernels matching

    Zheng, Zhihui; Gao, Lei; Xiao, Liping; Zhou, Bin; Gao, Shibo


    Our purpose is to develop a detection algorithm capable of searching for generic interest objects in real time without large training sets and long-time training stages. Instead of the classical sliding window object detection paradigm, we employ an objectness measure to produce a small set of candidate windows efficiently using Binarized Normed Gradients and a Laplacian of Gaussian-like filter. We then extract Locally Adaptive Regression Kernels (LARKs) as descriptors both from a model image and the candidate windows which measure the likeness of a pixel to its surroundings. Using a matrix cosine similarity measure, the algorithm yields a scalar resemblance map, indicating the likelihood of similarity between the model and the candidate windows. By employing nonparametric significance tests and non-maxima suppression, we detect the presence of objects similar to the given model. Experiments show that the proposed detection paradigm can automatically detect the presence, the number, as well as location of similar objects to the given model. The high quality and efficiency of our method make it suitable for real time multi-category object detection applications.

  7. Stochastic Approximation Methods for Latent Regression Item Response Models

    von Davier, Matthias; Sinharay, Sandip


    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  8. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Hong-Juan Li


    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  9. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice


    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  10. Using AMMI, factorial regression and partial least squares regression models for interpreting genotype x environment interaction.

    Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.


    Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEl). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th

  11. Combination of supervised and semi-supervised regression models for improved unbiased estimation

    Arenas-Garía, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan


    In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi......In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and...

  12. Modeling tourism flows through gravity models: A quantile regression approach

    Santeramo, Fabio Gaetano; Morelli, Mariangela


    Gravity models are widely used to study tourism flows. The peculiarities of the segmented international demand for agritourism in Italy is examined by means of novel approach: a panel data quantile regression. We characterize the international demand for Italian agritourism with a large dataset, by considering data of thirty-three countries of origin, from 1998 to 2010. Distance and income are major determinants, but we also found that mutual agreements and high urbanization rates in countrie...

  13. Multiple Retrieval Models and Regression Models for Prior Art Search

    Lopez, Patrice; Romary, Laurent


    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression m...

  14. Drought Patterns Forecasting using an Auto-Regressive Logistic Model

    del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.


    Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate.In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows to include covariates, continuous or categorical, to improve the performance of the auto-regressive component.Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representation of complex climatic phenomena, as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.

  15. Support vector regression model for complex target RCS predicting

    Wang Gu; Chen Weishi; Miao Jungang


    The electromagnetic scattering computation has developed rapidly for many years; some computing problems for complex and coated targets cannot be solved by using the existing theory and computing models. A computing model based on data is established for making up the insufficiency of theoretic models. Based on the "support vector regression method", which is formulated on the principle of minimizing a structural risk, a data model to predicate the unknown radar cross section of some appointed targets is given. Comparison between the actual data and the results of this predicting model based on support vector regression method proved that the support vector regression method is workable and with a comparative precision.

  16. Multiattribute shopping models and ridge regression analysis

    Timmermans, HJP Harry


    Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...

  17. Parametric and Non-Parametric Regression Models

    Brabec, Marek

    Stuttgart : E. Schweizerbart'sche Verlagsbuchhandlung, 2013 - (Hermanussen, M.), s. 194-199 ISBN 978-3-510-65278-5 Institutional support: RVO:67985807 Keywords : statistical modeling * non-parametric * semi-parametric * mixed-effects model * growth curve model Subject RIV: BB - Applied Statistics, Operational Research

  18. Adaptive Linear and Normalized Combination of Radial Basis Function Networks for Function Approximation and Regression

    Yunfeng Wu


    Full Text Available This paper presents a novel adaptive linear and normalized combination (ALNC method that can be used to combine the component radial basis function networks (RBFNs to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple leaners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error and the better fidelity (characterized by normalized correlation coefficient of approximation, in relation to the popular simple average, weighted average, and the Bagging methods.

  19. Symbolic regression of generative network models

    Menezes, Telmo


    Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world netwo...


    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  1. Daily Reference Evapotranspiration Estimation using Linear Regression and ANN Models

    Mallikarjuna, P.; Jyothy, S. A.; Sekhar Reddy, K. C.


    The present study investigates the applicability of linear regression and ANN models for estimating daily reference evapotranspiration (ET0) at Tirupati, Nellore, Rajahmundry, Anakapalli and Rajendranagar regions of Andhra Pradesh. The climatic parameters influencing daily ET0 were identified through multiple and partial correlation analysis. The daily temperature, wind velocity, relative humidity and sunshine hours mostly influenced the study area in the daily ET0 estimation. Linear regression models in terms of the climatic parameters influencing the region and, optimal neural network architectures considering these influencing climatic parameters as input parameters were developed. The models' performance in the estimation of ET0 was evaluated with that estimated by FAO-56 Penman-Montieth method. The regression models showed a satisfactory performance in the daily ET0 estimation for the regions selected for the present study. The optimal ANN (4,4,1) models, however, consistently showed an improved performance over regression models.

  2. Marginal Regression Models with Varying Coefficients for Correlated Ordinal Data

    Gieger, Christian


    This paper discusses marginal regression models for repeated or clustered ordinal measurements in which the coefficients of explanatory variables are allowed to vary as smooth functions of other covariates. We model the marginal response probabilities and the marginal pairwise association structure by two semiparametric regressions. To estimate the fixed parameters and varying coefficients in both models we derive an algorithm that is based on penalized generalized estimating equations. This...

  3. Fuzzy Multiple Regression Model for Estimating Software Development Time

    Venus Marza


    Full Text Available As software becomes more complex and its scope dramatically increase, the importance of research on developing methods for estimating software development time has perpetually increased, so accurate estimation is the main goal of software managers for reducing risks of projects. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach, which has the higher accurate than other methods for estimating. Furthermore, we compare Fuzzy Multiple Regression model with Fuzzy Logic model & Multiple Regression model based on their accuracy.


    Barbu Bogdan POPESCU


    Full Text Available There are presented econometric models developed for analysis of banking exclusion of the economic crisis. Access to public goods and services is a condition „sine qua non” for open and efficient society. Availability of banking and payment of the entire population without discrimination in our opinion should be the primary objective of public service policy.


    Barbu Bogdan POPESCU; Lavinia Stefania TOTAN


    There are presented econometric models developed for analysis of banking exclusion of the economic crisis. Access to public goods and services is a condition „sine qua non” for open and efficient society. Availability of banking and payment of the entire population without discrimination in our opinion should be the primary objective of public service policy.

  6. Modeling inequality and spread in multiple regression

    Rolf Aaberge; Steinar Bjerve; Kjell Doksum


    We consider concepts and models for measuring inequality in the distribution of resources with a focus on how inequality varies as a function of covariates. Lorenz introduced a device for measuring inequality in the distribution of income that indicates how much the incomes below the u$^{th}$ quantile fall short of the egalitarian situation where everyone has the same income. Gini introduced a summary measure of inequality that is the average over u of the difference between the Lorenz curve ...

  7. Residual diagnostics for cross-section time series regression models

    Baum, Christopher F


    These routines support the diagnosis of groupwise heteroskedasticity and cross-sectional correlation in the context of a regression model fit to pooled cross-section time series (xt) data. Copyright 2001 by Stata Corporation.

  8. Regression Test-Selection Technique Using Component Model Based Modification: Code to Test Traceability

    Ahmad A. Saifan


    Full Text Available Regression testing is a safeguarding procedure to validate and verify adapted software, and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in regression test selection domain. The approach is based on meta-models (test models and structured models to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results using the re-test to all approaches. The results have shown that our approach reduces the size of test suite without negative impact on the effectiveness of the fault detection.

  9. Switching regression models with non-normal errors

    Pruska, Krystyna


    In this paper two forms of switching regression models with non-normal errors are considered. The pseudo maximum likelihood method is proposed for the estimation of their parameters. Monte Carlo experiments results are presented for a special switching regression model, too. In this research there are compared distributions of parameters estimators for different distributions of errors. The error distributions are as follows: normal, Student’s or Laplace’s. The maximum likel...

  10. A Comparison of Random Coefficient Modelling and Geographically Weighted Regression for Spatially Non-stationary Regression Problems

    Brunsdon, Chris; Aitkin, Murray; Fotheringham, Stewart; Charlton, Martin


    Compares random coefficient modelling (RCM) and geographically weighted regression for spatially non-stationary regression. Relationship between limiting long-term illness (LLTI) and social factors; Variants of RCM; Factors contributing to the prevalence of LLTI.

  11. Robust Depth-Weighted Wavelet for Nonparametric Regression Models

    Lu LIN


    In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.

  12. Alternative regression models to assess increase in childhood BMI

    Mansmann Ulrich


    Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.

  13. A generalized regression model for a binary response

    Kateri, Maria; Agresti, Alan


    Abstract Logistic regression is the closest model, given its sufficient statistics, to the model of constant success probability in terms of Kullback-Leibler information. A generalized binary model has this property for the more general ?-divergence. These results generalize to multinomial and other discrete data.

  14. Multivariate Regression Models for Estimating Journal Usefulness in Physics.

    Bennion, Bruce C.; Karschamroon, Sunee


    This study examines possibility of ranking journals in physics by means of bibliometric regression models that estimate usefulness as it is reported by 167 physicists in United States and Canada. Development of four models, patterns of deviation from models, and validity and application are discussed. Twenty-six references are cited. (EJS)

  15. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    Ulbrich, N.; Bader, Jon B.


    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.

  16. Correlation between Production and Labor based on Regression Model

    Constantin Anghelache


    In the theoretical analysis, dependency of variables is stochastic. Consideration of the residual variable within such a model is needed. Other factors that influence the score variable are grouped in the residual. Uni-factorial nonlinear models are linearized transformations that are applied to the variables, the regression model. So, for example, a model of the form turns into a linear model by logarithm the two terms of the above equality, resulting in linear function. This model is recomm...

  17. General bound of overfitting for MLP regression models

    Rynkiewicz, Joseph


    Multilayer perceptrons (MLP) with one hidden layer have been used for a long time to deal with non-linear regression. However, in some task, MLP's are too powerful models and a small mean square error (MSE) may be more due to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS), however in numerous cases the assumption of normality of the noise is...

  18. Stability and adaptability of runner peanut genotypes based on nonlinear regression and AMMI analysis

    Roseane Cavalcanti dos Santos


    Full Text Available The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G, environments (E, and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.

  19. Regression model for Quality of Web Services dataset with WEKA

    Shalini Gambhir; Puneet Arora; Jatin Gambhir


    The Waikato Environment for Knowledge Analysis (WEKA) came about through the perceived need for a unified workbench that would allow researchers easy access to state-of the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classification, regression, clustering, and feature selection etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses regression model for some o...

  20. Model selection criteria for factor-augmented regressions

    Jan J. J. Groen; Kapetanios, George


    In a factor-augmented regression, the forecast of a variable depends on a few factors estimated from a large number of predictors. But how does one determine the appropriate number of factors relevant for such a regression? Existing work has focused on criteria that can consistently estimate the appropriate number of factors in a large-dimensional panel of explanatory variables. However, not all of these factors are necessarily relevant for modeling a specific dependent variable within a fact...

  1. Adaptive Estimation of Heteroscedastic Money Demand Model of Pakistan

    Muhammad Aslam


    Full Text Available For the problem of estimation of Money demand model of Pakistan, money supply (M1 shows heteroscedasticity of the unknown form. For estimation of such model we compare two adaptive estimators with ordinary least squares estimator and show the attractive performance of the adaptive estimators, namely, nonparametric kernel estimator and nearest neighbour regression estimator. These comparisons are made on the basis standard errors of the estimated coefficients, standard error of regression, Akaike Information Criteria (AIC value, and the Durban-Watson statistic for autocorrelation. We further show that nearest neighbour regression estimator performs better when comparing with the other nonparametric kernel estimator.

  2. Gologit2: Generalized Logistic Regression Models for Ordinal Dependent Variables

    Richard Williams


    -gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. A major strength of -gologit2- is that it can also estimate two special cases of the generalized model: the proportional odds model and the partial proportional odds model. Hence, -gologit2- can estimate models that are less restri...

  3. Buffalos milk yield analysis using random regression models

    A.S. Schierholt


    Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.

  4. Top-Down Visual Saliency Detection in Optical Satellite Images Based on Local Adaptive Regression Kernel

    Xiaoguang Cui


    Full Text Available This paper proposes a novel top-down visual saliency detection method for optical satellite images using local adaptive regression kernels. This method provides a saliency map by measuring the likeness of image patches to a given single template image. The local adaptive regression kernel (LARK is used as a descriptor to extract feature and compare against analogous feature from the target image. A multi-scale pyramid of the target image is constructed to cope with large-scale variations. In addition, accounting for rotation variations, the histogram of kernel orientation is employed to estimate the rotation angle of image patch, and then comparison is performed after rotating the patch by the estimated angle. Moreover, we use the bounded partial correlation (BPC to compare features between image patches and the template so as to rapidly generate the saliency map. Experiments were performed in optical satellite images to find airplanes, and experimental results demonstrate that the proposed method is effective and robust in complex scenes.

  5. Support Vector Regression-Based Adaptive Divided Difference Filter for Nonlinear State Estimation Problems

    Hongjian Wang


    Full Text Available We present a support vector regression-based adaptive divided difference filter (SVRADDF algorithm for improving the low state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm’s robustness. The performance of the proposed algorithm is evaluated by estimating states for (i an underwater nonmaneuvering target bearing-only tracking system and (ii maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.

  6. An Implementation of Bayesian Adaptive Regression Splines (BARS in C with S and R Wrappers

    Garrick Wallstrom


    Full Text Available BARS (DiMatteo, Genovese, and Kass 2001 uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors, as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise while adapting to sudden changes (retaining high-frequency signal. However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables, and finding starting values for Poisson regression (using code for density estimation created by Kooperberg. The code utilizes only freely-available external libraries (LAPACK and BLAS and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.

  7. Simulation study for model performance of multiresponse semiparametric regression

    Wibowo, Wahyu; Haryatmi, Sri; Budiantara, I. Nyoman


    The objective of this paper is to evaluate the performance of multiresponse semiparametric regression model based on both of the function types and sample sizes. In general, multiresponse semiparametric regression model consists of parametric and nonparametric functions. This paper focuses on both linear and quadratic functions for parametric components and spline function for nonparametric component. Moreover, this model could also be seen as a spline semiparametric seemingly unrelated regression model. Simulation study is conducted by evaluating three combinations of parametric and nonparametric components, i.e. linear-trigonometric, quadratic-exponential, and multiple linear-polynomial functions respectively. Two criterias are used for assessing the model performance, i.e. R-square and Mean Square Error (MSE). The results show that both of the function types and sample sizes have significantly influenced to the model performance. In addition, this multiresponse semiparametric regression model yields the best performance at the small sample size and combination between multiple linear and polynomial functions as parametric and nonparametric components respectively. Moreover, the model performances at the big sample size tend to be similar for any combination of parametric and nonparametric components.

  8. Testing heteroscedasticity by wavelets in a nonparametric regression model

    LI; Yuan; WONG; Heung; IP; Waicheung


    In the nonparametric regression models, a homoscedastic structure is usually assumed. However, the homoscedasticity cannot be guaranteed a priori. Hence, testing the heteroscedasticity is needed. In this paper we propose a consistent nonparametric test for heteroscedasticity, based on wavelets. The empirical wavelet coefficients of the conditional variance in a regression model are defined first. Then they are shown to be asymptotically normal, based on which a test statistic for the heteroscedasticity is constructed by using Fan's wavelet thresholding idea. Simulations show that our test is superior to the traditional nonparametric test.

  9. Direction of Effects in Multiple Linear Regression Models.

    Wiedermann, Wolfgang; von Eye, Alexander


    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741

  10. Flexible competing risks regression modeling and goodness-of-fit

    Scheike, Thomas; Zhang, Mei-Jie


    In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause......-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how covariates influences the cumulative incidence curve. We consider a simple and flexible class of regression...... models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...

  11. Modelling multimodal photometric redshift regression with noisy observations

    Kügler, S D


    In this work, we are trying to extent the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach allows therefore to approach the multimodal regression problem in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally and lastly, the proposed algorithm allows in principle to make predictions outside the training values which makes it a fascinating opportunity for the detection of high-redshifted quasars.

  12. The art of regression modeling in road safety

    Hauer, Ezra


    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  13. Spatial stochastic regression modelling of urban land use

    Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable

  14. Iterative Weighted Semiparametric Least Squares Estimation in Repeated Measurement Partially Linear Regression Models

    Ge-mai Chen; Jin-hong You


    Consider a repeated measurement partially linear regression model with an unknown vector pasemiparametric generalized least squares estimator (SGLSE) ofβ, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE can not improve upon the SGLSE.These results are generalizations of those in [2] to the case of semiparametric regressions.

  15. A nonparametric dynamic additive regression model for longitudinal data

    Martinussen, Torben; Thomas H. Scheike


    In this work we study additive dynamic regression models for longitudinal data. These models provide a flexible and nonparametric method for investigating the time-dynamics of longitudinal data. The methodology is aimed at data where measurements are recorded at random time points. We model the conditional mean of responses given the full internal history and possibly time-varying covariates. We derive the asymptotic distribution for a new nonparametric least squares estimat...

  16. Bayesian and maximin optimal designs for heteroscedastic regression models

    Dette, Holger; Haines, Linda M.; Imhof, Lorens A.


    The problem of constructing standardized maximin D-optimal designs for weighted polynomial regression models is addressed. In particular it is shown that, by following the broad approach to the construction of maximin designs introduced recently by Dette, Haines and Imhof (2003), such designs can be obtained as weak limits of the corresponding Bayesian Φq-optimal designs. The approach is illustrated for two specific weighted polynomial models and also for a particular growth model.

  17. Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR Model

    Souvik Bhattacharyya


    Full Text Available The staggering growth in communication technologyand usage of public domain channels (i.e. Internet has greatly facilitated transfer of data. However, such open communication channelshave greater vulnerability to security threats causing unauthorizedin- formation access. Traditionally, encryption is used to realizethen communication security. However, important information is notprotected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication.Important information is firstly hidden in a host data, such as digitalimage, text, video or audio, etc, and then transmitted secretly tothe receiver. Steganalysis is another important topic in informationhiding which is the art of detecting the presence of steganography. Inthis paper a novel technique for the steganalysis of Image has beenpresented. The proposed technique uses an auto-regressive model todetect the presence of the hidden messages, as well as to estimatethe relative length of the embedded messages.Various auto regressiveparameters are used to classify cover image as well as stego imagewith the help of a SVM classifier. Multiple Regression analysis ofthe cover carrier along with the stego carrier has been carried outin order to find out the existence of the negligible amount of thesecret message. Experimental results demonstrate the effectivenessand accuracy of the proposed technique.

  18. Modeling urban growth with geographically weighted multinomial logistic regression

    Luo, Jun; Kanala, Nagaraj Kapi


    Spatial heterogeneity is usually ignored in previous land use change studies. This paper presents a geographically weighted multinomial logistic regression model for investigating multiple land use conversion in the urban growth process. The proposed model makes estimation at each sample location and generates local coefficients of driving factors for land use conversion. A Gaussian function is used for determine the geographic weights guarantying that all other samples are involved in the calibration of the model for one location. A case study on Springfield metropolitan area is conducted. A set of independent variables are selected as driving factors. A traditional multinomial logistic regression model is set up and compared with the proposed model. Spatial variations of coefficients of independent variables are revealed by investigating the estimations at sample locations.

  19. Applications of some discrete regression models for count data

    B. M. Golam Kibria


    Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.

  20. Semiparametric Robust Estimation of Truncated and Censored Regression Models

    Cizek, P.


    Many estimation methods of truncated and censored regression models such as the maximum likelihood and symmetrically censored least squares (SCLS) are sensitive to outliers and data contamination as we document. Therefore, we propose a semipara- metric general trimmed estimator (GTE) of truncated an


    QianWeimin; LiYumei


    The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.


    任浩波; 赵延孟; 李元; 谢衷洁


    Wavelets are applied to detect the jumps in a heteroscedastic regression model.It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.

  3. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  4. Time series regression model for infectious disease and weather.

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro


    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. PMID:26188633

  5. Estimation of soil cation exchange capacity using Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS)

    Emamgolizadeh, S.; Bateni, S. M.; Shahsavani, D.; Ashrafi, T.; Ghorbani, H.


    The soil cation exchange capacity (CEC) is one of the main soil chemical properties, which is required in various fields such as environmental and agricultural engineering as well as soil science. In situ measurement of CEC is time consuming and costly. Hence, numerous studies have used traditional regression-based techniques to estimate CEC from more easily measurable soil parameters (e.g., soil texture, organic matter (OM), and pH). However, these models may not be able to adequately capture the complex and highly nonlinear relationship between CEC and its influential soil variables. In this study, Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) were employed to estimate CEC from more readily measurable soil physical and chemical variables (e.g., OM, clay, and pH) by developing functional relations. The GEP- and MARS-based functional relations were tested at two field sites in Iran. Results showed that GEP and MARS can provide reliable estimates of CEC. Also, it was found that the MARS model (with root-mean-square-error (RMSE) of 0.318 Cmol+ kg-1 and correlation coefficient (R2) of 0.864) generated slightly better results than the GEP model (with RMSE of 0.270 Cmol+ kg-1 and R2 of 0.807). The performance of GEP and MARS models was compared with two existing approaches, namely artificial neural network (ANN) and multiple linear regression (MLR). The comparison indicated that MARS and GEP outperformed the MLP model, but they did not perform as good as ANN. Finally, a sensitivity analysis was conducted to determine the most and the least influential variables affecting CEC. It was found that OM and pH have the most and least significant effect on CEC, respectively.

  6. A regression model to estimate regional ground water recharge.

    Lorenz, David L; Delin, Geoffrey N


    A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available. PMID:17335484

  7. Regression model for Quality of Web Services dataset with WEKA

    Shalini Gambhir


    Full Text Available The Waikato Environment for Knowledge Analysis (WEKA came about through the perceived need for a unified workbench that would allow researchers easy access to state-of the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classification, regression, clustering, and feature selection etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses regression model for some of the quality of web service parameters.

  8. A regressive model of isochronism in speech units

    Jassem, W.; Krzysko, M.; Stolarski, P.


    To define linguistic isochronism in quantitative terms, a statistical regressive method of analyzing the number of rhythmic units in human speech was employed. The material used was two taped texts spoken in standard British English totaling approximately 2,500 sounds. The sounds were divided into statistically homogeneous classes, and the mean values in each class were utilized in regressive models. Abercrombie's theory of speech rhythm postulating anacrusis and Jassem's theory postulating two types of speech units, anacrusis and a rhythmic unit in the strict sense, were tested using this material.

  9. Using regression models to determine the poroelastic properties of cartilage.

    Chung, Chen-Yuan; Mansour, Joseph M


    The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds. PMID:23796400

  10. A Regression Analysis Model Based on Wavelet Networks

    XIONG Zheng-feng


    In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.

  11. Gaussian Process Models for Nonparametric Functional Regression with Functional Responses



    Recently nonparametric functional model with functional responses has been proposed within the functional reproducing kernel Hilbert spaces (fRKHS) framework. Motivated by its superior performance and also its limitations, we propose a Gaussian process model whose posterior mode coincide with the fRKHS estimator. The Bayesian approach has several advantages compared to its predecessor. Firstly, the multiple unknown parameters can be inferred together with the regression function in a unified ...

  12. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    Dyrholm, Mads; Hansen, Lars Kai


    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least square...... estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording....

  13. Single and multiple index functional regression models with nonparametric link

    Chen, Dong; Hall, Peter; Müller, Hans-Georg


    Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between predictors and re...

  14. Regression and ARIMA hybrid model for new bug prediction

    Madhur Srivastava; Dr.Dharmendra Badal; Ratnesh Kumar Jain


    A multiple linear regression and ARIMA hybrid model is proposed for new bug prediction depending upon resolved bugs and other available parameters of the open source software bug report. Analysis of last five year bug report data of a open source software “worldcontrol” is done to identify the trends followed by various parameters. Bug report data has been categorized on monthly basis and forecast is also on monthly basis. Model accounts for the parameters such as resolved, assigned, reopened...

  15. Maximin and Bayesian Optimal Designs for Regression Models

    Dette, Holger; Haines, Linda M.; Imhof, Lorens A.


    For many problems of statistical inference in regression modelling, the Fisher information matrix depends on certain nuisance parameters which are unknown and which enter the model nonlinearly. A common strategy to deal with this problem within the context of design is to construct maximin optimal designs as those designs which maximize the minimum value of a real valued (standardized) function of the Fisher information matrix, where the minimum is taken over a specified range of the unknown ...

  16. Transpiration of glasshouse rose crops: evaluation of regression models

    Baas, R.; Rijssel, van, E.


    Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a lysimeter experiment and from a cut rose grower were analyzed. Based on daily integrals, all T models showed good fits due to the dominant effect of global radiation G (solar + supplementary radiation) ...

  17. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    Dyrholm, Mads; Hansen, Lars Kai


    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording.


    Siana Halim


    Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.

  19. Predicting heartbeat arrival time for failure detection over internet using auto-regressive exogenous model

    Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie


    Predicting heartbeat message arrival time is crucial for the quality of failure detection service over internet. However, internet dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as auto-regressive process, heartbeat sending time is modeled as exogenous variable, the model's coefficients are estimated based on the sliding window of observations and this result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.

  20. Electricity consumption forecasting in Italy using linear regression models

    The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account. (author)

  1. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    Harrell , Jr , Frank E


    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  2. Regression Model to Predict Global Solar Irradiance in Malaysia

    Hairuniza Ahmed Kutty


    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.

  3. [A Brillouin Scattering Spectrum Feature Extraction Based on Flies Optimization Algorithm with Adaptive Mutation and Generalized Regression Neural Network].

    Zhang, Yan-jun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong


    According to the high precision extracting characteristics of scattering spectrum in Brillouin optical time domain reflection optical fiber sensing system, this paper proposes a new algorithm based on flies optimization algorithm with adaptive mutation and generalized regression neural network. The method takes advantages of the generalized regression neural network which has the ability of the approximation ability, learning speed and generalization of the model. Moreover, by using the strong search ability of flies optimization algorithm with adaptive mutation, it can enhance the learning ability of the neural network. Thus the fitting degree of Brillouin scattering spectrum and the extraction accuracy of frequency shift is improved. Model of actual Brillouin spectrum are constructed by Gaussian white noise on theoretical spectrum, whose center frequency is 11.213 GHz and the linewidths are 40-50, 30-60 and 20-70 MHz, respectively. Comparing the algorithm with the Levenberg-Marquardt fitting method based on finite element analysis, hybrid algorithm particle swarm optimization, Levenberg-Marquardt and the least square method, the maximum frequency shift error of the new algorithm is 0.4 MHz, the fitting degree is 0.991 2 and the root mean square error is 0.024 1. The simulation results show that the proposed algorithm has good fitting degree and minimum absolute error. Therefore, the algorithm can be used on distributed optical fiber sensing system based on Brillouin optical time domain reflection, which can improve the fitting of Brillouin scattering spectrum and the precision of frequency shift extraction effectively. PMID:26904844

  4. Fuzzy and Regression Modelling of Hard Milling Process

    A. Tamilarasan


    Full Text Available The present study highlights the application of box-behnken design coupled with fuzzy and regression modeling approach for making expert system in hard milling process to improve the process performance with systematic reduction of production cost. The important input fields of work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key index of machining outputs. The results indicate that the fuzzy logic and regression modeling technique can be effectively used for the prediction of desired responses with less average error variation. Predicted results were verified by experiments and shown the good potential characteristics of the developed system for automated machining environment.

  5. Procedure for Detecting Outliers in a Circular Regression Model

    Rambli, Adzhar; Abuzaid, Ali H. M.; Mohamed, Ibrahim Bin; Hussin, Abdul Ghapor


    A number of circular regression models have been proposed in the literature. In recent years, there is a strong interest shown on the subject of outlier detection in circular regression. An outlier detection procedure can be developed by defining a new statistic in terms of the circular residuals. In this paper, we propose a new measure which transforms the circular residuals into linear measures using a trigonometric function. We then employ the row deletion approach to identify observations that affect the measure the most, a candidate of outlier. The corresponding cut-off points and the performance of the detection procedure when applied on Down and Mardia’s model are studied via simulations. For illustration, we apply the procedure on circadian data. PMID:27064566

  6. Assessing the Usability of Predictions of Different Regression Models

    Šťastný, J.; Holeňa, Martin

    Seňa: Pont, 2010 - (Pardubská, D.), s. 93-98 ISBN 978-80-970179-3-4. [ITAT 2010. Conference on Theory and Practice of Information Technologies. Smrekovica (SK), 21.09.2010-25.09.2010] R&D Projects: GA ČR GA201/08/0802 Institutional research plan: CEZ:AV0Z10300504 Keywords : regression models * confidence of predictions * confidence intervals * transductive inference * sensitivity analysis Subject RIV: IN - Informatics, Computer Science

  7. Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap

    Flachaire, Emmanuel


    International audience In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap. The finite sample performance of a heteroskedastic-robust test is investigated with Monte Carlo experiments. The simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap and of the pairs bootstrap. It is the only one for which the bootstrap ...

  8. Central limit theorem of linear regression model under right censorship

    HE; Shuyuan(何书元); HUANG; Xiang(Heung; Wong)(黄香)


    In this paper, the estimation of joint distribution F(y,z) of (Y, Z) and the estimation in thelinear regression model Y = b′Z + ε for complete data are extended to that of the right censored data. Theregression parameter estimates of b and the variance of ε are weighted least square estimates with randomweights. The central limit theorems of the estimators are obtained under very weak conditions and the derivedasymptotic variance has a very simple form.


    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan


    Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran’s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For pr...

  10. On relationship between regression models and interpretation of multiple regression coefficients

    A N Varaksin; Panov, V. G.


    In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no ...

  11. K factor estimation in distribution transformers using linear regression models

    Juan Miguel Astorga Gómez


    Full Text Available Background: Due to massive incorporation of electronic equipment to distribution systems, distribution transformers are subject to operation conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor, from total current harmonic content obtained with low-cost equipment.Method: Two distribution transformers that feed different loads are studied variables, current total harmonic distortion factor K are recorded, and the regression model that best fits the data field is determined. To select the regression model the coefficient of determination R2 and the Akaike Information Criterion (AIC are used. With the selected model, the K-factor is estimated to actual operating conditions.Results: Once determined the model it was found that for both agricultural cargo and industrial mining, present harmonic content (THDi exceeds the values that these transformers can drive (average of 12.54% and minimum 8,90% in the case of agriculture and average value of 18.53% and a minimum of 6.80%, for industrial mining case.Conclusions: When estimating the K factor using polynomial models it was determined that studied transformers can not withstand the current total harmonic distortion of their current loads. The appropriate K factor for studied transformer should be 4; this allows transformers support the current total harmonic distortion of their respective loads.

  12. General bound of overfitting for MLP regression models

    Rynkiewicz, Joseph


    Multilayer perceptrons (MLP) with one hidden layer have been used for a long time to deal with non-linear regression. However, in some task, MLP's are too powerful models and a small mean square error (MSE) may be more due to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS), however in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present an universal bound for the overfitting of such model under weak assumptions, this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the number of data goes to infinite. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP.

  13. Reconstruction of missing daily streamflow data using dynamic regression models

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault


    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.

  14. Regularized multivariate regression models with skew-t error distributions

    Chen, Lianfu


    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  15. Utilization of geographically weighted regression (GWR) in forestry modeling

    Quirós-Segovia, María


    The diploma thesis is focused on the application of the Geographically Weighted Regression (GWR) in forestry models. This is a prospective method for coping with spatially heterogeneous data. In forestry, this method has been used previously in small areas with good results, but in this diploma thesis it is applied to a bigger area in the Region of Murcia, Spain. Main goal of the thesis is to evaluate GWR for developing of large scale height-diameter model based on data of National Forest Inv...

  16. Interpreting parameters in the logistic regression model with random effects

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben;


    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...

  17. Multivariate Frequency-Severity Regression Models in Insurance

    Edward W. Frees


    Full Text Available In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i property; (ii motor vehicle; and (iii contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.

  18. G/SPLINES: A hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's genetic algorithm

    Rogers, David


    G/SPLINES are a hybrid of Friedman's Multivariable Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.

  19. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Campos-Aranda Daniel Francisco


    Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.

  20. Genetic evaluation of European quails by random regression models

    Flaviana Miranda Gonçalves


    Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.

  1. Dynamic Regression Intervention Modeling for the Malaysian Daily Load

    Fadhilah Abdrazak


    Full Text Available Malaysia is a unique country due to having both fixed and moving holidays.  These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors.  If these effects can be estimated and removed, the behavior of the series could be better viewed.  Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis.   Based on the linear transfer function method, a daily load model consists of either peak or average is developed.  The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.

  2. On relationship between regression models and interpretation of multiple regression coefficients

    Varaksin, A N


    In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no linear statistical dependence on other presented variables.

  3. The application of Dynamic Linear Bayesian Models in hydrological forecasting: Varying Coefficient Regression and Discount Weighted Regression

    Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa


    A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.

  4. Estimation of reference evapotranspiration using multivariate fractional polynomial, Bayesian regression, and robust regression models in three arid environments

    Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad


    The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM (R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.

  5. Regression modeling for digital test of ΣΔ modulators

    Léger, G; Rueda, Adoración


    The cost of Analogue and Mixed-Signal circuit testing is an important bottleneck in the industry, due to time-consuming verification of specifications that require state-ofthe- art Automatic Test Equipment. In this paper, we apply the concept of Alternate Test to achieve digital testing of converters. By training an ensemble of regression models that maps simple digital defect-oriented signatures onto Signal to Noise and Distortion Ratio (SNDR), an average error of 1:7% is achieved. Beyond th...

  6. Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees

    Disruptions remain one of the most hazardous events in the operation of a tokamak device, since they can cause damage to the vacuum vessel and surrounding structures. Their potential danger increases with the plasma volume and energy content and therefore they will constitute an even more serious issue for the next generation of machines. For these reasons, in the recent years a lot of attention has been devoted to devise predictors, capable of foreseeing the imminence of a disruption sufficiently in advance, to allow time for undertaking remedial actions. In this paper, the results of applying fuzzy logic and classification and regression trees (CART) to the problem of predicting disruptions at JET are reported. The conceptual tools of fuzzy logic, in addition to being well suited to accommodate the opinion of experts even if not formulated in mathematical but linguistic terms, are also fully transparent, since their governing rules are human defined. They can therefore help not only in forecasting disruptions but also in studying their behaviour. The analysis leading to the rules of the fuzzy predictor has been complemented with a systematic investigation of the correlation between the various experimental signals and the imminence of a disruption. This has been performed with an exhaustive, non-linear and unbiased method based on decision trees. This investigation has confirmed that the relative importance of various signals can change significantly depending on the plasma conditions. On the basis of the results provided by CART on the information content of the various quantities, the prototype of an adaptive fuzzy logic predictor was trained and tested on JET database. Its performance is significantly better than the previous static one, proving that more flexible prediction strategies, not uniform over the whole discharge but tuned to the operational region of the plasma at any given time, can be very competitive and should be investigated systematically

  7. A flexible count data regression model for risk analysis.

    Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P


    In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can provide as good of fits to data as the commonly used existing models for overdispered data sets while outperforming these commonly used models for underdispersed data sets. PMID:18304118

  8. Evaluating sediment chemistry and toxicity data using logistic regression modeling

    Field, L.J.; MacDonald, D.D.; Norton, S.B.; Severn, C.G.; Ingersoll, C.G.


    This paper describes the use of logistic-regression modeling for evaluating matching sediment chemistry and toxicity data. Contaminant- specific logistic models were used to estimate the percentage of samples expected to be toxic at a given concentration. These models enable users to select the probability of effects of concern corresponding to their specific assessment or management objective or to estimate the probability of observing specific biological effects at any contaminant concentration. The models were developed using a large database (n = 2,524) of matching saltwater sediment chemistry and toxicity data for field-collected samples compiled from a number of different sources and geographic areas. The models for seven chemicals selected as examples showed a wide range in goodness of fit, reflecting high variability in toxicity at low concentrations and limited data on toxicity at higher concentrations for some chemicals. The models for individual test endpoints (e.g., amphipod mortality) provided a better fit to the data than the models based on all endpoints combined. A comparison of the relative sensitivity of two amphipod species to specific contaminants illustrated an important application of the logistic model approach.

  9. Approximation by randomly weighting method in censored regression model


    Censored regression ("Tobit") models have been in common use, and their linear hypothesis testings have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model. Simulation studies illustrate that the per-formance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.

  10. Regression Models for Predicting Force Coefficients of Aerofoils

    Mohammed ABDUL AKBAR


    Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.

  11. Approximation by randomly weighting method in censored regression model

    WANG ZhanFeng; WU YaoHua; ZHAO LinCheng


    Censored regression ("Tobit") models have been in common use,and their linear hypothesis testings have been widely studied.However,the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters.In this paper,we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic.It is shown that,under both the null and local alternative hypotheses,conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic.Therefore,the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters.At the same time,we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model.Simulation studies illustrate that the performance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.

  12. Empirical likelihood ratio tests for multivariate regression models

    WU Jianhong; ZHU Lixing


    This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method has been well known to be a powerful tool for constructing test and confidence region. For model checking, however, the naive empirical likelihood (EL) based tests are not of Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by the advantages of both the EL and score test method, the EL-based score tests share many desirable features as follows: They are self-scale invariant and can detect the alternatives that converge to the null at rate n-1/2, the possibly fastest rate for lack-of-fit testing; they involve weight functions, which provides us with the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application for a real dataset is analyzed.

  13. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    Haberman, Shelby J.; Sinharay, Sandip


    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

  14. Multiple Linear Regression Model Used in Economic Analyses

    Constantin ANGHELACHE; Madalina Gabriela ANGHEL; Ligia PRODAN; Cristina SACALA; Marius POPOVICI


    The multiple regression is a tool that offers the possibility to analyze the correlations between more than two variables, situation which account for most cases in macro-economic studies. The best known method of estimation for multiple regression is the method of least squares. As in the two-variable regression, we choose the regression function of sample and minimize the sum of squared residual values. Another method that allows us to take into account the number of variables factor when d...

  15. Many regression algorithms, one unified model — A review

    Stulp, Freek; Sigaud, Olivier


    Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representation whose parameters they regress fall into two classes: a weighted sum of basis ...

  16. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.


    The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval arithm


    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan


    Background: The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran’s universities. Methods: This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. Results: of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran’s libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Conclusions: Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries. PMID:26622203

  18. A Gompertz regression model for fern spores germination

    Gabriel y Galán, Jose María


    Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados

  19. Conceptual Model of User Adaptive Enterprise Application

    Inese Šūpulniece


    Full Text Available The user adaptive enterprise application is a software system, which adapts its behavior to an individual user on the basis of nontrivial inferences from information about the user. The objective of this paper is to elaborate a conceptual model of the user adaptive enterprise applications. In order to conceptualize the user adaptive enterprise applications, their main characteristics are analyzed, the meta-model defining the key concepts relevant to these applications is developed, and the user adaptive enterprise application and its components are defined in terms of the meta-model. Modeling of the user adaptive enterprise application incorporates aspects of enterprise modeling, application modeling, and design of adaptive characteristics of the application. The end-user and her expectations are identified as two concepts of major importance not sufficiently explored in the existing research. Understanding these roles improves the adaptation result in the user adaptive applications.

  20. Hypothesis testing in functional linear regression models with Neyman's truncation and wavelet thresholding for longitudinal data.

    Yang, Xiaowei; Nie, Kun


    Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data. PMID:17610294

  1. The R Package threg to Implement Threshold Regression Models

    Tao Xiao


    This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.

  2. Statistical Inference for Partially Linear Regression Models with Measurement Errors

    Jinhong YOU; Qinfeng XU; Bin ZHOU


    In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly,a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed,which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized like-lihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regu-larization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.

  3. Short-term electricity prices forecasting based on support vector regression and Auto-regressive integrated moving average modeling

    In this paper, we present the use of different mathematical models to forecast electricity price under deregulated power. A successful prediction tool of electricity price can help both power producers and consumers plan their bidding strategies. Inspired by that the support vector regression (SVR) model, with the ε-insensitive loss function, admits of the residual within the boundary values of ε-tube, we propose a hybrid model that combines both SVR and Auto-regressive integrated moving average (ARIMA) models to take advantage of the unique strength of SVR and ARIMA models in nonlinear and linear modeling, which is called SVRARIMA. A nonlinear analysis of the time-series indicates the convenience of nonlinear modeling, the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied in solving the residuals regression estimation problems. The experimental results demonstrate that the model proposed outperforms the existing neural-network approaches, the traditional ARIMA models and other hybrid models based on the root mean square error and mean absolute percentage error.

  4. Application of Partial Least-Squares Regression Model on Temperature Analysis and Prediction of RCCD

    Yuqing Zhao; Zhenxian Xing


    This study, based on the temperature monitoring data of jiangya RCCD, uses principle and method of partial least-squares regression to analyze and predict temperature variation of RCCD. By founding partial least-squares regression model, multiple correlations of independent variables is overcome, organic combination on multiple linear regressions, multiple linear regression and canonical correlation analysis is achieved. Compared with general least-squares regression model result, it is more ...

  5. Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models

    Lenk, Jan; M, Claus


    In this paper, we describe results to model lateral and longitudinal control behavior of drivers with simple linear multiple regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi

  6. Estimating interaction on an additive scale between continuous determinants in a logistic regression model

    Knol, Mirjam J.; van der Tweel, Ingeborg; Grobbee, Diederick E.; Numans, Mattijs E.; Geerlings, Mirjam I.


    Background To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers

  7. Robust repeated median regression in moving windows with data-adaptive width selection

    Borowski, Matthias; Fried, Roland


    Online (also 'real-time' or 'sequential') signal extraction from noisy and outlier- interfered data streams is a basic but challenging goal. Fitting a robust Repeated Median (Siegel, 1982) regression line in a moving time window has turned out to be a promising approach (Davies et al., 2004; Gather et al., 2006; Schettlinger et al., 2006). The level of the regression line at the rightmost window position, which equates to the current time point in an online application, is then...

  8. Bayesian auxiliary variable models for binary and multinomial regression

    Holmes, C C; HELD, L.


    In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Fina...

  9. Kernel Averaged Predictors for Spatio-Temporal Regression Models.

    Heaton, Matthew J; Gelfand, Alan E


    In applications where covariates and responses are observed across space and time, a common goal is to quantify the effect of a change in the covariates on the response while adequately accounting for the spatio-temporal structure of the observations. The most common approach for building such a model is to confine the relationship between a covariate and response variable to a single spatio-temporal location. However, oftentimes the relationship between the response and predictors may extend across space and time. In other words, the response may be affected by levels of predictors in spatio-temporal proximity to the response location. Here, a flexible modeling framework is proposed to capture such spatial and temporal lagged effects between a predictor and a response. Specifically, kernel functions are used to weight a spatio-temporal covariate surface in a regression model for the response. The kernels are assumed to be parametric and non-stationary with the data informing the parameter values of the kernel. The methodology is illustrated on simulated data as well as a physical data set of ozone concentrations to be explained by temperature. PMID:24010051

  10. Ultracentrifuge separative power modeling with multivariate regression using covariance matrix

    In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of a ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties related with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables, which significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure Pp. After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and Pp to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)

  11. Valuing Patents with Linear Regression : Identifying value indicators and using a linear regression model to value  patents

    Al-Khalaf, Adnan; Gustafsson, Steve Oskar


    This thesis consist of two parts. The first part of the thesis will conduct a multiple regression on a data-set obtained from the Ocean Tomo’s auction results between 2006 to 2008 with the purpose to identify key value indicators and investigate to what extent it is possible to predict the value of a patent. The final regression model consist of the following covariates Average number of citings per year, share of active family members, age of the patent, average invested USD per year, and ni...

  12. Encoding through patterns: regression tree-based neuronal population models.

    Haslinger, Robert; Pipa, Gordon; Lewis, Laura D; Nikolić, Danko; Williams, Ziv; Brown, Emery


    Although the existence of correlated spiking between neurons in a population is well known, the role such correlations play in encoding stimuli is not. We address this question by constructing pattern-based encoding models that describe how time-varying stimulus drive modulates the expression probabilities of population-wide spike patterns. The challenge is that large populations may express an astronomical number of unique patterns, and so fitting a unique encoding model for each individual pattern is not feasible. We avoid this combinatorial problem using a dimensionality-reduction approach based on regression trees. Using the insight that some patterns may, from the perspective of encoding, be statistically indistinguishable, the tree divisively clusters the observed patterns into groups whose member patterns possess similar encoding properties. These groups, corresponding to the leaves of the tree, are much smaller in number than the original patterns, and the tree itself constitutes a tractable encoding model for each pattern. Our formalism can detect an extremely weak stimulus-driven pattern structure and is based on maximizing the data likelihood, not making a priori assumptions as to how patterns should be grouped. Most important, by comparing pattern encodings with independent neuron encodings, one can determine if neurons in the population are driven independently or collectively. We demonstrate this method using multiple unit recordings from area 17 of anesthetized cat in response to a sinusoidal grating and show that pattern-based encodings are superior to those of independent neuron models. The agnostic nature of our clustering approach allows us to investigate encoding by the collective statistics that are actually present rather than those (such as pairwise) that might be presumed. PMID:23607564


    Gurudeo Anand Tularam; Siti Amri


    House price prediction continues to be important for government agencies insurance companies and real estate industry. This study investigates the performance of house sales price models based on linear and non-linear approaches to study the effects of selected variables. Linear stepwise Multivariate Regression (MR) and nonlinear models of Neural Network (NN) and Adaptive Neuro-Fuzzy (ANFIS) are developed and compared. The GIS methods are used to integrate the data for the study area (Bathurs...

  14. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko


    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211

  15. Application of regression model on stream water quality parameters

    Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream. Two sites were selected prior to mixing of leachate with the surface water. One was of leachate and other two sites were affected with leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), Biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study correlation coefficient r among different water quality parameters of various sites were calculated by using Pearson model and then average of each correlation between two parameters were also calculated, which shows TDS and EC and pH and BOD have significantly increasing r value, while temperature and TDS, temp and EC, DO and BL, DO and COD have decreasing r value. Single factor ANOVA at 5% level of significance was used which shows EC, TDS, TCL and COD were significantly differ among various sites. By the application of these two statistical approaches TDS and EC shows strongly positive correlation because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters significantly vary among 5 sites which are further confirmed by using linear regression. (author)

  16. Proteomics Improves the Prediction of Burns Mortality: Results from Regression Spline Modeling

    Finnerty, Celeste C.; Ju, Hyunsu; Spratt, Heidi; Victor, Sundar; Jeschke, Marc G.; Hegde, Sachin; Bhavnani, Suresh K.; Luxon, Bruce A.; Brasier, Allan R.; Herndon, David N.


    Prediction of mortality in severely burned patients remains unreliable. Although clinical covariates and plasma protein abundance have been used with varying degrees of success, the triad of burn size, inhalation injury, and age remains the most reliable predictor. We investigated the effect of combining proteomics variables with these three clinical covariates on prediction of mortality in burned children. Serum samples were collected from 330 burned children (burns covering >25% of the total body surface area) between admission and the time of the first operation for clinical chemistry analyses and proteomic assays of cytokines. Principal component analysis revealed that serum protein abundance and the clinical covariates each provided independent information regarding patient survival. To determine whether combining proteomics with clinical variables improves prediction of patient mortality, we used multivariate adaptive regression splines, since the relationships between analytes and mortality were not linear. Combining these factors increased overall outcome prediction accuracy from 52% to 81% and area under the receiver operating characteristic curve from 0.82 to 0.95. Thus, the predictive accuracy of burns mortality is substantially improved by combining protein abundance information with clinical covariates in a multivariate adaptive regression splines classifier, a model currently being validated in a prospective study. PMID:22686201

  17. An Explanation of the Effectiveness of Latent Semantic Indexing by Means of a Bayesian Regression Model.

    Story, Roger E.


    Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…

  18. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    Faraway, Julian J


    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...


    Ahmad Subagyo


    The aim of this study was to examine the relationship theoretically global Foreign Direct Investments (FDI) to the global GDP of all countries in the world. This study emphasizes the relationship between global GDP and global FDI in all countries in the world, whether in theory has a coherent nature with theoretical expectations. Multiple linear regression analysis applied in this study. According to the results of linear  regression test the evolution of global FDI in terms of changes i...

  20. The Classical Linear Regression Model with one Incomplete Binary Variable

    Toutenburg, Helge; Nittner, T.


    We present three different methods based on the conditional mean imputation when binary explanatory variables are incomplete. Apart from the single imputation and multiple imputation especially the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment when missing data are confined to one independent binary variable: complete case analysis, zero order regression, categorical zero order regression, pi imputation, single imputation, mu...

  1. Boosting the partial least square algorithm for regression modelling

    Ling YU; Tiejun WU


    Boosting algorithms are a class of general methods used to improve the general performance of regression analysis. The main idea is to maintain a distribution over the train set. In order to use the given distribution directly,a modified PLS algorithm is proposed and used as the base learner to deal with the nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm has better general performance over the PLS algorithm.

  2. Inference of dense spectral reflectance images from sparse reflectance measurement using non-linear regression modeling

    Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.


    One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.

  3. Genetic Algorithm Based Outlier Detection Using Bayesian Information Criterion in Multiple Regression Models Having Multicollinearity Problems

    ALMA, Özlem GÜRÜNLÜ; KURT, Serdar; UĞUR, Aybars


    Multiple linear regression models are widely used applied statistical techniques and they are most useful devices for extracting and understanding the essential features of datasets. However, in multiple linear regression models problems arise when a serious outlier observation or multicollinearity present in the data. In regression however, the situation is somewhat more complex in the sense that some outlying points will have more influence on the regression than others. An important proble...

  4. Regression of retinopathy by squalamine in a mouse model.

    Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I


    The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina. PMID:15128931

  5. Model-Based Software Regression Testing for Software Components

    Batra, Gagandeep; Arora, Yogesh Kumar; Sengupta, Jyotsna

    This paper presents a novel approach of generating regression test cases from UML design diagrams. Regression testing can be systematically applied at the software components architecture level so as to reduce the effort and cost of retesting modified systems. Our approach consists of transforming a UML sequence diagram of a component into a graphical structure called the control flow graph (CFG) and its revised version into an Extended control flow graph (ECFG) The nodes of the two graphs are augmented with information necessary to compose test suites in terms of test case scenarios. This information is collected from use case templates and class diagrams. The graphs are traversed in depth-first-order to generate test scenarios. Further, the two are compared for change identification. Based on change information, test cases are identified as reusable, obsolete or newly added. The regression test suite thus generated is suitable to detect any interaction and scenario faults.



    A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

  7. Prediction models for CO2 emission in Malaysia using best subsets regression and multi-linear regression

    Tan, C. H.; Matjafri, M. Z.; Lim, H. S.


    This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.

  8. Models as Approximations: How Random Predictors and Model Violations Invalidate Classical Inference in Regression

    Buja, A.; Berk, R.; Brown, L; George, E.; Pitkin, E.; Traskin, M.; Zhan, K.; Zhao, L.


    We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true an...

  9. A generalized additive regression model for survival times

    Scheike, Thomas H.


    Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...

  10. Studies of relationships between free swelling index (FSI) and coal quality by regression and adaptive neuro fuzzy inference system

    Khorami, M. Tayebi [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of); Chelgani, S. Chehreh [Surface Science Western, Research Park, University of Western Ontario, London (Canada); Hower, James C. [Center for Applied Energy Research, University of Kentucky, Kexington (United States); Jorjani, E. [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of)


    The results of proximate, ultimate, and petrographic analysis for a wide range of Kentucky coal samples were used to predict Free Swelling Index (FSI) using multivariable regression and Adaptive Neuro Fuzzy Inference System (ANFIS). Three different input sets: (a) moisture, ash, and volatile matter; (b) carbon, hydrogen, nitrogen, oxygen, sulfur, and mineral matter; and (c) group-maceral analysis, mineral matter, moisture, sulfur, and R{sub max} were applied for both methods. Non-linear regression achieved the correlation coefficients (R{sup 2}) of 0.38, 0.49, and 0.70 for input sets (a), (b), and (c), respectively. By using the same input sets, ANFIS predicted FSI with higher R{sup 2} of 0.46, 0.82 and 0.95, respectively. Results show that input set (c) is the best predictor of FSI in both prediction methods, and ANFIS significantly can be used to predict FSI when regression results do not have appropriate accuracy. (author)

  11. RCR: Robust Compound Regression for Robust Estimation of Errors-in-Variables Model

    Han, Hao; Wei ZHU


    The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach - the robust compound regression (RCR) analysis method for the robust estimation of EIV models. W...

  12. Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models

    Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.


    Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…

  13. The Linear Regression Model for setting up the Futures Price

    Mario G.R. PAGLIACC; Janusz GRABARA; Madalina Gabriela ANGHEL; Cristina SACALA; Vasile Lucian ANTON


    To realize a linear regression, we have considered the computation method for futures prices that, according to economic culture, is based on the rate of the supporting asset and internal/external interest ratios, and also on the time period until maturity. The market price of a futures instrument is influenced by the demand and supply, that is the number of units traded within a certain period.

  14. GIS-Based Analytical Tools for Transport Planning: Spatial Regression Models for Transportation Demand Forecast

    Simone Becker Lopes


    Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included. This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.

  15. Noise Reduction and Gap Filling of fAPAR Time Series Using an Adapted Local Regression Filter

    Álvaro Moreno


    Full Text Available Time series of remotely sensed data are an important source of information for understanding land cover dynamics. In particular, the fraction of absorbed photosynthetic active radiation (fAPAR is a key variable in the assessment of vegetation primary production over time. However, the fAPAR series derived from polar orbit satellites are not continuous and consistent in space and time. Filtering methods are thus required to fill in gaps and produce high-quality time series. This study proposes an adapted (iteratively reweighted local regression filter (LOESS and performs a benchmarking intercomparison with four popular and generally applicable smoothing methods: Double Logistic (DLOG, smoothing spline (SSP, Interpolation for Data Reconstruction (IDR and adaptive Savitzky-Golay (ASG. This paper evaluates the main advantages and drawbacks of the considered techniques. The results have shown that ASG and the adapted LOESS perform better in recovering fAPAR time series over multiple controlled noisy scenarios. Both methods can robustly reconstruct the fAPAR trajectories, reducing the noise up to 80% in the worst simulation scenario, which might be attributed to the quality control (QC MODIS information incorporated into these filtering algorithms, their flexibility and adaptation to the upper envelope. The adapted LOESS is particularly resistant to outliers. This method clearly outperforms the other considered methods to deal with the high presence of gaps and noise in satellite data records. The low RMSE and biases obtained with the LOESS method (|rMBE| < 8%; rRMSE < 20% reveals an optimal reconstruction even in most extreme situations with long seasonal gaps. An example of application of the LOESS method to fill in invalid values in real MODIS images presenting persistent cloud and snow coverage is also shown. The LOESS approach is recommended in most remote sensing applications, such as gap-filling, cloud-replacement, and observing temporal

  16. Comparing Standard Regression Modeling to Ensemble Modeling: How Data Mining Software Can Improve Economists' Predictions

    Joyce P. Jacobsen; Laurence M. Levin; Zachary Tausanovitch


    Economists’ wariness of data mining may be misplaced, even in cases where economic theory provides a well-specified model for estimation. We discuss how new data mining/ensemble modeling software, for example the program TreeNet, can be used to create predictive models. We then show how for a standard labor economics problem, the estimation of wage equations, TreeNet outperforms standard OLS regression in terms of lower prediction error. Ensemble modeling also resists the tendency to overfit ...

  17. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    von Davier, Matthias; Sinharay, Sandip


    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  18. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente


    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…

  19. Integration of logistic regression, Markov chain and cellular automata models to simulate urban expansion

    Jokar Arsanjani, J.; Helbich, M.; Kainz, W.; Boloorani, A.


    This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-eco

  20. Development of regression model for uncertainty analysis by response surface method in HANARO

    The feasibility of uncertainty analysis with regression model in reactor physics problem was investigated. Regression model as a alternative model for a MCNP/ORIGEN2 code system which is uncertainty analysis tool of fission-produced molybdenum production was developed using Response Surface Method. It was shown that the development of regression model in the reactor physics problem was possible by introducing the burnup parameter. The most important parameter affecting the uncertainty of 99Mo yield ratio was fuel thickness in the regression model. This results agree well those of Crude Monte Carlo Method for each parameter. The regression model developed in this research was shown to be suitable as a alternative model, because coefficient of determination was 0.99

  1. Quantile regression

    Hao, Lingxin


    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao

  2. A nonparametric dynamic additive regression model for longitudinal data

    Martinussen, Torben; Scheike, Thomas H.


    dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...

  3. Completing and adapting models of biological processes

    Margaria, Tiziana; Hinchey, Michael G.; Raffelt, Harald; Rash, James L.; Rouff, Christopher A.; Steffen, Bernhard


    We present a learning-based method for model completion and adaptation, which is based on the combination of two approaches: 1) R2D2C, a technique for mechanically transforming system requirements via provably equivalent models to running code, and 2) automata learning-based model extrapolation. The intended impact of this new combination is to make model completion and adaptation accessible to experts of the field, like biologists or engineers. The principle is briefly illustrated by gene...

  4. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    Grajeda, LM; Ivanescu, A; Saito, M; Crainiceanu, C; Jaganath, D; Gilman, RH; Crabtree, JE; Kelleher, D; Cabrera, L.; Cama, V; Checkley, W


    Background Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and accelerati...

  5. An adaptive distance measure for use with nonparametric models

    Distance measures perform a critical task in nonparametric, locally weighted regression. Locally weighted regression (LWR) models are a form of 'lazy learning' which construct a local model 'on the fly' by comparing a query vector to historical, exemplar vectors according to a three step process. First, the distance of the query vector to each of the exemplar vectors is calculated. Next, these distances are passed to a kernel function, which converts the distances to similarities or weights. Finally, the model output or response is calculated by performing locally weighted polynomial regression. To date, traditional distance measures, such as the Euclidean, weighted Euclidean, and L1-norm have been used as the first step in the prediction process. Since these measures do not take into consideration sensor failures and drift, they are inherently ill-suited for application to 'real world' systems. This paper describes one such LWR model, namely auto associative kernel regression (AAKR), and describes a new, Adaptive Euclidean distance measure that can be used to dynamically compensate for faulty sensor inputs. In this new distance measure, the query observations that lie outside of the training range (i.e. outside the minimum and maximum input exemplars) are dropped from the distance calculation. This allows for the distance calculation to be robust to sensor drifts and failures, in addition to providing a method for managing inputs that exceed the training range. In this paper, AAKR models using the standard and Adaptive Euclidean distance are developed and compared for the pressure system of an operating nuclear power plant. It is shown that using the standard Euclidean distance for data with failed inputs, significant errors in the AAKR predictions can result. By using the Adaptive Euclidean distance it is shown that high fidelity predictions are possible, in spite of the input failure. In fact, it is shown that with the Adaptive Euclidean distance prediction

  6. Local asymptotic behavior of regression splines for marginal semiparametric models with longitudinal data


    In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.

  7. Theoretical Aspects Regarding the Use of the Multiple Linear Regression Model in Economic Analyses

    Constantin ANGHELACHE; Ioan PARTACHI; Adina Mihaela DINU; Ligia PRODAN; Georgeta BARDAªU (LIXANDRU)


    In this paper we have studied the dependence between GDP, final consumption and net investments. To analyze this correlation, the article proposes a multiple regression model, extremely useful tool in economic analysis. Regression model described in the article considers the GDP as outcome variables and final consumption and net investment as factorial variables.

  8. Analysis for Regression Model Behavior by Sampling Strategy for Annual Pollutant Load Estimation.

    Park, Youn Shik; Engel, Bernie A


    Water quality data are typically collected less frequently than streamflow data due to the cost of collection and analysis, and therefore water quality data may need to be estimated for additional days. Regression models are applicable to interpolate water quality data associated with streamflow data and have come to be extensively used, requiring relatively small amounts of data. There is a need to evaluate how well the regression models represent pollutant loads from intermittent water quality data sets. Both the specific regression model and water quality data frequency are important factors in pollutant load estimation. In this study, nine regression models from the Load Estimator (LOADEST) and one regression model from the Web-based Load Interpolation Tool (LOADIN) were evaluated with subsampled water quality data sets from daily measured water quality data sets for N, P, and sediment. Each water quality parameter had different correlations with streamflow, and the subsampled water quality data sets had various proportions of storm samples. The behaviors of the regression models differed not only by water quality parameter but also by proportion of storm samples. The regression models from LOADEST provided accurate and precise annual sediment and P load estimates using the water quality data of 20 to 40% storm samples. LOADIN provided more accurate and precise annual N load estimates than LOADEST. In addition, the results indicate that avoidance of water quality data extrapolation and availability of water quality data from storm events were crucial in annual pollutant load estimation using pollutant regression models. PMID:26641336

  9. Technology diffusion in hospitals: A log odds random effects regression model

    J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)


    textabstractThis study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describ

  10. Unobtrusive user modeling for adaptive hypermedia

    H.J. Holz; K. Hofmann; C. Reed


    We propose a technique for user modeling in Adaptive Hypermedia (AH) that is unobtrusive at both the level of observable behavior and that of cognition. Unobtrusive user modeling is complementary to transparent user modeling. Unobtrusive user modeling induces user models appropriate for Educational