Multiple Regressive Model Adaptive Control
Garipov, Emil; Stoilkov, Teodor; Kalaykov, Ivan
2008-01-01
The central idea of this text is a strategy for controlling a continuous plant of arbitrary complexity by means of a set of discrete time-invariant linear controllers. Their number and tuned parameters correspond to the number and parameters of the linear time-invariant regressive models in the model bank, which approximate the complex plant dynamics at different operating points. The described strategy is known as Multiple Regressive Model Adaptive C...
Adaptive nonparametric instrumental regression by model selection
Johannes, Jan
2010-01-01
We consider the problem of estimating the structural function in nonparametric instrumental regression, where in the presence of an instrument W a response Y is modeled in dependence of an endogenous explanatory variable Z. The proposed estimator is based on dimension reduction and additional thresholding. The minimax optimal rate of convergence of the estimator is derived assuming that the structural function belongs to some ellipsoids which are in a certain sense linked to the conditional expectation operator of Z given W. We illustrate these results by considering classical smoothness assumptions. However, the proposed estimator requires an optimal choice of a dimension parameter depending on certain characteristics of the unknown structural function and the conditional expectation operator of Z given W, which are not known in practice. The main issue addressed in our work is a fully adaptive choice of this dimension parameter using a model selection approach under the restriction that the conditional expe...
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
Kanevski, Mikhail
2015-04-01
The research deals with the adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into a 13 dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in the case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and to model complex high dimensional data efficiently, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
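The kernel estimator and leave-one-out procedure mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the toy data, and the names `nw_predict` and `loo_mse` are all assumptions made for the example. A large bandwidth on a feature effectively switches that feature off, which is one way anisotropic kernels can indicate feature relevance.

```python
import math

def nw_predict(x, train_x, train_y, bandwidths, skip=None):
    """Nadaraya-Watson estimate at x with one Gaussian bandwidth per feature.

    `skip` is the index of a training point to leave out (for leave-one-out).
    """
    num = 0.0
    den = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        if i == skip:
            continue
        # Squared distance scaled feature by feature (anisotropic kernel).
        d2 = sum(((a - b) / h) ** 2 for a, b, h in zip(x, xi, bandwidths))
        w = math.exp(-0.5 * d2)
        num += w * yi
        den += w
    return num / den if den > 0.0 else float("nan")

def loo_mse(train_x, train_y, bandwidths):
    """Leave-one-out mean squared error for a given set of bandwidths."""
    errs = [
        (nw_predict(x, train_x, train_y, bandwidths, skip=i) - y) ** 2
        for i, (x, y) in enumerate(zip(train_x, train_y))
    ]
    return sum(errs) / len(errs)

# Toy data (made up): the target depends only on the first feature.
train_x = [(i / 10.0, ((7 * i) % 10) / 10.0) for i in range(20)]
train_y = [math.sin(3.0 * x0) for x0, _ in train_x]

# Small bandwidth = feature is used; very large bandwidth = feature is ignored.
err_relevant = loo_mse(train_x, train_y, bandwidths=(0.1, 100.0))
err_irrelevant = loo_mse(train_x, train_y, bandwidths=(100.0, 0.1))

# Number of non-empty feature subsets of 13 features, as quoted in the abstract.
n_models = 2 ** 13 - 1
```

On this toy data the leave-one-out error is lower when the kernel keeps the first feature and ignores the second, which mirrors how a cross-validation error can rank candidate feature sets.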
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...
Adaptive Metric Kernel Regression
Goutte, Cyril; Larsen, Jan
1998-01-01
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
Adaptive metric kernel regression
Goutte, Cyril; Larsen, Jan
2000-01-01
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the...
Time-adaptive quantile regression
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
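The objective that quantile regression minimises can be illustrated with the check (pinball) loss. The sketch below does not implement the simplex-based updating described in the abstract; it only shows, on made-up numbers, that minimising the check loss over a constant recovers a sample quantile. The function names are hypothetical.

```python
def pinball_loss(y, q_hat, tau):
    """Check loss for one observation: tau * r if r >= 0, else (tau - 1) * r."""
    r = y - q_hat
    return tau * r if r >= 0 else (tau - 1.0) * r

def total_loss(ys, q_hat, tau):
    """Sum of check losses over a sample for a constant prediction q_hat."""
    return sum(pinball_loss(y, q_hat, tau) for y in ys)

def best_constant_quantile(ys, tau):
    """Brute-force minimiser over the data points.

    A minimiser of the total check loss over constants is always attained
    at one of the observations, so searching the sample is sufficient.
    """
    return min(sorted(ys), key=lambda c: total_loss(ys, c, tau))

ys = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one outlier
median = best_constant_quantile(ys, 0.5)
upper = best_constant_quantile(ys, 0.9)
```

In a regression setting the constant is replaced by a linear function of the covariates, and the same loss becomes a linear program, which is why a simplex-type method applies.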
Adaptive functional linear regression
Comte, Fabienne
2011-01-01
We consider the estimation of the slope function in functional linear regression, where scalar responses are modeled in dependence of random functions. Cardot and Johannes [2010] have shown that a thresholded projection estimator can attain minimax rates of convergence, up to a constant, in a general framework that covers the prediction problem with respect to the mean squared prediction error as well as the estimation of the slope function and its derivatives. This estimation procedure, however, requires an optimal choice of a tuning parameter with regard to certain characteristics of the slope function and the covariance operator associated with the functional regressor. As this information is usually inaccessible in practice, we investigate a fully data-driven choice of the tuning parameter which combines model selection and Lepski's method. It is inspired by the recent work of Goldenshluger and Lepski [2011]. The tuning parameter is selected as the minimizer of a stochastic penalized contrast function...
Xu, Man; Pinson, Pierre; Lu, Zongxiang;
2016-01-01
Wind farm power curve modeling, which characterizes the relationship between meteorological variables and power production, is a crucial procedure for wind power forecasting. In many cases, power curve modeling is more impacted by the limited quality of input data rather than the stochastic nature of the energy conversion process. Such nature may be due to the varying wind conditions, aging and state of the turbines, etc. An equivalent steady-state power curve, estimated under normal operating conditions with the intention to filter abnormal data, is not sufficient to solve the problem because of the lack of time adaptivity. In this paper, a refined local polynomial regression algorithm is proposed to yield an adaptive robust model of the time-varying scattered power curve for forecasting applications. The time adaptivity of the algorithm is considered with a new data-driven bandwidth...
A projection-based adaptive-to-model test for regressions
Tan, Falong; Zhu, Xuehu; Zhu, Lixing
2016-01-01
A longstanding problem of existing empirical process-based tests for regressions is that when the number of covariates is greater than one, they either have no tractable limiting null distributions or are not omnibus. To attack this problem, we in this paper propose a projection-based adaptive-to-model approach. When the hypothetical model is parametric single-index, the method can fully utilize the dimension reduction model structure under the null hypothesis as if the covariate were one-dim...
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E
2015-05-01
The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, the experimental data set made up of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, the conclusions of this research work are presented. PMID:25414030
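As a rough illustration of why MARS models are described as easy to interpret, the sketch below evaluates a model built from hinge basis functions, the standard MARS building block. The knots, coefficients, and variable ordering are invented for the example and are not taken from the Oviedo study; `hinge` and `mars_predict` are hypothetical names.

```python
def hinge(x, knot, direction=1):
    """MARS hinge basis: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate a MARS-style model.

    `terms` is a list of (coefficient, factors); each factor is a tuple
    (variable_index, knot, direction). A term with several factors is a
    product of hinges, i.e. an interaction between variables.
    """
    total = intercept
    for coef, factors in terms:
        prod = 1.0
        for var, knot, direction in factors:
            prod *= hinge(x[var], knot, direction)
        total += coef * prod
    return total

# Made-up model with two inputs (index 0 and index 1); values are illustrative only.
intercept = 20.0
terms = [
    (2.0, [(0, 30.0, 1)]),                 # active when input 0 exceeds 30
    (-0.5, [(1, 10.0, -1)]),               # active when input 1 is below 10
    (0.1, [(0, 30.0, 1), (1, 10.0, 1)]),   # interaction term
]

pred_a = mars_predict((40.0, 5.0), intercept, terms)
pred_b = mars_predict((20.0, 15.0), intercept, terms)
```

Each term is zero outside its own region of the input space, so the contribution of every input variable can be read directly from the fitted knots and coefficients.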
Adaptive multitrack reconstruction for particle trajectories based on fuzzy c-regression models
Niu, Li-Bo; Li, Yu-Lan; Huang, Meng; Fu, Jian-Qiang; He, Bin; Li, Yuan-Jing
2015-03-01
In this paper, an approach to straight and circle track reconstruction is presented, which is suitable for particle trajectories in a homogeneous magnetic field (or 0 T) or Cherenkov rings. The method is based on fuzzy c-regression models, where the number of models stands for the number of tracks. The approximate number of tracks and a rough evaluation of the track parameters given by the Hough transform are used to initialize the fuzzy c-regression models. The technique effectively represents a merger between track candidate finding and parameter fitting. The performance of this approach is tested on simulated data under various scenarios. Results show that this technique is robust and could provide very accurate results efficiently. Supported by National Natural Science Foundation of China (11275109)
Adaptive Local Linear Quantile Regression
Yu-nan Su; Mao-zai Tian
2011-01-01
In this paper we propose a new method of local linear adaptive smoothing for nonparametric conditional quantile regression. Some theoretical properties of the procedure are investigated. Then we demonstrate the performance of the method on a simulated example and compare it with other methods. The simulation results demonstrate a reasonable performance of our proposed method, especially in situations when the underlying image is piecewise linear or can be approximated by such images. Generally speaking, our method outperforms most other existing methods in the sense of the mean square estimation (MSE) and mean absolute estimation (MAE) criteria. The procedure is very stable with respect to increasing noise level and the algorithm can be easily applied to higher dimensional situations.
Yang, Jianhong, E-mail: yangjianhong@me.ustb.edu.cn [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Yi, Cancan; Xu, Jinwu [School of Mechanical Engineering, University of Science and Technology Beijing, Beijing, 100083 (China); Ma, Xianghong [School of Engineering and Applied Science, Aston University, Birmingham B4 7ET (United Kingdom)
2015-05-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of the analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical line selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with a confidence interval of a probabilistic distribution.
Recursive Gaussian Process Regression Model for Adaptive Quality Monitoring in Batch Processes
Le Zhou
2015-01-01
In chemical batch processes with slow responses and a long duration, it is time-consuming and expensive to obtain sufficient normal data for statistical analysis. With the persistent accumulation of newly evolving data, the modelling gradually becomes adequate, and the subsequent batches will change slightly owing to the slow time-varying behavior. To make efficient use of the small amount of initial data and the newly evolving data sets, an adaptive monitoring scheme based on the recursive Gaussian process (RGP) model is designed in this paper. Based on the initial data, a Gaussian process model and the corresponding SPE statistic are constructed first. When new batches of data are included, a strategy based on the RGP model is used to choose the proper data for model updating. The performance of the proposed method is finally demonstrated on a penicillin fermentation batch process, and the result indicates that the proposed monitoring scheme is effective for adaptive modelling and online monitoring.
Ijima, Yusuke; Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao
In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
Survival Analysis with Multivariate adaptive Regression Splines
Kriner, Monika
2007-01-01
Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear effects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate effects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...
Kisi, Ozgur
2015-09-01
Pan evaporation (Ep) modeling is an important issue in reservoir management, regional water resources planning and evaluation of drinking-water supplies. The main purpose of this study is to investigate the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) in modeling Ep. The first part of the study focused on testing the ability of the LSSVM, MARS and M5Tree models in estimating the Ep data of Mersin and Antalya stations located in Mediterranean Region of Turkey by using cross-validation method. The LSSVM models outperformed the MARS and M5Tree models in estimating Ep of Mersin and Antalya stations with local input and output data. The average root mean square error (RMSE) of the M5Tree and MARS models was decreased by 24-32.1% and 10.8-18.9% using LSSVM models for the Mersin and Antalya stations, respectively. The ability of three different methods was examined in estimation of Ep using input air temperature, solar radiation, relative humidity and wind speed data from nearby station in the second part of the study (cross-station application without local input data). The results showed that the MARS models provided better accuracy than the LSSVM and M5Tree models with respect to RMSE, mean absolute error (MAE) and determination coefficient (R2) criteria. The average RMSE accuracy of the LSSVM and M5Tree was increased by 3.7% and 16.5% using MARS. In the case of without local input data, the average RMSE accuracy of the LSSVM and M5Tree was respectively increased by 11.4% and 18.4% using MARS. In the third part of the study, the ability of the applied models was examined in Ep estimation using input and output data of nearby station. The results reported that the MARS models performed better than the other models with respect to RMSE, MAE and R2 criteria. The average RMSE of the LSSVM and M5Tree was respectively decreased by 54% and 3.4% using MARS. The overall results indicated that
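The comparison criteria named in this abstract can be written out as a short sketch. The definitions below are the common textbook ones and are an assumption here: in particular, R2 is computed as one minus the ratio of residual to total sum of squares, whereas some studies report the squared correlation coefficient instead. The sample values and the helper `percent_decrease` are made up for illustration.

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Determination coefficient as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def percent_decrease(baseline, improved):
    """Relative reduction of an error measure, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline

obs = [1.0, 2.0, 3.0, 4.0]    # made-up observations
pred = [1.0, 2.0, 3.0, 6.0]   # made-up model output

example_rmse = rmse(obs, pred)
example_mae = mae(obs, pred)
example_r2 = r2(obs, pred)
example_gain = percent_decrease(2.0, 1.5)
```

Statements such as "the average RMSE was decreased by 24%" correspond to `percent_decrease` applied to the RMSE of the baseline model and the RMSE of the better model.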
Flexible survival regression modelling
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varying effects. The introduced models are all applied to data on breast cancer from the Norwegian cancer registry, and these analyses clearly reveal the shortcomings of Cox's regression model and the need for other supplementary analyses with models such as those we present here.
Paulino José García Nieto
2015-06-01
The aim of this study was to obtain a predictive model able to perform an early detection of central segregation severity in continuous cast steel slabs. Segregation in steel cast products is an internal defect that can be very harmful when slabs are rolled in heavy plate mills. In this research work, the central segregation was studied with success using the data mining methodology based on the multivariate adaptive regression splines (MARS) technique. For this purpose, the most important physical-chemical parameters are considered. The results of the present study are two-fold. In the first place, the significance of each physical-chemical variable on the segregation is presented through the model. Second, a model for forecasting segregation is obtained. Regression with optimal hyperparameters was performed, and coefficients of determination equal to 0.93 for continuity factor estimation and 0.95 for average width were obtained when the MARS technique was applied to the experimental dataset. The agreement between experimental data and the model confirmed the good performance of the latter.
In flood forecasting modelling, large basins are often considered as hydrological systems with multiple inputs and one output. Inputs are hydrological variables such as rainfall, runoff and physical characteristics of the basin; the output is runoff. Relating inputs to output can be achieved using deterministic, conceptual, or stochastic models. Rainfall-runoff models generally lack accuracy. Models based on physical hydrological processes, whether deterministic or conceptual, have heavy data requirements and are also very complex. Stochastic multiple input-output models, which use only historical records of hydrological variables, particularly runoff, are therefore very popular among hydrologists for flood forecasting in large river basins. An application is made to the Senegal River upstream of Bakel, where the river is formed by the main branch, Bafing, and two tributaries, Bakoye and Faleme; the Bafing is regulated by the Manantaly Dam. A model with three inputs and one output has been used for flood forecasting at Bakel. The influence of the forecast lead time, and of the three inputs taken separately, then in pairs, and then all together, has been verified using a dimensionless variance as the quality criterion. Discrepancies generally occur between model output and observations; to bring the model into better agreement with current observations, we have compared four parameter updating procedures (recursive least squares, Kalman filtering, the stochastic gradient method and an iterative method) and an AR error forecasting model. A combination of these model updating procedures has been used in real-time flood forecasting.
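Of the parameter updating procedures listed in this abstract, recursive least squares is the simplest to sketch. The code below is a generic textbook form with a forgetting factor, not the authors' model: the three synthetic inputs, the "true" weights, the initial covariance, and the name `rls_update` are all assumptions made for the example.

```python
import math

def rls_update(theta, P, x, y, lam=1.0):
    """One recursive least squares step for the linear model y ~ theta . x.

    theta: current parameter list, P: symmetric covariance-like matrix,
    x: input vector, y: new observation, lam: forgetting factor (1.0 = none).
    """
    n = len(theta)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    # Prediction error with the parameters available before this observation.
    err = y - sum(theta[i] * x[i] for i in range(n))
    theta_new = [theta[i] + gain[i] * err for i in range(n)]
    # P is symmetric, so the rank-one correction is gain * (P x)^T.
    P_new = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta_new, P_new

# Synthetic three-input, one-output system (weights are made up).
true_theta = [0.5, 0.3, 0.2]
theta = [0.0, 0.0, 0.0]
P = [[1e6 if i == j else 0.0 for j in range(3)] for i in range(3)]

for t in range(50):
    x = [math.sin(0.3 * t) + 2.0, math.cos(0.5 * t) + 2.0, math.sin(0.7 * t + 1.0) + 2.0]
    y = sum(w * xi for w, xi in zip(true_theta, x))  # noise-free output
    theta, P = rls_update(theta, P, x, y)
```

With noise-free data the estimates converge to the generating weights; choosing `lam` slightly below 1.0 would weight recent observations more heavily, which is the usual way to track slowly changing parameters in real time.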
Mendes Paul E
2010-03-01
Background: This article describes the data mining analysis of a clinical exposure study of 3585 adult smokers and 1077 nonsmokers. The analysis focused on developing models for four biomarkers of potential harm (BOPH): white blood cell count (WBC), 24 h urine 8-epi-prostaglandin F2α (EPI8), 24 h urine 11-dehydro-thromboxane B2 (DEH11), and high-density lipoprotein cholesterol (HDL). Methods: Random Forest was used for initial variable selection and Multivariate Adaptive Regression Spline was used for developing the final statistical models. Results: The analysis resulted in the generation of models that predict each of the BOPH as a function of selected variables from the smokers and nonsmokers. The statistically significant variables in the models were: platelet count, hemoglobin, C-reactive protein, triglycerides, race and biomarkers of exposure to cigarette smoke for WBC (R-squared = 0.29); creatinine clearance, liver enzymes, weight, vitamin use and biomarkers of exposure for EPI8 (R-squared = 0.41); creatinine clearance, urine creatinine excretion, liver enzymes, use of non-steroidal anti-inflammatory drugs, vitamins and biomarkers of exposure for DEH11 (R-squared = 0.29); and triglycerides, weight, age, sex, alcohol consumption and biomarkers of exposure for HDL (R-squared = 0.39). Conclusions: Levels of WBC, EPI8, DEH11 and HDL were statistically associated with biomarkers of exposure to cigarette smoking and with demographic and lifestyle factors. All of the predictors together explain 29%-41% of the variability in the BOPH.
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2016-04-01
Scenarios of surface weather required for impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriate. Outputs of these models have to be post-processed, which is often carried out using Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, the physical processes generating surface weather vary in time. This is well known for precipitation, for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar, or analog, to the target date. In practice, these dates can be selected using an analog model. In this study, we explore the possibility of improving the local performance of an analog model, where the analogy is applied to the geopotential heights at 1000 and 500 hPa, using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models, one for the occurrence and one for the quantity of precipitation, for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably increases the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the
TWO REGRESSION CREDIBILITY MODELS
Constanţa-Nicoleta BODEA
2010-03-01
In this communication we discuss two regression credibility models from Non-Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices, a risk premium is calculated for a contract with risk parameter θ. In the second regression credibility model, we obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state) and the collective estimate (based on aggregate USA data). To illustrate the solution with the properties mentioned above, we need the well-known representation theorem for a special class of matrices, the properties of the trace of a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance, and the complicated mathematical properties of conditional expectations and conditional covariances.
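The "linear combination of the individual estimate and the collective estimate" can be illustrated in its simplest scalar form. This is only a sketch under stated assumptions: the regression credibility models in the abstract work with matrices, whereas the code below uses a single weight, and the weight formula n / (n + k) is the classical Bühlmann factor, swapped in here for illustration rather than taken from the paper. All numbers are made up.

```python
def buhlmann_factor(n_periods, k):
    """Classical scalar credibility weight Z = n / (n + k), with k > 0."""
    if n_periods < 0 or k <= 0:
        raise ValueError("n_periods must be >= 0 and k must be > 0")
    return n_periods / (n_periods + k)

def credibility_premium(individual, collective, z):
    """Credibility estimate: z * individual + (1 - z) * collective."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility weight must lie in [0, 1]")
    return z * individual + (1.0 - z) * collective

# Made-up example: 5 observed periods for one state, k = 5.
z = buhlmann_factor(5, 5.0)
premium = credibility_premium(individual=120.0, collective=100.0, z=z)
```

The more data are available for the individual risk, the closer the weight is to 1 and the more the premium follows the individual estimate; with little data it stays close to the collective estimate.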
Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad
2015-11-01
One of the most important topics of interest to investors is stock price changes. Investors with long-term goals are sensitive to stock prices and their changes and react to them. In this study, we used the multivariate adaptive regression splines (MARS) model and a semi-parametric splines technique to predict stock prices. MARS is a nonparametric, adaptive regression method suited to high-dimensional problems with many variables. Smoothing splines is a nonparametric regression method. We used 40 variables (30 accounting variables and 10 economic variables) to predict stock prices with both the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influential variables for predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS forecast and P/E ratio) were selected as effective variables in forecasting stock prices.
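As an illustration of the building block behind MARS, the sketch below fits a piecewise-linear model from hinge functions max(0, x - k) and max(0, k - x) by ordinary least squares. It assumes the knots are given in advance; a full MARS procedure also searches for knots and variables in a forward pass and prunes terms in a backward pass, which is not shown here. The function names (`hinge_basis`, `fit_hinge_model`, `predict_hinge_model`) and the toy data are hypothetical and are not taken from the study above.

```python
import numpy as np

def hinge_basis(x, knots):
    """Design matrix with an intercept and a pair of hinge functions per knot."""
    cols = [np.ones_like(x)]
    for k in knots:
        cols.append(np.maximum(0.0, x - k))  # right hinge: active above the knot
        cols.append(np.maximum(0.0, k - x))  # left hinge: active below the knot
    return np.column_stack(cols)

def fit_hinge_model(x, y, knots):
    """Least-squares coefficients for the hinge basis (knots fixed, not searched)."""
    coef, *_ = np.linalg.lstsq(hinge_basis(x, knots), y, rcond=None)
    return coef

def predict_hinge_model(x, knots, coef):
    """Evaluate the fitted piecewise-linear model at x."""
    return hinge_basis(x, knots) @ coef

# Toy data (assumed for illustration): a piecewise-linear target with a kink at 4.
x = np.linspace(0.0, 10.0, 101)
y = np.abs(x - 4.0)
knots = [4.0]
coef = fit_hinge_model(x, y, knots)
y_hat = predict_hinge_model(x, knots, coef)
max_abs_error = float(np.max(np.abs(y - y_hat)))
```

Because |x - 4| is exactly the sum of the two hinge functions at the knot 4, the fit on this toy target is exact up to rounding error.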
Paulino José García Nieto
2016-05-01
Remaining useful life (RUL) estimation is considered one of the most central points in prognostics and health management (PHM). The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well known that an accurate RUL estimation allows failure prevention in a more controllable way, so that effective maintenance can be carried out in time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS), which have been successfully adopted for regression problems, with the artificial bee colony (ABC) technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been successfully predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables) for aircraft engines. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.
Adaptive Rank Penalized Estimators in Multivariate Regression
Bunea, Florentina; Wegkamp, Marten
2010-01-01
We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stays bounded, and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...
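A rank-penalized fit of this kind can be sketched as follows, using the abstract's notation ($m$ observations, $p$ predictors, $n$ responses). The sketch assumes the criterion takes the form $\|Y - XB\|_F^2 + \mu \cdot \mathrm{rank}(B)$ with a user-chosen tuning constant `mu`; both that form and the name `rank_selection_fit` are assumptions for illustration, and the paper's own choice of penalty constant is not reproduced here. Under this assumed form, the minimizer keeps the singular directions of the projected response whose squared singular value exceeds `mu`.

```python
import numpy as np

def rank_selection_fit(X, Y, mu):
    """Minimize ||Y - X B||_F^2 + mu * rank(B) (assumed form of the criterion).

    X: (m, p) design matrix, Y: (m, n) response matrix, mu: penalty per unit rank.
    Returns the reduced-rank coefficient matrix and the selected rank.
    """
    B_ols = np.linalg.pinv(X) @ Y          # unrestricted least-squares coefficients
    fitted = X @ B_ols                     # projection of Y onto the column space of X
    _, d, Vt = np.linalg.svd(fitted, full_matrices=False)
    rank = int(np.sum(d ** 2 > mu))        # keep directions whose gain exceeds the penalty
    V_k = Vt[:rank].T                      # top right singular vectors, shape (n, rank)
    B_hat = B_ols @ V_k @ V_k.T            # rank-restricted coefficient matrix
    return B_hat, rank

# Toy check on noise-free data with a rank-2 coefficient matrix (assumed example).
rng = np.random.default_rng(0)
m, p, n = 50, 6, 5
X = rng.normal(size=(m, p))
B_true = rng.normal(size=(p, 2)) @ rng.normal(size=(2, n))
Y = X @ B_true
B_hat, rank = rank_selection_fit(X, Y, mu=1e-6)
```

The cost is dominated by one least-squares solve and one singular value decomposition, after which every candidate rank is evaluated by a simple threshold on the singular values.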
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH), monitored at Nizamuddin, Delhi, on the Yamuna River in India, were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost the same accuracy and that they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). Using the MARS model decreased the average root mean square error (RMSE) relative to the LSSVM and M5Tree models by 1.47% and 19.1%, respectively. Adding the TC input to the models did not increase their accuracy in modeling COD, while adding the FC and pH inputs generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
Single-equation regression models, one of the most widely used tools in statistical forecasting, are examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series, and the autocorrelation patterns of regression disturbances. It also includes six case studies.
Survival Data and Regression Models
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
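For orientation, the two model types can be written in their common textbook forms. The notation below ($T_i$ for the failure time, $x_i$ for the covariables, $\beta$, $\sigma$, $\varepsilon_i$, $\lambda_0$) is assumed for illustration and is not necessarily the notation used in the chapter itself.

```latex
% Accelerated failure time (AFT): covariables act linearly on the log failure time,
% with a parametric distribution assumed for the error term \varepsilon_i.
\[
  \log T_i = x_i^{\top}\beta + \sigma\,\varepsilon_i
\]
% Semiparametric proportional hazards form: covariables act multiplicatively
% on an unspecified baseline hazard \lambda_0(t).
\[
  \lambda(t \mid x_i) = \lambda_0(t)\,\exp\!\left(x_i^{\top}\beta\right)
\]
```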
Prediction of longitudinal dispersion coefficient using multivariate adaptive regression splines
Amir Hamzeh Haghiabi
2016-07-01
In this paper, multivariate adaptive regression splines (MARS) was developed as a novel soft-computing technique for predicting longitudinal dispersion coefficient (DL) in rivers. As mentioned in the literature, experimental dataset related to DL was collected and used for preparing MARS model. Results of MARS model were compared with multi-layer neural network model and empirical formulas. To define the most effective parameters on DL, the Gamma test was used. Performance of MARS model was assessed by calculation of standard error indices. Error indices showed that MARS model has suitable performance and is more accurate compared to multi-layer neural network model and empirical formulas. Results of the Gamma test and MARS model showed that flow depth (H) and ratio of the mean velocity to shear velocity (u/u^∗) were the most effective parameters on the DL.
Fuzzy linear regression forecasting models
吴冲; 惠晓峰; 朱洪文
2002-01-01
The fuzzy linear regression forecasting model is deduced from the symmetric triangular fuzzy number. With the help of the degree of fitting and the measure of fuzziness, the determination of symmetric triangular fuzzy numbers is changed into a problem of solving linear programming.
Heteroscedasticity checks for regression models
None
2001-01-01
For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.
Heteroscedastic transformation cure regression models.
Chen, Chyong-Mei; Chen, Chen-Hsin
2016-06-30
Cure models have been applied to analyze clinical trials with cures and age-at-onset studies with nonsusceptibility. Lu and Ying (On semiparametric transformation cure model. Biometrika 2004; 91:331-343. DOI: 10.1093/biomet/91.2.331) developed a general class of semiparametric transformation cure models, which assumes that the failure times of uncured subjects, after an unknown monotone transformation, follow a regression model with homoscedastic residuals. However, it cannot deal with frequently encountered heteroscedasticity, which may result from dispersed ranges of failure time span among uncured subjects' strata. To tackle the phenomenon, this article presents semiparametric heteroscedastic transformation cure models. The cure status and the failure time of an uncured subject are fitted by a logistic regression model and a heteroscedastic transformation model, respectively. Unlike the approach of Lu and Ying, we derive score equations from the full likelihood for estimating the regression parameters in the proposed model. A martingale difference function similar to their proposal is used to estimate the infinite-dimensional transformation function. Our proposed estimating approach is intuitively applicable and can be conveniently extended to other complicated models when the maximization of the likelihood may be too tedious to be implemented. We conduct simulation studies to validate large-sample properties of the proposed estimators and to compare with the approach of Lu and Ying via the relative efficiency. The estimating method and the two relevant goodness-of-fit graphical procedures are illustrated by using breast cancer data and melanoma data. Copyright © 2016 John Wiley & Sons, Ltd. PMID:26887342
Heteroscedasticity checks for regression models
ZHU; Lixing
2001-01-01
RANDOM WEIGHTING METHOD FOR CENSORED REGRESSION MODEL
ZHAO Lincheng; FANG Yixin
2004-01-01
Rao and Zhao (1992) used random weighting method to derive the approximate distribution of the M-estimator in linear regression model. In this paper we extend the result to the censored regression model (or censored "Tobit" model).
Regression Models for Market-Shares
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
2005-01-01
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the...... interpretation of the parameters in relation to models for the total sales based on discrete choice models. Key words and phrases: MCI model, discrete choice model, market-shares, price elasticity, regression model....
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Darnah
2016-04-01
Poisson regression is used when the response variable is count data that follow the Poisson distribution. The Poisson distribution assumes equidispersion (equal mean and variance). In practice, count data are often overdispersed or underdispersed, so that Poisson regression is inappropriate: it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently give misleading inference about them. This paper suggests the generalized Poisson regression model for handling overdispersion and underdispersion in the Poisson regression model. The Poisson regression model and the generalized Poisson regression model are applied to the number of filariasis cases in East Java. Based on the Poisson regression model, the factors that influence filariasis are the percentage of families who do not practise clean and healthy living and the percentage of families who do not have a healthy house. Because the Poisson regression model exhibits overdispersion, we use generalized Poisson regression. The best generalized Poisson regression model shows that the factor influencing filariasis is the percentage of families who do not have a healthy house. The model is interpreted as follows: each additional 1 percent of families who do not have a healthy house adds 1 filariasis patient.
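A common first check for overdispersion can be sketched as follows: fit an ordinary Poisson regression and compute the Pearson dispersion statistic, which should be near 1 under equidispersion. This is a minimal illustration on simulated data, not the paper's generalized Poisson model; the function names (`fit_poisson_irls`, `pearson_dispersion`), the simulated covariate, and the gamma-mixing used to create overdispersed counts are all assumptions made for the example.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Poisson regression with log link, fitted by iteratively reweighted least squares."""
    # Start from a least-squares fit to log(y + 0.5) to keep the first step stable.
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu            # working response
        XtW = X.T * mu                     # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def pearson_dispersion(X, y, beta):
    """Pearson chi-square divided by residual degrees of freedom."""
    mu = np.exp(X @ beta)
    return float(np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1]))

# Simulated data (assumed): one covariate, true log-mean 0.5 + x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + x)
y_pois = rng.poisson(mu_true)                                  # equidispersed counts
y_over = rng.poisson(mu_true * rng.gamma(1.0, 1.0, size=n))    # gamma-mixed, overdispersed

disp_pois = pearson_dispersion(X, y_pois, fit_poisson_irls(X, y_pois))
disp_over = pearson_dispersion(X, y_over, fit_poisson_irls(X, y_over))
```

A dispersion statistic well above 1, as for the gamma-mixed counts here, is the kind of evidence that motivates moving from the Poisson model to a model with an extra dispersion parameter.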
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Speaker adaptation of HMMs using evolutionary strategy-based linear regression
Selouani, Sid-Ahmed; O'Shaughnessy, Douglas
2002-05-01
A new framework for speaker adaptation of continuous-density hidden Markov models (HMMs) is introduced. It aims to improve the robustness of speech recognizers by adapting HMM parameters to new conditions (e.g., from new speakers). It describes an optimization technique using an evolutionary strategy for linear regression-based spectral transformation. In classical iterative maximum likelihood linear regression (MLLR), a global transform matrix is estimated to make a general model better match particular target conditions. To permit adaptation on a small amount of data, a regression tree classification is performed. However, an important drawback of MLLR is that the number of regression classes is fixed. The new approach allows the degree of freedom of the global transform to be implicitly variable, as the evolutionary optimization permits the survival of only active classes. The fitness function is evaluated by the phoneme correctness through the evolution steps. The implementation requirements such as chromosome representation, selection function, genetic operators, and evaluation function have been chosen in order to lend more reliability to the global transformation matrix. Triphone experiments used the TIMIT and ARPA-RM1 databases. For new speakers, the new technique achieves 8 percent fewer word errors than the basic MLLR method.
Nonparametric and semiparametric dynamic additive regression models
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...
A simple bivariate count data regression model
Shiferaw Gurmu; John Elder
2007-01-01
This paper develops a simple bivariate count data regression model in which dependence between count variables is introduced by means of stochastically related unobserved heterogeneity components. Unlike existing commonly used bivariate models, we obtain a computationally simple closed form of the model with an unrestricted correlation pattern.
A new bivariate negative binomial regression model
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicated that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.
An Application on Multinomial Logistic Regression Model
Abdalla M El-Habil
2012-01-01
This study aims to identify an application of the Multinomial Logistic Regression model, which is one of the important methods for categorical data analysis. This model deals with one nominal or ordinal response variable that has more than two categories. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational. To identify the model in a practical way, we used real data on physical violence against children...
Empirical Bayes Estimation in Regression Model
Li-chun Wang
2005-01-01
This paper considers the empirical Bayes (EB) estimation problem for the parameter $\beta$ of the linear regression model $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$ given $\beta$. Based on Pitman closeness (PC) criterion and mean square error matrix (MSEM) criterion, we prove the superiority of the EB estimator over the ordinary least square estimator (OLSE).
Bootstrap inference longitudinal semiparametric regression model
Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman
2016-02-01
Semiparametric regression contains two components, i.e. a parametric and a nonparametric component. The semiparametric regression model is represented by $y_{ti} = \mu(\tilde{x}'_{ti}, z_{ti}) + \varepsilon_{ti}$, where $\mu(\tilde{x}'_{ti}, z_{ti}) = \tilde{x}'_{ti}\tilde{\beta} + g(z_{ti})$ and $y_{ti}$ is the response variable. It is assumed to have a linear relationship with the predictor variables $\tilde{x}'_{ti} = (x_{1i1}, x_{2i2}, \ldots, x_{Tir})$. The random error $\varepsilon_{ti}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, is normally distributed with zero mean and variance $\sigma^2$, and $g(z_{ti})$ is a nonparametric component. The results of this study showed that the PLS approach on longitudinal semiparametric regression models obtains the estimators $\hat{\tilde{\beta}}_t = [X'H(\lambda)X]^{-1}X'H(\lambda)\tilde{y}$ and $\hat{\tilde{g}}_{\lambda}(z) = M(\lambda)\tilde{y}$. The results also show that the bootstrap was valid on the longitudinal semiparametric regression model with $\hat{g}^{(b)}_{\lambda}(z)$ as the nonparametric component estimator.
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial. PMID:11878222
Multiple Imputations for Linear Regression Models
Brownstone, David
1991-01-01
Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubin's results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency of multiple imputations inference when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welsh's theorems to give conditions where multiple imputations yield consistent inferences for bo...
Identification of regression models - application in traffic
Dohnal, Pavel
Ljubljana : Jozef Stefan Institute, 2005, s. 1-5. [International PhD Workshop on Systems and Control a Young Generation Viewpoint /6./. Izola (SI), 04.10.2005-08.10.2005] R&D Projects: GA MŠk(CZ) 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : regression model * model order * intensity of traffic flow * prediction Subject RIV: BC - Control Systems Theory
Iliev, I. P.; Gocheva-Ilieva, S. G.
2013-02-01
Owing to advances in computing technology, researchers are focusing on novel predictive models and techniques for improving the design and performance of devices. This study examines a recently developed high-powered SrBr2 laser excited in a nanosecond pulse longitudinal He-SrBr2 discharge. Based on the accumulated experiment data, a new approach is proposed for determining the relationship between laser output power and basic laser characteristics: geometric design, supplied electric power, helium pressure, etc. Piece-wise linear and nonlinear statistical models have been built with the help of the flexible predictive MARS technique. It is shown that the nonlinear MARS model containing second degree terms provides the best description of the examined data, demonstrating a coefficient of determination of over 99%. The resulting theoretical models are used to estimate and predict the experiment as well as to analyze the local behavior of the relationships between laser output power and input laser characteristics. The second order model is applied to the design and optimization of the laser in order to enhance laser generation.
Estimation of a semiparametric contaminated regression model
Vandekerkhove, Pierre
2011-01-01
We consider in this paper a contaminated regression model where the distribution of the contaminating component is known when the Euclidean parameters of the regression model, the noise distribution, the contamination ratio and the distribution of the design data are unknown. Our model is said to be semiparametric in the sense that the probability density function (pdf) of the noise involved in the regression model is not supposed to belong to a parametric density family. When the pdf's of the noise and the contaminating phenomenon are supposed to be symmetric about zero, we propose an estimator of the various (Euclidean and functional) parameters of the model, and prove under mild conditions its convergence. We prove in particular that, under technical conditions all satisfied in the Gaussian case, the Euclidean part of the model is estimated at the rate $o_{a.s.}(n^{-1/4+\gamma})$, $\gamma > 0$. We recall that, as it is pointed out in Bordes and Vandekerkhove (2010), this result cannot be ignored to go furth...
Modeling oil production based on symbolic regression
Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robustly estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is a non-Gaussian Ornstein-Uhlenbeck process with a Lévy process subordinator, which is used to model financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
Predictive densities for day-ahead electricity prices using time-adaptive quantile regression
Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik;
2014-01-01
A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model is...
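The idea of a quantile estimate that adapts over time can be sketched with two ingredients: the pinball (quantile) loss, and exponentially decaying weights that favor recent observations. The sketch below estimates a single constant quantile as a weighted empirical quantile, which minimizes the weighted pinball loss. It is a simplified stand-in, not the estimation algorithm of the paper; the forgetting factor `lam`, the function names, and the toy price series are assumptions made for illustration.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Quantile loss: tau * (y - q) if y >= q, else (tau - 1) * (y - q)."""
    diff = y - q
    return np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)

def forgetting_weights(n, lam):
    """Weights lam**age, where the most recent observation has age 0."""
    ages = np.arange(n - 1, -1, -1)
    return lam ** ages

def weighted_quantile(y, weights, tau):
    """Smallest observed value whose cumulative normalized weight reaches tau."""
    order = np.argsort(y)
    cum = np.cumsum(weights[order]) / np.sum(weights)
    idx = int(np.searchsorted(cum, tau))
    return float(y[order][min(idx, len(y) - 1)])

# Toy series (assumed): the price level shifts from 0 to 10 halfway through.
prices = np.concatenate([np.zeros(50), np.full(50, 10.0)])
median_equal = weighted_quantile(prices, forgetting_weights(100, 1.0), 0.5)
median_adaptive = weighted_quantile(prices, forgetting_weights(100, 0.5), 0.5)
```

With equal weights the median still reflects the old level, while with a forgetting factor below 1 it tracks the recent level; this is the trade-off between stability and responsiveness that a time-adaptive scheme has to tune.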
An Application on Multinomial Logistic Regression Model
Abdalla M El-Habil
2012-03-01
This study aims to identify an application of the Multinomial Logistic Regression model, which is one of the important methods for categorical data analysis. This model deals with one nominal or ordinal response variable that has more than two categories. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational. To identify the model in a practical way, we used real data on physical violence against children, from a survey of Youth 2003, which was conducted by the Palestinian Central Bureau of Statistics (PCBS). A segment of the population of children in the age group 10-14 years resident in Gaza governorate, of size 66,935, was selected, and the response variable consisted of four categories. Eighteen explanatory variables were used to build the primary multinomial logistic regression model. The model was tested through a set of statistical tests to ensure its appropriateness for the data. The model was also tested by randomly selecting two observations from the data and predicting the group into which each observation is classified, given the values of the explanatory variables used. We concluded that, using the multinomial logistic regression model, we are able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and predict the classification of any individual case.
General regression and representation model for classification.
Jianjun Qian
Recently, the regularized coding-based classification methods (e.g. SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients) and the specific information (weight matrix of image pixels) to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.
Bayesian Inference of a Multivariate Regression Model
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior: a multivariate normal distribution for the regression coefficients and an inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such a structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
An operational GLS model for hydrologic regression
Tasker, Gary D.; Stedinger, J.R.
1989-01-01
Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities.
Regression models for expected length of stay.
Grand, Mia Klinten; Putter, Hein
2016-03-30
In multi-state models, the expected length of stay (ELOS) in a state is not a straightforward object to relate to covariates, and the traditional approach has instead been to construct regression models for the transition intensities and calculate ELOS from these. The disadvantage of this approach is that the effect of covariates on the intensities is not easily translated into the effect on ELOS, and it typically relies on the Markov assumption. We propose to use pseudo-observations to construct regression models for ELOS, thereby allowing a direct interpretation of covariate effects while at the same time avoiding the Markov assumption. For this approach, all we need is a non-parametric consistent estimator for ELOS. For every subject (and for every state of interest), a pseudo-observation is constructed, and they are then used as outcome variables in the regression model. We furthermore show how to construct longitudinal (pseudo-) data when combining the concept of pseudo-observations with landmarking. In doing so, covariates are allowed to be time-varying, and we can investigate potential time-varying effects of the covariates. The models can be fitted using generalized estimating equations, and dependence between observations on the same subject is handled by applying the sandwich estimator. The method is illustrated using data from the US Health and Retirement Study where the impact of socio-economic factors on ELOS in health and disability is explored. Finally, we investigate the performance of our approach under different degrees of left-truncation, non-Markovianity, and right-censoring by means of simulation. PMID:26497637
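The pseudo-observation construction can be sketched with the usual jackknife formula, n times the full-sample estimate minus (n - 1) times the leave-one-out estimate. In the minimal sketch below the plain sample mean stands in for the non-parametric ELOS estimator, which is an assumption made for illustration (it ignores censoring and truncation), and the data values are hypothetical.

```python
from statistics import mean

def pseudo_observations(sample, estimator):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat_without_i."""
    n = len(sample)
    theta_full = estimator(sample)
    pseudo = []
    for i in range(n):
        leave_one_out = sample[:i] + sample[i + 1:]
        pseudo.append(n * theta_full - (n - 1) * estimator(leave_one_out))
    return pseudo

# Hypothetical, fully observed lengths of stay (years) for five subjects.
stays = [2.0, 3.5, 1.0, 4.5, 3.0]
pseudo = pseudo_observations(stays, mean)
```

With the sample mean and no censoring, each pseudo-observation reduces to the subject's own value, which is a convenient sanity check. With a censoring-aware estimator the values differ, and they are then used as outcome variables in the regression model.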
An Adaptive Support Vector Regression Machine for the State Prognosis of Mechanical Systems
Qing Zhang
2015-01-01
Full Text Available Due to the unsteady state evolution of mechanical systems, the time series of state indicators exhibits volatile behavior and staged characteristics. To model hidden trends and predict deterioration failure utilizing volatile state indicators, an adaptive support vector regression (ASVR) machine is proposed. In ASVR, the width of an error-insensitive tube, which is a constant in the traditional support vector regression, is set as a variable determined by the transient distribution boundary of local regions in the training time series. Thus, the localized regions are obtained using a sliding time window, and their boundaries are defined by a robust measure known as the truncated range. Utilizing an adaptive error-insensitive tube, a stabilized tolerance level for noise is achieved, whether the time series occurs in low-volatility regions or in high-volatility regions. The proposed method is evaluated by vibrational data measured on descaling pumps. The results show that ASVR is capable of capturing the local trends of the volatile time series of state indicators and is superior to the standard support vector regression for state prediction.
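The idea of a tube width that follows local volatility can be sketched as follows. The abstract does not define the truncated range, so the definition used here (range of a sliding window after dropping its extreme values) and the `scale`, `window` and `trim` parameters are assumptions for illustration only, not the authors' formulation.

```python
def truncated_range(window, trim=1):
    """Range of a window after dropping the `trim` smallest and largest values.

    This definition is an assumption made for illustration.
    """
    ordered = sorted(window)
    kept = ordered[trim:len(ordered) - trim]
    return kept[-1] - kept[0]

def adaptive_tube_widths(series, window=5, trim=1, scale=0.5):
    """One tube width per sample, from a sliding window ending at that sample."""
    widths = []
    for t in range(len(series)):
        local = series[max(0, t - window + 1):t + 1]
        if len(local) <= 2 * trim:
            widths.append(0.0)  # too little history yet (hypothetical fallback)
        else:
            widths.append(scale * truncated_range(local, trim))
    return widths

def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the tube cost nothing; outside it the cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - eps)

# Hypothetical indicator series: calm at first, volatile afterwards.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 3.0, 0.2, 2.5, -0.5, 2.8]
widths = adaptive_tube_widths(series)
```

In the calm segment the tube stays narrow, while in the volatile segment it widens, so the same deviation is penalised less where noise is large.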
Hierarchical linear regression models for conditional quantiles
TIAN; Maozai
2006-01-01
Quantile regression has several useful features and is therefore gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models, but it cannot deal effectively with data that have a hierarchical structure. In practice, the existence of such data hierarchies is neither accidental nor ignorable; it is a common phenomenon. To ignore this hierarchical data structure risks overlooking the importance of group effects, and may also render invalid many of the traditional statistical analysis techniques used for studying data relationships. On the other hand, hierarchical models take a hierarchical data structure into account and also have many applications in statistics, ranging from overdispersion to constructing min-max estimators. However, hierarchical models are essentially mean regression; therefore, they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates. Furthermore, the estimated coefficient vector (marginal effects) is sensitive to an outlier observation on the dependent variable. In this article, a new approach, which is based on the Gauss-Seidel iteration and takes full advantage of quantile regression and hierarchical models, is developed. On the theoretical front, we also consider the asymptotic properties of the new method, obtaining simple conditions for $n^{1/2}$-convergence and asymptotic normality. We also illustrate the use of the technique with real educational data which is hierarchical, and show how the results can be explained.
Quantile regression modeling for Malaysian automobile insurance premium data
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is more robust to outliers than mean regression models. Traditional mean regression models such as the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with a Gamma regression model. The results from quantile regression show that the estimates for some rating classes increase as the quantile increases, while others decrease as the quantile decreases. Further, we found that the confidence interval of median regression (τ = 0.5) is always smaller than that of Gamma regression for all risk factors.
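The robustness claim follows from the loss that quantile regression minimises. The sketch below, which uses hypothetical premium values and fits only a constant rather than a full regression, shows the check (pinball) loss for a quantile level tau and confirms that the median fit ignores a single extreme value that would drag the mean.

```python
def pinball_loss(residual, tau):
    """Check loss: weight tau on positive residuals, (1 - tau) on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(values, tau):
    """Constant minimising total pinball loss; a minimiser always lies on a data point."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))

# Hypothetical premiums with one extreme claim.
premiums = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(premiums, 0.5)
upper_fit = best_constant(premiums, 0.9)
mean_fit = sum(premiums) / len(premiums)
```

Here `median_fit` stays at the middle observation while `mean_fit` is pulled towards the extreme value; a full quantile regression replaces the constant with a linear predictor in the rating classes.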
Regression models for convex ROC curves.
Lloyd, C J
2000-09-01
The performance of a diagnostic test is summarized by its receiver operating characteristic (ROC) curve. Under quite natural assumptions about the latent variable underlying the test, the ROC curve is convex. Empirical data on a test's performance often comes in the form of observed true positive and false positive relative frequencies under varying conditions. This paper describes a family of regression models for analyzing such data. The underlying ROC curves are specified by a quality parameter delta and a shape parameter mu and are guaranteed to be convex provided delta > 1. Both the position along the ROC curve and the quality parameter delta are modeled linearly with covariates at the level of the individual. The shape parameter mu enters the model through the link functions log(p^mu) - log(1 - p^mu) of a binomial regression and is estimated either by search or from an appropriate constructed variate. One simple application is to the meta-analysis of independent studies of the same diagnostic test, illustrated on some data of Moses, Shapiro, and Littenberg (1993). A second application, to so-called vigilance data, is given, where ROC curves differ across subjects and modeling of the position along the ROC curve is of primary interest. PMID:10985227
Regression Models For Saffron Yields in Iran
S. H, Sanaeinejad; S. N, Hosseini
Saffron is an important crop in social and economic terms in Khorassan Province (Northeast of Iran). In this research we tried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. Climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysis indicated that among weather variables, temperature was the key parameter for variation of saffron yield. It was concluded that increasing temperature in spring was the main cause of declining saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
An Additive-Multiplicative Cox-Aalen Regression Model
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...
Inferring gene regression networks with model trees
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Background: Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes, building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results: We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: the Saccharomyces cerevisiae and E. coli data sets. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions: REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear...
Multiple Linear Regression Models in Outlier Detection
S.M.A.Khaleelur Rahman
2012-02-01
Full Text Available Identifying anomalous values in a real-world database is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step of regression diagnostics. In this paper, our aim is to detect the points which are very different from the other points. They do not seem to belong to a particular population and behave differently. If these influential points are removed, a different model results. The distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach to automating the process of detecting and isolating outliers. The impact of anomalous values on the dataset has been established by using two indicators, DFFITS and Cook's D. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
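The two indicators named above can be computed directly from the residuals and leverages of a fitted regression. The sketch below uses the textbook formulas for a single-predictor fit to stay short, which is a simplification of the multiple regression in the paper; the data are hypothetical, with the last point placed deliberately off the trend.

```python
import math

def influence_measures(x, y):
    """Leverage, Cook's D and DFFITS for the fit y = a + b * x (p = 2 parameters)."""
    n, p = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - p)  # residual variance of the full fit
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks, dffits = [], []
    for e, h in zip(resid, leverage):
        cooks.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
        # Residual variance with this point deleted, then the externally
        # studentized residual that DFFITS is built on.
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - p - 1)
        t = e / math.sqrt(s2_deleted * (1.0 - h))
        dffits.append(t * math.sqrt(h / (1.0 - h)))
    return leverage, cooks, dffits

# Hypothetical data: five points near a line and one anomalous value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
leverage, cooks, dffits = influence_measures(xs, ys)
flagged = max(range(len(xs)), key=lambda i: cooks[i])
```

An automated procedure would compare each value against a cut-off instead of taking the maximum; which cut-off the authors use is not stated in the abstract.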
Entrepreneurial intention modeling using hierarchical multiple regression
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
An adaptive online learning approach for Support Vector Regression: Online-SVR-FID
Liu, Jie; Zio, Enrico
2016-08-01
Support Vector Regression (SVR) is a popular supervised data-driven approach for building empirical models from available data. Like all data-driven methods, under non-stationary environmental and operational conditions it needs to be provided with adaptive learning capabilities, which might become computationally burdensome with large datasets accumulating dynamically. In this paper, a cost-efficient online adaptive learning approach is proposed for SVR by combining Feature Vector Selection (FVS) and Incremental and Decremental Learning. The proposed approach adaptively modifies the model only when different pattern drifts are detected according to proposed criteria. Two tolerance parameters are introduced in the approach to control the computational complexity, reduce the influence of the intrinsic noise in the data and avoid the overfitting problem of SVR. The prediction results are compared with those of other online learning approaches, e.g. NORMA, SOGA, KRLS and Incremental Learning, on several artificial datasets and a real case study concerning time series prediction based on data recorded on a component of a nuclear power generation system. The performance indicators MSE and MARE computed on the test dataset demonstrate the efficiency of the proposed online learning method.
Hierarchical sparsity priors for regression models
Griffin, Jim E.; Brown, Philip J
2013-01-01
We focus on the increasingly important area of sparse regression problems where there are many variables and the effects of a large subset of these are negligible. This paper describes the construction of hierarchical prior distributions when the effects are considered related. These priors allow dependence between the regression coefficients and encourage related shrinkage towards zero of different regression coefficients. The properties of these priors are discussed and applications to line...
Some Priors for Sparse Regression Modelling
Griffin, Jim E.; Brown, Philip J
2013-01-01
A wide range of methods, Bayesian and others, tackle regression when there are many variables. In the Bayesian context, the prior is constructed to reflect ideas of variable selection and to encourage appropriate shrinkage. The prior needs to be reasonably robust to different signal to noise structures. Two simple evergreen prior constructions stem from ridge regression on the one hand and g-priors on the other. We seek to embed recent ideas about sparsity of the regression coefficients and r...
Irwansyah, Edy
2015-01-01
The Cox Proportional Hazard (Cox PH) model is a survival analysis method for modelling the relationship between independent variables and a dependent variable expressed as the time until an event occurs. The method computes residuals, martingale or deviance, which can be used to diagnose lack of fit of a model and violations of the PH assumption. When these are not satisfied, an alternative is the Multivariate Adaptive Regression Splines (MARS) approach. This method is used to perform the analysis of product selling time...
Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.
2015-12-01
Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure, or the model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS) and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used the receiver operating characteristic (ROC) to compare the power of both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhood, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS in simulating urban areas in Mumbai, India.
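The area under the ROC curve used for the comparison can be read as the probability that a randomly chosen changed cell receives a higher score than a randomly chosen unchanged cell. A minimal sketch of that pairwise computation follows; the labels and scores are hypothetical and unrelated to the Mumbai data.

```python
def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical cells: 1 = became urban, 0 = did not; scores from some model.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise loop is quadratic in the number of cells, so for full rasters a rank-based computation would be preferable; the value obtained is the same.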
Synthesis analysis of regression models with a continuous outcome
Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin
2009-01-01
To estimate the multivariate regression model from multiple individual studies, it would be challenging to obtain results if the input from individual studies only provide univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113–123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method...
Model performance analysis and model validation in logistic regression
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Intra-fraction tumor tracking methods can improve radiation delivery during radiotherapy sessions. Image acquisition for tumor tracking and subsequent adjustment of the treatment beam with gating or beam tracking introduces time latency and necessitates predicting the future position of the tumor. This study evaluates the use of multi-dimensional linear adaptive filters and support vector regression to predict the motion of lung tumors tracked at 30 Hz. We expand on the prior work of other groups who have looked at adaptive filters by using a general framework of a multiple-input single-output (MISO) adaptive system that uses multiple correlated signals to predict the motion of a tumor. We compare the performance of these two novel methods to conventional methods like linear regression and single-input, single-output adaptive filters. At 400 ms latency the average root-mean-square-errors (RMSEs) for the 14 treatment sessions studied using no prediction, linear regression, single-output adaptive filter, MISO and support vector regression are 2.58, 1.60, 1.58, 1.71 and 1.26 mm, respectively. At 1 s, the RMSEs are 4.40, 2.61, 3.34, 2.66 and 1.93 mm, respectively. We find that support vector regression most accurately predicts the future tumor position of the methods studied and can provide a RMSE of less than 2 mm at 1 s latency. Also, a multi-dimensional adaptive filter framework provides improved performance over single-dimension adaptive filters. Work is underway to combine these two frameworks to improve performance.
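To make the latency problem concrete, the sketch below predicts a signal 12 samples ahead (400 ms at 30 Hz) with a single-input normalised LMS adaptive filter. It is a simplified stand-in for the single-input, single-output filters mentioned above, not the MISO framework or the authors' implementation; the sinusoidal trace, tap count and step size are assumptions for illustration.

```python
import math

def nlms_predict(signal, horizon=12, taps=4, mu=0.5, eps=1e-8):
    """Predict signal[n] from samples that were available `horizon` steps earlier.

    Returns the final weights and the a priori prediction errors.
    """
    weights = [0.0] * taps
    errors = []
    for n in range(horizon + taps - 1, len(signal)):
        # The `taps` most recent samples known at time n - horizon.
        u = signal[n - horizon - taps + 1:n - horizon + 1]
        prediction = sum(w * v for w, v in zip(weights, u))
        error = signal[n] - prediction
        errors.append(error)
        # Normalised LMS update: step size scaled by the input energy.
        norm = eps + sum(v * v for v in u)
        weights = [w + mu * error * v / norm for w, v in zip(weights, u)]
    return weights, errors

# Hypothetical breathing-like trace: 5 mm amplitude, 4 s period, 30 Hz sampling.
trace = [5.0 * math.sin(2.0 * math.pi * k / 120.0) for k in range(900)]
final_weights, prediction_errors = nlms_predict(trace)
early_error = sum(abs(e) for e in prediction_errors[:100]) / 100
late_error = sum(abs(e) for e in prediction_errors[-100:]) / 100
```

On this noise-free trace the error shrinks as the weights adapt; real tumor motion is irregular, which is where the comparison between filters and support vector regression in the study becomes relevant.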
Bayesian Model Averaging in the Instrumental Variable Regression Model
Gary Koop; Robert Leon Gonzalez; Rodney Strachan
2011-01-01
This paper considers the instrumental variable regression model when there is uncertainty about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainty can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very fl...
Electricity prices forecasting by automatic dynamic harmonic regression models
The changes experienced by electricity markets in recent years have created the necessity for more accurate forecast tools of electricity prices, both for producers and consumers. Many methodologies have been applied to this aim, but in the view of the authors, state space models are not yet fully exploited. The present paper proposes a univariate dynamic harmonic regression model set up in a state space framework for forecasting prices in these markets. The advantages of the approach are threefold. Firstly, a fast automatic identification and estimation procedure is proposed based on the frequency domain. Secondly, the recursive algorithms applied offer adaptive predictions that compare favourably with respect to other techniques. Finally, since the method is based on unobserved components models, explicit information about trend, seasonal and irregular behaviours of the series can be extracted. This information is of great value to the electricity companies' managers in order to improve their strategies, i.e. it provides management innovations. The good forecast performance and the rapid adaptability of the model to changes in the data are illustrated with actual prices taken from the PJM interconnection in the US and for the Spanish market for the year 2002
Model Selection in Kernel Ridge Regression
Exterkate, Peter
Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels...
Various statistical techniques were used on five-year data from 1998-2002 of average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit for our polynomial regression analysis time series (PRATS). The correlations to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on the five-year data of rainfall and humidity, which showed that the variances in the rainfall data were not homogeneous while those in the humidity data were homogeneous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan.
NURWAHA Deogratias; WANG Xin-hou
2008-01-01
This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy inference system (ANFIS) and multiple linear regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models, and the count-strength-product (CSP) is the target. The predictive performances of the two models are estimated and compared. We found that ANFIS has better predictive power than the multiple linear regression model. The impact of each fiber property is also illustrated.
Zheng, Zhihui; Gao, Lei; Xiao, Liping; Zhou, Bin; Gao, Shibo
2015-12-01
Our purpose is to develop a detection algorithm capable of searching for generic interest objects in real time without large training sets and long-time training stages. Instead of the classical sliding window object detection paradigm, we employ an objectness measure to produce a small set of candidate windows efficiently using Binarized Normed Gradients and a Laplacian of Gaussian-like filter. We then extract Locally Adaptive Regression Kernels (LARKs) as descriptors both from a model image and the candidate windows which measure the likeness of a pixel to its surroundings. Using a matrix cosine similarity measure, the algorithm yields a scalar resemblance map, indicating the likelihood of similarity between the model and the candidate windows. By employing nonparametric significance tests and non-maxima suppression, we detect the presence of objects similar to the given model. Experiments show that the proposed detection paradigm can automatically detect the presence, the number, as well as location of similar objects to the given model. The high quality and efficiency of our method make it suitable for real time multi-category object detection applications.
Stochastic Approximation Methods for Latent Regression Item Response Models
von Davier, Matthias; Sinharay, Sandip
2010-01-01
This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the empirical mode decomposition (EMD) method and auto regression (AR) for electric load forecasting. The electric load data of the New South Wales (Australia) market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.
1999-01-01
Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEI). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th
Combination of supervised and semi-supervised regression models for improved unbiased estimation
Arenas-García, Jeronimo; Moriana-Varo, Carlos; Larsen, Jan
2010-01-01
In this paper we investigate the steady-state performance of semisupervised regression models adjusted using a modified RLS-like algorithm, identifying the situations where the new algorithm is expected to outperform standard RLS. By using an adaptive combination of the supervised and semisupervi...
Modeling tourism flows through gravity models: A quantile regression approach
Santeramo, Fabio Gaetano; Morelli, Mariangela
2015-01-01
Gravity models are widely used to study tourism flows. The peculiarities of the segmented international demand for agritourism in Italy are examined by means of a novel approach: a panel data quantile regression. We characterize the international demand for Italian agritourism with a large dataset covering thirty-three countries of origin from 1998 to 2010. Distance and income are major determinants, but we also found that mutual agreements and high urbanization rates in countrie...
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice; Romary, Laurent
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression m...
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, often of catastrophic dimensions. A quantifiable definition of drought is elusive because, depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate. In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows covariates, continuous or categorical, to be included to improve the performance of the auto-regressive component. Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representations of complex climatic phenomena, such as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows the uncertainty of the forecasts to be quantified and can be easily adapted to make predictions under future climatic scenarios. The framework presented here may be extended to other applications such as flash flood analysis or risk assessment of natural hazards.
Support vector regression model for complex target RCS predicting
Wang Gu; Chen Weishi; Miao Jungang
2009-01-01
Electromagnetic scattering computation has developed rapidly for many years, yet some computing problems for complex and coated targets cannot be solved with the existing theory and computing models. A data-based computing model is established to make up for the shortcomings of theoretical models. Based on the support vector regression method, which is formulated on the principle of structural risk minimization, a data model to predict the unknown radar cross section of specified targets is given. Comparison between the actual data and the results of this prediction model shows that the support vector regression method is workable and achieves comparable precision.
Multiattribute shopping models and ridge regression analysis
Timmermans, HJP Harry
1981-01-01
Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...
Parametric and Non-Parametric Regression Models
Brabec, Marek
Stuttgart : E. Schweizerbart'sche Verlagsbuchhandlung, 2013 - (Hermanussen, M.), s. 194-199 ISBN 978-3-510-65278-5 Institutional support: RVO:67985807 Keywords : statistical modeling * non-parametric * semi-parametric * mixed-effects model * growth curve model Subject RIV: BB - Applied Statistics, Operational Research
Yunfeng Wu
2014-01-01
This paper presents a novel adaptive linear and normalized combination (ALNC) method that can be used to combine the component radial basis function networks (RBFNs) to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple learners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error) and better fidelity (characterized by the normalized correlation coefficient) of approximation, in relation to the popular simple average, weighted average, and Bagging methods.
Symbolic regression of generative network models
Menezes, Telmo
2014-01-01
Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied "out of the box" to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world netwo...
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Daily Reference Evapotranspiration Estimation using Linear Regression and ANN Models
Mallikarjuna, P.; Jyothy, S. A.; Sekhar Reddy, K. C.
2012-12-01
The present study investigates the applicability of linear regression and ANN models for estimating daily reference evapotranspiration (ET0) at the Tirupati, Nellore, Rajahmundry, Anakapalli and Rajendranagar regions of Andhra Pradesh. The climatic parameters influencing daily ET0 were identified through multiple and partial correlation analysis. Daily temperature, wind velocity, relative humidity and sunshine hours were the parameters that most influenced daily ET0 estimation in the study area. Linear regression models in terms of the climatic parameters influencing each region, and optimal neural network architectures taking these climatic parameters as inputs, were developed. The models' performance in the estimation of ET0 was evaluated against that estimated by the FAO-56 Penman-Monteith method. The regression models showed a satisfactory performance in the daily ET0 estimation for the regions selected for the present study. The optimal ANN (4,4,1) models, however, consistently showed an improved performance over the regression models.
Marginal Regression Models with Varying Coefficients for Correlated Ordinal Data
Gieger, Christian
1999-01-01
This paper discusses marginal regression models for repeated or clustered ordinal measurements in which the coefficients of explanatory variables are allowed to vary as smooth functions of other covariates. We model the marginal response probabilities and the marginal pairwise association structure by two semiparametric regressions. To estimate the fixed parameters and varying coefficients in both models we derive an algorithm that is based on penalized generalized estimating equations. This...
Fuzzy Multiple Regression Model for Estimating Software Development Time
Venus Marza
2009-10-01
As software becomes more complex and its scope dramatically increases, the importance of research on methods for estimating software development time has steadily increased, so accurate estimation is the main goal of software managers for reducing project risks. The purpose of this article is to introduce a new Fuzzy Multiple Regression approach, which has higher accuracy than other estimation methods. Furthermore, we compare the Fuzzy Multiple Regression model with the Fuzzy Logic model and the Multiple Regression model based on their accuracy.
REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL
Barbu Bogdan POPESCU
2013-02-01
This paper presents econometric models developed for the analysis of banking exclusion in the economic crisis. Access to public goods and services is a condition "sine qua non" for an open and efficient society. In our opinion, the availability of banking and payment services to the entire population without discrimination should be the primary objective of public service policy.
REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL
Barbu Bogdan POPESCU; Lavinia Stefania TOTAN
2013-01-01
This paper presents econometric models developed for the analysis of banking exclusion in the economic crisis. Access to public goods and services is a condition "sine qua non" for an open and efficient society. In our opinion, the availability of banking and payment services to the entire population without discrimination should be the primary objective of public service policy.
Modeling inequality and spread in multiple regression
Rolf Aaberge; Steinar Bjerve; Kjell Doksum
2006-01-01
We consider concepts and models for measuring inequality in the distribution of resources with a focus on how inequality varies as a function of covariates. Lorenz introduced a device for measuring inequality in the distribution of income that indicates how much the incomes below the u$^{th}$ quantile fall short of the egalitarian situation where everyone has the same income. Gini introduced a summary measure of inequality that is the average over u of the difference between the Lorenz curve ...
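The Lorenz curve and Gini coefficient mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions for an empirical sample (the Gini coefficient as twice the area between the line of equality and the Lorenz curve), not the covariate-dependent regression models of the paper; the function names and the sample incomes are assumptions made for illustration only.

```python
def lorenz_curve(incomes):
    """Return points (u, L(u)) of the empirical Lorenz curve.

    L(u) is the share of total income held by the poorest fraction u.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(incomes):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve.

    The area is computed with the trapezoid rule over the empirical points.
    """
    pts = lorenz_curve(incomes)
    area = 0.0
    for (u0, l0), (u1, l1) in zip(pts, pts[1:]):
        area += (u1 - u0) * (l0 + l1) / 2.0
    return 1.0 - 2.0 * area


# Hypothetical samples: perfect equality, and one person holding everything.
equal = [1, 1, 1, 1]
concentrated = [0, 0, 0, 4]
print(gini(equal), gini(concentrated))
```

In the egalitarian sample the Lorenz curve coincides with the diagonal, so the coefficient is zero; the more the curve sags below the diagonal, the larger the coefficient.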
Residual diagnostics for cross-section time series regression models
Baum, Christopher F
2001-01-01
These routines support the diagnosis of groupwise heteroskedasticity and cross-sectional correlation in the context of a regression model fit to pooled cross-section time series (xt) data. Copyright 2001 by Stata Corporation.
Ahmad A. Saifan
2016-04-01
Regression testing is a safeguarding procedure to validate and verify adapted software and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in the regression test selection domain. The approach is based on meta-models (test models and structured models) to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results with those of the retest-all approach. The results show that our approach reduces the size of the test suite without a negative impact on the effectiveness of fault detection.
Switching regression models with non-normal errors
Pruska, Krystyna
1997-01-01
In this paper two forms of switching regression models with non-normal errors are considered. The pseudo maximum likelihood method is proposed for the estimation of their parameters. Results of Monte Carlo experiments are also presented for a special switching regression model. The distributions of the parameter estimators are compared for different error distributions: normal, Student’s or Laplace’s. The maximum likel...
Brunsdon, Chris; Aitkin, Murray; Fotheringham, Stewart; Charlton, Martin
1999-01-01
Compares random coefficient modelling (RCM) and geographically weighted regression for spatially non-stationary regression. Relationship between limiting long-term illness (LLTI) and social factors; Variants of RCM; Factors contributing to the prevalence of LLTI.
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Lu LIN
2005-01-01
In nonparametric regression models, the original regression estimators, including the kernel estimator, Fourier series estimator and wavelet estimator, are always constructed as a weighted sum of the data, and the weights depend only on the distance between the design points and the estimation points. As a result these estimators are not robust to perturbations in the data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to perturbations in the data and attains a very high breakdown value close to 1/2. In addition, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Alternative regression models to assess increase in childhood BMI
Mansmann Ulrich
2008-09-01
Background: Body mass index (BMI) data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods: Different regression approaches to predict childhood BMI were compared by goodness-of-fit measures and means of interpretation, including generalized linear models (GLMs), quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS). We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results: GAMLSS showed a much better fit regarding the estimation of risk factor effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely, significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely, significantly associated with body composition in GLMs, but they were in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion: GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
A generalized regression model for a binary response
Kateri, Maria; Agresti, Alan
2009-01-01
Logistic regression is the closest model, given its sufficient statistics, to the model of constant success probability in terms of Kullback-Leibler information. A generalized binary model has this property for the more general φ-divergence. These results generalize to multinomial and other discrete data.
Multivariate Regression Models for Estimating Journal Usefulness in Physics.
Bennion, Bruce C.; Karschamroon, Sunee
1984-01-01
This study examines possibility of ranking journals in physics by means of bibliometric regression models that estimate usefulness as it is reported by 167 physicists in United States and Canada. Development of four models, patterns of deviation from models, and validity and application are discussed. Twenty-six references are cited. (EJS)
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
Correlation between Production and Labor based on Regression Model
Constantin Anghelache
2015-01-01
In theoretical analysis, the dependency between variables is stochastic, so a residual variable must be considered within such a model. Other factors that influence the score variable are grouped in the residual. Uni-factorial nonlinear models are linearized by transformations applied to the variables of the regression model. So, for example, a model of the form turns into a linear model by taking the logarithm of both sides of the above equality, resulting in a linear function. This model is recomm...
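The log-linearization described in this abstract can be sketched as follows. The model form is missing from the source text, so the power form y = a·x^b used here is purely an assumption chosen for illustration; taking logarithms gives ln y = ln a + b·ln x, which is linear in ln x and can be fitted by ordinary least squares. The function name and the sample data are hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on the log-log scale.

    Assumes all x and y are strictly positive. The power form is an
    illustrative assumption, not the model stated in the source abstract.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx                       # slope of the linearized model
    a = math.exp(mean_ly - b * mean_lx)  # back-transform the intercept
    return a, b


# Hypothetical noise-free data generated from y = 3 * x**0.5.
labor = [1.0, 2.0, 4.0, 8.0, 16.0]
production = [3.0 * v ** 0.5 for v in labor]
a_hat, b_hat = fit_power_law(labor, production)
print(a_hat, b_hat)
```

With real data the residual enters multiplicatively on the original scale, which is one reason the choice of transformation should be checked against the residual plot rather than assumed.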
General bound of overfitting for MLP regression models
Rynkiewicz, Joseph
2012-01-01
Multilayer perceptrons (MLPs) with one hidden layer have been used for a long time to deal with non-linear regression. However, in some tasks, MLPs are overly powerful models, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is...
Roseane Cavalcanti dos Santos
2012-08-01
The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G), environments (E), and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
Regression model for Quality of Web Services dataset with WEKA
Shalini Gambhir; Puneet Arora; Jatin Gambhir
2013-01-01
The Waikato Environment for Knowledge Analysis (WEKA) came about through the perceived need for a unified workbench that would allow researchers easy access to state-of-the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classification, regression, clustering, feature selection, etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses regression model for some o...
Model selection criteria for factor-augmented regressions
Jan J. J. Groen; Kapetanios, George
2009-01-01
In a factor-augmented regression, the forecast of a variable depends on a few factors estimated from a large number of predictors. But how does one determine the appropriate number of factors relevant for such a regression? Existing work has focused on criteria that can consistently estimate the appropriate number of factors in a large-dimensional panel of explanatory variables. However, not all of these factors are necessarily relevant for modeling a specific dependent variable within a fact...
Adaptive Estimation of Heteroscedastic Money Demand Model of Pakistan
Muhammad Aslam
2007-07-01
For the problem of estimating the money demand model of Pakistan, money supply (M1) shows heteroscedasticity of unknown form. To estimate such a model we compare two adaptive estimators with the ordinary least squares estimator and show the attractive performance of the adaptive estimators, namely, the nonparametric kernel estimator and the nearest neighbour regression estimator. These comparisons are made on the basis of standard errors of the estimated coefficients, standard error of regression, Akaike Information Criterion (AIC) value, and the Durbin-Watson statistic for autocorrelation. We further show that the nearest neighbour regression estimator performs better than the other nonparametric kernel estimator.
Gologit2: Generalized Logistic Regression Models for Ordinal Dependent Variables
Richard Williams
2005-01-01
-gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. A major strength of -gologit2- is that it can also estimate two special cases of the generalized model: the proportional odds model and the partial proportional odds model. Hence, -gologit2- can estimate models that are less restri...
Buffalos milk yield analysis using random regression models
A.S. Schierholt
2010-02-01
Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed), daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL) and from records of the EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using Legendre polynomial functions from second to fourth order. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve) and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Information Criterion. The random effects regression model using third order Legendre polynomials with four classes of the environmental effect was the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields at younger ages was close to unity, but at older ages it was low.
Xiaoguang Cui
2014-01-01
This paper proposes a novel top-down visual saliency detection method for optical satellite images using local adaptive regression kernels. This method provides a saliency map by measuring the likeness of image patches to a given single template image. The local adaptive regression kernel (LARK) is used as a descriptor to extract features and compare them against analogous features from the target image. A multi-scale pyramid of the target image is constructed to cope with large-scale variations. In addition, to account for rotation variations, the histogram of kernel orientation is employed to estimate the rotation angle of an image patch, and the comparison is then performed after rotating the patch by the estimated angle. Moreover, we use the bounded partial correlation (BPC) to compare features between image patches and the template so as to rapidly generate the saliency map. Experiments were performed on optical satellite images to find airplanes, and the experimental results demonstrate that the proposed method is effective and robust in complex scenes.
Hongjian Wang
2014-01-01
We present a support vector regression-based adaptive divided difference filter (SVRADDF) algorithm for improving the low state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR) is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm’s robustness. The performance of the proposed algorithm is evaluated by estimating states for (i) an underwater nonmaneuvering target bearing-only tracking system and (ii) maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.
An Implementation of Bayesian Adaptive Regression Splines (BARS) in C with S and R Wrappers
Garrick Wallstrom
2007-02-01
BARS (DiMatteo, Genovese, and Kass 2001) uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors), as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise) while adapting to sudden changes (retaining high-frequency signal). However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables), and finding starting values for Poisson regression (using code for density estimation created by Kooperberg). The code utilizes only freely-available external libraries (LAPACK and BLAS) and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.
Simulation study for model performance of multiresponse semiparametric regression
Wibowo, Wahyu; Haryatmi, Sri; Budiantara, I. Nyoman
2015-12-01
The objective of this paper is to evaluate the performance of a multiresponse semiparametric regression model with respect to both function types and sample sizes. In general, a multiresponse semiparametric regression model consists of parametric and nonparametric functions. This paper focuses on linear and quadratic functions for the parametric components and a spline function for the nonparametric component. Moreover, this model can also be seen as a spline semiparametric seemingly unrelated regression model. The simulation study evaluates three combinations of parametric and nonparametric components, i.e. linear-trigonometric, quadratic-exponential, and multiple linear-polynomial functions respectively. Two criteria are used for assessing the model performance, i.e. R-square and Mean Square Error (MSE). The results show that both the function types and the sample sizes significantly influence the model performance. In addition, this multiresponse semiparametric regression model yields the best performance at the small sample size with the combination of multiple linear and polynomial functions as parametric and nonparametric components respectively. Moreover, the model performances at the big sample size tend to be similar for any combination of parametric and nonparametric components.
Testing heteroscedasticity by wavelets in a nonparametric regression model
LI Yuan; WONG Heung; IP Waicheung
2006-01-01
In the nonparametric regression models, a homoscedastic structure is usually assumed. However, the homoscedasticity cannot be guaranteed a priori. Hence, testing the heteroscedasticity is needed. In this paper we propose a consistent nonparametric test for heteroscedasticity, based on wavelets. The empirical wavelet coefficients of the conditional variance in a regression model are defined first. Then they are shown to be asymptotically normal, based on which a test statistic for the heteroscedasticity is constructed by using Fan's wavelet thresholding idea. Simulations show that our test is superior to the traditional nonparametric test.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741
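The core idea of using third central moments of regression residuals to judge direction of effects can be sketched for the simplest bivariate case. This is a simulation under stated assumptions (a skewed exponential predictor, normal errors, a true model y = 0.5 + x + e), not the authors' multiple-regression procedure or any of their three inference tests; all function names, the seed, and the parameter values are hypothetical.

```python
import random


def central_moment(values, k):
    """k-th central moment of a sample."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Standardized third central moment."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def ols_residuals(predictor, outcome):
    """Residuals of the simple least-squares regression of outcome on predictor."""
    n = len(predictor)
    mp = sum(predictor) / n
    mo = sum(outcome) / n
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predictor, outcome))
    sxx = sum((p - mp) ** 2 for p in predictor)
    slope = sxy / sxx
    intercept = mo - slope * mp
    return [o - (intercept + slope * p) for p, o in zip(predictor, outcome)]


def simulate(n=20000, seed=0):
    """Draw data from the assumed true model: skewed x, normal error."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(n)]
    y = [0.5 + xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


x, y = simulate()
skew_correct = skewness(ols_residuals(x, y))   # model y on x (true direction)
skew_reversed = skewness(ols_residuals(y, x))  # model x on y (wrong direction)
print(skew_correct, skew_reversed)
```

Under these assumptions the residuals of the correctly specified model inherit the symmetry of the normal error, while the residuals of the reversed model mix in the skewed predictor, so the model with the less skewed residuals is the more plausible direction.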
Flexible competing risks regression modeling and goodness-of-fit
Scheike, Thomas; Zhang, Mei-Jie
2008-01-01
In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how covariates influence the cumulative incidence curve. We consider a simple and flexible class of regression models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...
Modelling multimodal photometric redshift regression with noisy observations
Kügler, S D
2016-01-01
In this work, we try to extend the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach therefore allows the multimodal regression problem to be addressed in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally and, lastly, the proposed algorithm in principle allows predictions outside the training values, which makes it a fascinating opportunity for the detection of high-redshifted quasars.
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Spatial stochastic regression modelling of urban land use
Urbanization is very closely linked to industrialization, commercialization and overall economic growth and development. This results in innumerable benefits for the quantity and quality of the urban environment and lifestyle but, on the other hand, contributes to unbounded development, urban sprawl, overcrowding and a decreasing standard of living. Regulation and observation of urban development activities is crucial. An understanding of the urban systems that promote urban growth is also essential for policy making, formulating development strategies and preparing development plans. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques utilize the same datasets and their results are analyzed. The work starts by producing an urban growth model using two stochastic regression modeling techniques, namely Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared, and it is found that GWR seems to be the more significant stochastic regression model compared to OLS: it gives a smaller AICc (corrected Akaike Information Criterion) value and its output is more spatially explainable.
Ge-mai Chen; Jin-hong You
2005-01-01
Consider a repeated measurement partially linear regression model with an unknown vector pa[...] semiparametric generalized least squares estimator (SGLSE) of β, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE cannot improve upon the SGLSE. These results are generalizations of those in [2] to the case of semiparametric regressions.
A nonparametric dynamic additive regression model for longitudinal data
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
In this work we study additive dynamic regression models for longitudinal data. These models provide a flexible and nonparametric method for investigating the time-dynamics of longitudinal data. The methodology is aimed at data where measurements are recorded at random time points. We model the conditional mean of responses given the full internal history and possibly time-varying covariates. We derive the asymptotic distribution for a new nonparametric least squares estimat...
Bayesian and maximin optimal designs for heteroscedastic regression models
Dette, Holger; Haines, Linda M.; Imhof, Lorens A.
2003-01-01
The problem of constructing standardized maximin D-optimal designs for weighted polynomial regression models is addressed. In particular it is shown that, by following the broad approach to the construction of maximin designs introduced recently by Dette, Haines and Imhof (2003), such designs can be obtained as weak limits of the corresponding Bayesian Φq-optimal designs. The approach is illustrated for two specific weighted polynomial models and also for a particular growth model.
Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR) Model
Souvik Bhattacharyya
2011-07-01
The staggering growth in communication technology and usage of public domain channels (i.e. Internet) has greatly facilitated transfer of data. However, such open communication channels have greater vulnerability to security threats causing unauthorized information access. Traditionally, encryption is used to realize the communication security. However, important information is not protected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is firstly hidden in a host data, such as digital image, text, video or audio, etc., and then transmitted secretly to the receiver. Steganalysis is another important topic in information hiding which is the art of detecting the presence of steganography. In this paper a novel technique for the steganalysis of images has been presented. The proposed technique uses an auto-regressive model to detect the presence of the hidden messages, as well as to estimate the relative length of the embedded messages. Various auto-regressive parameters are used to classify cover images as well as stego images with the help of an SVM classifier. Multiple regression analysis of the cover carrier along with the stego carrier has been carried out in order to find out the existence of the negligible amount of the secret message. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.
Modeling urban growth with geographically weighted multinomial logistic regression
Luo, Jun; Kanala, Nagaraj Kapi
2008-10-01
Spatial heterogeneity is usually ignored in previous land use change studies. This paper presents a geographically weighted multinomial logistic regression model for investigating multiple land use conversion in the urban growth process. The proposed model makes an estimation at each sample location and generates local coefficients of driving factors for land use conversion. A Gaussian function is used to determine the geographic weights, guaranteeing that all other samples are involved in the calibration of the model for one location. A case study on the Springfield metropolitan area is conducted. A set of independent variables is selected as driving factors. A traditional multinomial logistic regression model is set up and compared with the proposed model. Spatial variations of the coefficients of independent variables are revealed by investigating the estimations at sample locations.
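The Gaussian weighting mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `gaussian_weights`, the `bandwidth` parameter, the specific kernel form exp(-0.5 (d/h)^2), and the sample coordinates are all assumptions. The point it shows is that a Gaussian kernel never reaches zero, so every sample contributes to the local calibration.

```python
import numpy as np

def gaussian_weights(coords, target, bandwidth):
    """Gaussian geographic weights of every sample relative to one target location.

    Hypothetical helper: the kernel form and bandwidth are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    d = np.linalg.norm(coords - target, axis=1)   # Euclidean distance to the target
    return np.exp(-0.5 * (d / bandwidth) ** 2)    # strictly positive for finite d

# Made-up sample locations at distances 0, 5 and 50 from the target
coords = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
w = gaussian_weights(coords, target=(0.0, 0.0), bandwidth=10.0)
```

The weights decay with distance but stay positive, which is what lets distant samples still take part in fitting the local model.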
Applications of some discrete regression models for count data
B. M. Golam Kibria
2006-01-01
In this paper we have considered several regression models to fit the count data encountered in the fields of Biometrical, Environmental and Social Sciences and Transportation Engineering. We have fitted Poisson (PO), Negative Binomial (NB), Zero-Inflated Poisson (ZIP) and Zero-Inflated Negative Binomial (ZINB) regression models to run-off-road (ROR) crash data which were collected on arterial roads in the south (rural) region of Florida State. To compare the performance of these models, we analyzed data with a moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero-inflated and overdispersed count data.
Semiparametric Robust Estimation of Truncated and Censored Regression Models
Cizek, P.
2008-01-01
Many estimation methods of truncated and censored regression models such as the maximum likelihood and symmetrically censored least squares (SCLS) are sensitive to outliers and data contamination as we document. Therefore, we propose a semiparametric general trimmed estimator (GTE) of truncated an
PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA
Qian Weimin; Li Yumei
2005-01-01
The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.
WAVELET ESTIMATION FOR JUMPS IN A HETEROSCEDASTIC REGRESSION MODEL
任浩波; 赵延孟; 李元; 谢衷洁
2002-01-01
Wavelets are applied to detect the jumps in a heteroscedastic regression model. It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. PMID:26188633
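The two SIR-motivated covariates described in the abstract above (sums of past cases as a proxy for the immune population, and the logarithm of lagged counts to control autocorrelation) can be sketched as a small feature-building step. This is an illustration under assumptions, not the authors' code: the function name, the `immune_window` length, the lag of one period, and the `+ 1` offset inside the logarithm (used here only to keep zero counts finite) are all hypothetical choices.

```python
import numpy as np

def immunity_and_contagion_terms(cases, immune_window, lag=1):
    """Build two covariates for a count regression of infectious disease cases.

    past_sum[t]: sum of cases over the `immune_window` periods before t
                 (proxy for the immune population).
    log_lag[t]:  log(cases[t - lag] + 1), NaN where no lagged value exists.
    The window length, the lag (assumed >= 1) and the +1 offset are
    illustrative assumptions.
    """
    cases = np.asarray(cases, dtype=float)
    n = len(cases)
    past_sum = np.zeros(n)
    for t in range(n):
        start = max(0, t - immune_window)
        past_sum[t] = cases[start:t].sum()       # excludes the current period
    log_lag = np.full(n, np.nan)
    log_lag[lag:] = np.log(cases[:-lag] + 1.0)   # shift counts forward by `lag`
    return past_sum, log_lag

# Made-up weekly case counts
past_sum, log_lag = immunity_and_contagion_terms([0, 2, 5, 3, 1], immune_window=3)
```

Both columns would then enter a quasi-Poisson or negative binomial regression alongside the weather terms; that fitting step is left out here.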
Emamgolizadeh, S.; Bateni, S. M.; Shahsavani, D.; Ashrafi, T.; Ghorbani, H.
2015-10-01
The soil cation exchange capacity (CEC) is one of the main soil chemical properties, which is required in various fields such as environmental and agricultural engineering as well as soil science. In situ measurement of CEC is time consuming and costly. Hence, numerous studies have used traditional regression-based techniques to estimate CEC from more easily measurable soil parameters (e.g., soil texture, organic matter (OM), and pH). However, these models may not be able to adequately capture the complex and highly nonlinear relationship between CEC and its influential soil variables. In this study, Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) were employed to estimate CEC from more readily measurable soil physical and chemical variables (e.g., OM, clay, and pH) by developing functional relations. The GEP- and MARS-based functional relations were tested at two field sites in Iran. Results showed that GEP and MARS can provide reliable estimates of CEC. Also, it was found that the MARS model (with root-mean-square-error (RMSE) of 0.318 Cmol+ kg-1 and correlation coefficient (R2) of 0.864) generated slightly better results than the GEP model (with RMSE of 0.270 Cmol+ kg-1 and R2 of 0.807). The performance of the GEP and MARS models was compared with two existing approaches, namely artificial neural network (ANN) and multiple linear regression (MLR). The comparison indicated that MARS and GEP outperformed the MLR model, but they did not perform as well as ANN. Finally, a sensitivity analysis was conducted to determine the most and the least influential variables affecting CEC. It was found that OM and pH have the most and least significant effect on CEC, respectively.
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available. PMID:17335484
Regression model for Quality of Web Services dataset with WEKA
Shalini Gambhir
2013-06-01
The Waikato Environment for Knowledge Analysis (WEKA) came about through the perceived need for a unified workbench that would allow researchers easy access to state-of-the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classification, regression, clustering, and feature selection etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses a regression model for some of the quality of web service parameters.
A regressive model of isochronism in speech units
Jassem, W.; Krzysko, M.; Stolarski, P.
1981-09-01
To define linguistic isochronism in quantitative terms, a statistical regressive method of analyzing the number of rhythmic units in human speech was employed. The material used was two taped texts spoken in standard British English totaling approximately 2,500 sounds. The sounds were divided into statistically homogeneous classes, and the mean values in each class were utilized in regressive models. Abercrombie's theory of speech rhythm postulating anacrusis and Jassem's theory postulating two types of speech units, anacrusis and a rhythmic unit in the strict sense, were tested using this material.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds. PMID:23796400
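The inversion step described above, where five linear regression equations are solved for five unknown material properties, amounts to solving a 5x5 linear system. The sketch below uses entirely synthetic numbers: the coefficient matrix `A`, the intercepts `b0` and the "true" property vector are made up for illustration and do not come from the paper. It assumes each regression equation has the form load_i = b0_i + sum_j A_ij * p_j.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression coefficients: one row per time point, one column per property.
# Adding 10 on the diagonal makes the matrix strictly diagonally dominant, hence invertible.
A = rng.uniform(0.5, 2.0, size=(5, 5)) + 10.0 * np.eye(5)
b0 = rng.uniform(-1.0, 1.0, size=5)               # synthetic intercepts

true_props = np.array([1.0, 0.5, 0.2, 0.3, 2.0])  # made-up material properties
loads = b0 + A @ true_props                       # load intensity at the five time points

# Invert the five equations to recover the five properties from the loads
estimated = np.linalg.solve(A, loads - b0)
```

With real regression coefficients the system can be ill-conditioned for some properties, which is one way to read the abstract's remark that the out-of-plane Poisson's ratio could not be estimated reliably.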
A Regression Analysis Model Based on Wavelet Networks
XIONG Zheng-feng
2002-01-01
In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.
Gaussian Process Models for Nonparametric Functional Regression with Functional Responses
LIAN, HENG
2010-01-01
Recently, a nonparametric functional model with functional responses has been proposed within the functional reproducing kernel Hilbert spaces (fRKHS) framework. Motivated by its superior performance and also its limitations, we propose a Gaussian process model whose posterior mode coincides with the fRKHS estimator. The Bayesian approach has several advantages compared to its predecessor. Firstly, the multiple unknown parameters can be inferred together with the regression function in a unified ...
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording.
Single and multiple index functional regression models with nonparametric link
Chen, Dong; Hall, Peter; Müller, Hans-Georg
2011-01-01
Fully nonparametric methods for regression from functional data have poor accuracy from a statistical viewpoint, reflecting the fact that their convergence rates are slower than nonparametric rates for the estimation of high-dimensional functions. This difficulty has led to an emphasis on the so-called functional linear model, which is much more flexible than common linear models in finite dimension, but nevertheless imposes structural constraints on the relationship between predictors and re...
Regression and ARIMA hybrid model for new bug prediction
Madhur Srivastava; Dr. Dharmendra Badal; Ratnesh Kumar Jain
2010-01-01
A multiple linear regression and ARIMA hybrid model is proposed for new bug prediction depending upon resolved bugs and other available parameters of the open source software bug report. The last five years of bug report data of the open source software “worldcontrol” are analyzed to identify the trends followed by various parameters. Bug report data has been categorized on a monthly basis and the forecast is also on a monthly basis. The model accounts for parameters such as resolved, assigned, reopened...
Maximin and Bayesian Optimal Designs for Regression Models
Dette, Holger; Haines, Linda M.; Imhof, Lorens A.
2003-01-01
For many problems of statistical inference in regression modelling, the Fisher information matrix depends on certain nuisance parameters which are unknown and which enter the model nonlinearly. A common strategy to deal with this problem within the context of design is to construct maximin optimal designs as those designs which maximize the minimum value of a real valued (standardized) function of the Fisher information matrix, where the minimum is taken over a specified range of the unknown ...
Transpiration of glasshouse rose crops: evaluation of regression models
Baas, R.; Rijssel, van, E.
2006-01-01
Regression models of transpiration (T) based on global radiation inside the greenhouse (G), with or without energy input from heating pipes (Eh) and/or vapor pressure deficit (VPD) were parameterized. Therefore, data on T, G, temperatures from air, canopy and heating pipes, and VPD from both a lysimeter experiment and from a cut rose grower were analyzed. Based on daily integrals, all T models showed good fits due to the dominant effect of global radiation G (solar + supplementary radiation) ...
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Siana Halim
2007-01-01
Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals, i.e. that they are independent and identically and normally distributed. However, these assumptions are rarely fulfilled, especially for data that have a spatial and hierarchical structure. We worked out the problem using a mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using the free statistical software R version 2.6.1.
Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie
2008-01-01
Predicting heartbeat message arrival time is crucial for the quality of failure detection service over internet. However, internet dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as auto-regressive process, heartbeat sending time is modeled as exogenous variable, the model's coefficients are estimated based on the sliding window of observations and this result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.
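The adaptive ARX idea in the abstract above can be sketched as a least-squares fit over a sliding window. This is only an illustration under assumptions: the first-order model form a[k+1] = c + phi*a[k] + beta*s[k+1], the function name `predict_next_arrival`, the window length and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def predict_next_arrival(arrivals, sends, next_send, window=20):
    """Predict the next heartbeat arrival time with a first-order ARX model.

    Assumed model (illustrative): a[k+1] = c + phi * a[k] + beta * s[k+1],
    where a is the arrival time and s the sending time (exogenous input).
    Coefficients are re-estimated by least squares on the last `window` pairs.
    """
    a = np.asarray(arrivals, dtype=float)[-(window + 1):]
    s = np.asarray(sends, dtype=float)[-(window + 1):]
    X = np.column_stack([np.ones(len(a) - 1), a[:-1], s[1:]])  # regressors
    y = a[1:]                                                  # next arrivals
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(theta @ np.array([1.0, a[-1], next_send]))

# Synthetic demo: data generated from a known ARX recursion (made-up coefficients)
rng = np.random.default_rng(1)
sends = np.cumsum(rng.uniform(0.9, 1.1, size=41))   # irregular sending times
arrivals = np.empty(41)
arrivals[0] = sends[0] + 0.1
for k in range(40):
    arrivals[k + 1] = 0.55 + 0.5 * arrivals[k] + 0.5 * sends[k + 1]

# Fit on the first 40 heartbeats, predict the 41st
pred = predict_next_arrival(arrivals[:40], sends[:40], next_send=sends[40])
```

Because the demo data follow the assumed recursion exactly, the prediction should match `arrivals[40]` up to rounding; on real network traces the window length trades responsiveness against noise.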
Electricity consumption forecasting in Italy using linear regression models
The influence of economic and demographic variables on the annual electricity consumption in Italy has been investigated with the intention to develop a long-term consumption forecasting model. The time period considered for the historical data is from 1970 to 2007. Different regression models were developed, using historical electricity consumption, gross domestic product (GDP), gross domestic product per capita (GDP per capita) and population. A first part of the paper considers the estimation of GDP, price and GDP per capita elasticities of domestic and non-domestic electricity consumption. The domestic and non-domestic short run price elasticities are found to be both approximately equal to -0.06, while long run elasticities are equal to -0.24 and -0.09, respectively. On the contrary, the elasticities of GDP and GDP per capita present higher values. In the second part of the paper, different regression models, based on co-integrated or stationary data, are presented. Different statistical tests are employed to check the validity of the proposed models. A comparison with national forecasts, based on complex econometric models, such as Markal-Time, was performed, showing that the developed regressions are congruent with the official projections, with deviations of ±1% for the best case and ±11% for the worst. These deviations are to be considered acceptable in relation to the time span taken into account.
Harrell, Jr., Frank E.
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Regression Model to Predict Global Solar Irradiance in Malaysia
Hairuniza Ahmed Kutty
2015-01-01
A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE), mean bias error (MBE), and the coefficient of determination (R2) with other models available from literature studies. Seven models based on single parameters (PM1 to PM7) and five multiple-parameter models (PM7 to PM12) are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included in multiple-parameter models, although it performs fairly well in single-parameter prediction models.
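The three comparison metrics named in the abstract above (RMSE, MBE and R2) can be computed as in the sketch below. The function names and the sample values are made up for illustration. Two conventions are assumptions: MBE is taken as mean(predicted - observed), and the metrics are returned in the units of the data, whereas the abstract reports RMSE and MBE as percentages (a normalization the abstract does not specify).

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error, in the units of the data."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((p - o) ** 2)))

def mbe(observed, predicted):
    """Mean bias error; sign convention (predicted - observed) is an assumption."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(p - o))

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    ss_res = np.sum((o - p) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up observed and predicted values
obs = [4.0, 5.0, 6.0, 7.0]
pred = [4.5, 5.0, 5.5, 7.5]
scores = (rmse(obs, pred), mbe(obs, pred), r2(obs, pred))
```

A positive MBE under this convention means the model overestimates on average; RMSE penalizes large individual errors more heavily than MBE does.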
Zhang, Yan-jun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong
2015-10-01
According to the high precision extracting characteristics of scattering spectrum in Brillouin optical time domain reflection optical fiber sensing system, this paper proposes a new algorithm based on flies optimization algorithm with adaptive mutation and generalized regression neural network. The method takes advantage of the generalized regression neural network, which offers good approximation ability, learning speed and generalization. Moreover, by using the strong search ability of the flies optimization algorithm with adaptive mutation, it can enhance the learning ability of the neural network. Thus the fitting degree of the Brillouin scattering spectrum and the extraction accuracy of the frequency shift are improved. Models of the actual Brillouin spectrum are constructed by adding Gaussian white noise to the theoretical spectrum, whose center frequency is 11.213 GHz and whose linewidths are 40-50, 30-60 and 20-70 MHz, respectively. Comparing the algorithm with the Levenberg-Marquardt fitting method based on finite element analysis, hybrid algorithm particle swarm optimization, Levenberg-Marquardt and the least square method, the maximum frequency shift error of the new algorithm is 0.4 MHz, the fitting degree is 0.9912 and the root mean square error is 0.0241. The simulation results show that the proposed algorithm has good fitting degree and minimum absolute error. Therefore, the algorithm can be used on distributed optical fiber sensing systems based on Brillouin optical time domain reflection, where it can effectively improve the fitting of the Brillouin scattering spectrum and the precision of frequency shift extraction. PMID:26904844
Fuzzy and Regression Modelling of Hard Milling Process
A. Tamilarasan
2014-04-01
The present study highlights the application of a Box-Behnken design coupled with fuzzy and regression modeling to build an expert system for the hard milling process, with the aim of improving process performance while systematically reducing production cost. The input variables considered were work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth of cut. The cutting forces, work surface temperature and sound pressure level were identified as the key indices of machining output. The results indicate that the fuzzy logic and regression modeling techniques can be effectively used to predict the desired responses with low average error variation. The predicted results were verified by experiments and showed the good potential of the developed system for automated machining environments.
Procedure for Detecting Outliers in a Circular Regression Model
Rambli, Adzhar; Abuzaid, Ali H. M.; Mohamed, Ibrahim Bin; Hussin, Abdul Ghapor
2016-01-01
A number of circular regression models have been proposed in the literature. In recent years, there has been strong interest in outlier detection in circular regression. An outlier detection procedure can be developed by defining a new statistic in terms of the circular residuals. In this paper, we propose a new measure which transforms the circular residuals into linear measures using a trigonometric function. We then employ the row deletion approach to identify the observations that affect the measure the most, which are candidate outliers. The corresponding cut-off points and the performance of the detection procedure when applied to Down and Mardia’s model are studied via simulations. For illustration, we apply the procedure to circadian data. PMID:27064566
Assessing the Usability of Predictions of Different Regression Models
Šťastný, J.; Holeňa, Martin
Seňa: Pont, 2010 - (Pardubská, D.), s. 93-98 ISBN 978-80-970179-3-4. [ITAT 2010. Conference on Theory and Practice of Information Technologies. Smrekovica (SK), 21.09.2010-25.09.2010] R&D Projects: GA ČR GA201/08/0802 Institutional research plan: CEZ:AV0Z10300504 Keywords : regression models * confidence of predictions * confidence intervals * transductive inference * sensitivity analysis Subject RIV: IN - Informatics, Computer Science
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Flachaire, Emmanuel
2005-01-01
In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap. The finite sample performance of a heteroskedastic-robust test is investigated with Monte Carlo experiments. The simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap and of the pairs bootstrap. It is the only one for which the bootstrap ...
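The two resampling schemes compared in this abstract can be sketched for a simple regression y = a + b x. This is only an illustration under stated assumptions: the helper names are hypothetical, the wild bootstrap uses Rademacher (±1) weights on the raw OLS residuals as one common choice, and nothing here is claimed to be the specific version the paper finds best.

```python
import random


def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pairs_bootstrap_slopes(x, y, reps, rng):
    """Resample (x_i, y_i) pairs with replacement and refit each time."""
    n = len(x)
    slopes = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope is undefined
        slopes.append(ols(xs, [y[i] for i in idx])[1])
    return slopes


def wild_bootstrap_slopes(x, y, reps, rng):
    """Keep x fixed; flip the sign of each residual at random (Rademacher)."""
    intercept, slope = ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(reps):
        y_star = [fi + ei * rng.choice((-1.0, 1.0)) for fi, ei in zip(fitted, resid)]
        slopes.append(ols(x, y_star)[1])
    return slopes
```

Because each residual stays attached to its own observation, the wild bootstrap preserves the pattern of heteroskedasticity without modelling it; the pairs bootstrap achieves the same by resampling whole observations.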
Central limit theorem of linear regression model under right censorship
HE Shuyuan (何书元); HUANG Xiang (Heung Wong) (黄香)
2003-01-01
In this paper, the estimation of the joint distribution F(y,z) of (Y, Z) and the estimation in the linear regression model Y = b′Z + ε for complete data are extended to right censored data. The regression parameter estimates of b and the variance of ε are weighted least squares estimates with random weights. The central limit theorems of the estimators are obtained under very weak conditions and the derived asymptotic variance has a very simple form.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran’s universities. Methods: This study is applied research. The statistical population consisted of 96 employees of the central libraries of Iran’s public universities, selected among the 117 universities affiliated to the Ministry of Health by stratified sampling (510 people). The localized ClimateQUAL questionnaire was used as the research tool. For pr...
On relationship between regression models and interpretation of multiple regression coefficients
A N Varaksin; Panov, V. G.
2012-01-01
In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no ...
K factor estimation in distribution transformers using linear regression models
Juan Miguel Astorga Gómez
2016-06-01
Background: Due to the massive incorporation of electronic equipment into distribution systems, distribution transformers are subject to operating conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor from the total current harmonic content obtained with low-cost equipment. Method: Two distribution transformers that feed different loads are studied; the current total harmonic distortion (THDi) and the K-factor are recorded, and the regression model that best fits the field data is determined. To select the regression model, the coefficient of determination (R2) and the Akaike Information Criterion (AIC) are used. With the selected model, the K-factor is estimated for actual operating conditions. Results: Once the model was determined, it was found that for both the agricultural load and the industrial mining load, the present harmonic content (THDi) exceeds the values that these transformers can handle (an average of 12.54% and a minimum of 8.90% in the agricultural case, and an average of 18.53% and a minimum of 6.80% in the industrial mining case). Conclusions: When estimating the K-factor using polynomial models, it was determined that the studied transformers cannot withstand the current total harmonic distortion of their present loads. The appropriate K-factor for the studied transformers should be 4; this allows the transformers to support the current total harmonic distortion of their respective loads.
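For orientation, the two quantities related by the regression in this abstract can be computed directly from a harmonic spectrum. The sketch below uses the commonly cited definitions (K-factor as the sum of squared per-unit harmonic currents weighted by the square of the harmonic order, and THDi as the rms of the non-fundamental harmonics over the fundamental); it is assumed, not confirmed by the abstract, that the article uses the same definitions, and the function names and the example spectrum are hypothetical.

```python
import math


def k_factor(harmonic_currents):
    """K-factor from a dict {harmonic order h: rms current I_h}.

    K = sum(I_h^2 * h^2) / sum(I_h^2), so a purely sinusoidal load gives K = 1.
    """
    total_sq = sum(i * i for i in harmonic_currents.values())
    weighted_sq = sum((h * i) ** 2 for h, i in harmonic_currents.items())
    return weighted_sq / total_sq


def thd_i(harmonic_currents):
    """Current total harmonic distortion relative to the fundamental (h = 1)."""
    fundamental = harmonic_currents[1]
    distortion_sq = sum(i * i for h, i in harmonic_currents.items() if h != 1)
    return math.sqrt(distortion_sq) / fundamental


if __name__ == "__main__":
    # Hypothetical spectrum: fundamental plus 20% fifth harmonic.
    spectrum = {1: 1.0, 5: 0.2}
    print(k_factor(spectrum), thd_i(spectrum))
```

Measuring the full spectrum needs a harmonic analyser; the point of the regression approach described above is to estimate K from THDi alone, which cheaper instruments report.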
General bound of overfitting for MLP regression models
Rynkiewicz, Joseph
2012-01-01
Multilayer perceptrons (MLP) with one hidden layer have been used for a long time to deal with non-linear regression. However, in some tasks, MLPs are too powerful as models, and a small mean square error (MSE) may be due more to overfitting than to actual modelling. If the noise of the regression model is Gaussian, the overfitting of the model is totally determined by the behavior of the likelihood ratio test statistic (LRTS); however, in numerous cases the assumption of normality of the noise is arbitrary if not false. In this paper, we present a universal bound for the overfitting of such models under weak assumptions; this bound is valid without Gaussian or identifiability assumptions. The main application of this bound is to give a hint about determining the true architecture of the MLP model when the number of data goes to infinity. As an illustration, we use this theoretical result to propose and compare effective criteria to find the true architecture of an MLP.
Reconstruction of missing daily streamflow data using dynamic regression models
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
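The two-stage idea described in this abstract (a linear relationship with a neighbouring station, then a time-series model on the residuals) can be sketched as below. This is a deliberately simplified stand-in: an AR(1) model replaces the general ARIMA structure, only a single neighbour station is used, and all function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_ar1(residuals):
    """Estimate phi in r_t = phi * r_(t-1) + noise by least squares."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den if den != 0 else 0.0


def fill_gap(neighbor, target, neighbor_at_gap):
    """Reconstruct the target value that follows the observed series.

    neighbor, target: simultaneous observed discharges at the two stations.
    neighbor_at_gap: neighbour discharge on the day missing at the target.
    """
    intercept, slope = fit_line(neighbor, target)
    resid = [yt - (intercept + slope * xt) for xt, yt in zip(neighbor, target)]
    phi = fit_ar1(resid)
    # Regression part plus the part of yesterday's residual that persists.
    return intercept + slope * neighbor_at_gap + phi * resid[-1]
```

The residual term is what distinguishes this from plain regression infilling: if the target station was running above the regression line yesterday, the estimate for the missing day is shifted upwards accordingly.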
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
Utilization of geographically weighted regression (GWR) in forestry modeling
Quirós-Segovia, María
2015-01-01
The diploma thesis is focused on the application of Geographically Weighted Regression (GWR) in forestry models. This is a prospective method for coping with spatially heterogeneous data. In forestry, this method has been used previously in small areas with good results, but in this diploma thesis it is applied to a bigger area in the Region of Murcia, Spain. The main goal of the thesis is to evaluate GWR for developing a large-scale height-diameter model based on data of National Forest Inv...
Interpreting parameters in the logistic regression model with random effects
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben;
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Multivariate Frequency-Severity Regression Models in Insurance
Edward W. Frees
2016-02-01
In insurance and related industries including healthcare, it is common to have several outcome measures that the analyst wishes to understand using explanatory variables. For example, in automobile insurance, an accident may result in payments for damage to one’s own vehicle, damage to another party’s vehicle, or personal injury. It is also common to be interested in the frequency of accidents in addition to the severity of the claim amounts. This paper synthesizes and extends the literature on multivariate frequency-severity regression modeling with a focus on insurance industry applications. Regression models for understanding the distribution of each outcome continue to be developed yet there now exists a solid body of literature for the marginal outcomes. This paper contributes to this body of literature by focusing on the use of a copula for modeling the dependence among these outcomes; a major advantage of this tool is that it preserves the body of work established for marginal models. We illustrate this approach using data from the Wisconsin Local Government Property Insurance Fund. This fund offers insurance protection for (i) property; (ii) motor vehicle; and (iii) contractors’ equipment claims. In addition to several claim types and frequency-severity components, outcomes can be further categorized by time and space, requiring complex dependency modeling. We find significant dependencies for these data; specifically, we find that dependencies among lines are stronger than the dependencies between the frequency and average severity within each line.
Rogers, David
1991-01-01
G/SPLINES are a hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.
Modeling of the Monthly Rainfall-Runoff Process Through Regressions
Campos-Aranda Daniel Francisco
2014-10-01
To solve the problems associated with the assessment of the water resources of a river, the modeling of the rainfall-runoff process (RRP) allows missing runoff data to be deduced and the runoff record to be extended, since the information available on precipitation generally covers a longer period. It also enables the estimation of inputs to reservoirs when their construction led to the suppression of the gauging station. The simplest mathematical model that can be set up for the RRP is the linear or curvilinear regression on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff at the Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from spring discharge, the record is corrected first by removing that contribution. In order to do this, a procedure was developed based either on the monthly average regional runoff coefficients or on a nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco) and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, as shown by the calculated means and standard deviations.
Genetic evaluation of European quails by random regression models
Flaviana Miranda Gonçalves
2012-09-01
The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders, for the estimation of (co)variance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and at 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6) to determine which model had the best fit to describe the (co)variance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT), the model with six classes of residual variance and a sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day) to 0.16 (42 days). Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth-order Legendre polynomial, along with the residual variance divided into six classes, was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Fadhilah Abdrazak
2014-05-01
Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore increase the complexity of load forecasting. The errors due to holiday effects in load forecasting are known to be higher than those due to other factors. If these effects can be estimated and removed, the behavior of the series can be viewed more clearly. Thus, the aim of this paper is to reduce the forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model of either peak or average load is developed. The developed model outperformed the seasonal ARIMA model in estimating the effects of fixed and moving holidays and achieved a smaller Mean Absolute Percentage Error (MAPE) in load forecasting.
On relationship between regression models and interpretation of multiple regression coefficients
Varaksin, A N
2012-01-01
In this paper, we consider the problem of treating linear regression equation coefficients in the case of correlated predictors. It is shown that in general there are no natural ways of interpreting these coefficients similar to the case of single predictor. Nevertheless we suggest linear transformations of predictors, reducing multiple regression to a simple one and retaining the coefficient at variable of interest. The new variable can be treated as the part of the old variable that has no linear statistical dependence on other presented variables.
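One way to read the transformation described in this abstract is as residualisation: the variable of interest is regressed on the other predictor, and the response is then regressed on the residual alone. The sketch below checks numerically, for two predictors, that this simple regression returns the same coefficient as the multiple regression. The function names and the small data set are hypothetical, and it is an assumption that this matches the authors' construction in detail.

```python
def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residualize(x1, x2):
    """Part of x1 that has no linear dependence on x2."""
    slope = simple_slope(x2, x1)
    intercept = mean(x1) - slope * mean(x2)
    return [a - (intercept + slope * b) for a, b in zip(x1, x2)]


def multiple_coef_x1(y, x1, x2):
    """Coefficient of x1 in the regression of y on x1 and x2 (with intercept),
    from the normal equations for centred variables."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


if __name__ == "__main__":
    x1 = [2.0, 1.0, 4.0, 3.0, 6.0]
    x2 = [1.0, 2.0, 3.0, 4.0, 5.0]           # correlated with x1
    y = [1.0 + 2.0 * a - 3.0 * b for a, b in zip(x1, x2)]
    print(multiple_coef_x1(y, x1, x2), simple_slope(residualize(x1, x2), y))
```

The agreement of the two numbers is what allows the multiple-regression coefficient to be interpreted as the effect of the "new" variable in a single-predictor model.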
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2015-12-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models, namely the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression, in Ardestan, Esfahan, and Kashan. The results were compared with the Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) method to select the best model. The results show that at a monthly scale, all models provided close agreement with the values calculated by FAO-PM (R² > 0.95 and RMSE < 12.07 mm month⁻¹). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Regression modeling for digital test of ΣΔ modulators
Léger, G; Rueda, Adoración
2010-01-01
The cost of Analogue and Mixed-Signal circuit testing is an important bottleneck in the industry, due to time-consuming verification of specifications that require state-of-the-art Automatic Test Equipment. In this paper, we apply the concept of Alternate Test to achieve digital testing of converters. By training an ensemble of regression models that maps simple digital defect-oriented signatures onto Signal to Noise and Distortion Ratio (SNDR), an average error of 1.7% is achieved. Beyond th...
Prototype of an adaptive disruption predictor for JET based on fuzzy logic and regression trees
Disruptions remain one of the most hazardous events in the operation of a tokamak device, since they can cause damage to the vacuum vessel and surrounding structures. Their potential danger increases with the plasma volume and energy content and therefore they will constitute an even more serious issue for the next generation of machines. For these reasons, in the recent years a lot of attention has been devoted to devise predictors, capable of foreseeing the imminence of a disruption sufficiently in advance, to allow time for undertaking remedial actions. In this paper, the results of applying fuzzy logic and classification and regression trees (CART) to the problem of predicting disruptions at JET are reported. The conceptual tools of fuzzy logic, in addition to being well suited to accommodate the opinion of experts even if not formulated in mathematical but linguistic terms, are also fully transparent, since their governing rules are human defined. They can therefore help not only in forecasting disruptions but also in studying their behaviour. The analysis leading to the rules of the fuzzy predictor has been complemented with a systematic investigation of the correlation between the various experimental signals and the imminence of a disruption. This has been performed with an exhaustive, non-linear and unbiased method based on decision trees. This investigation has confirmed that the relative importance of various signals can change significantly depending on the plasma conditions. On the basis of the results provided by CART on the information content of the various quantities, the prototype of an adaptive fuzzy logic predictor was trained and tested on JET database. Its performance is significantly better than the previous static one, proving that more flexible prediction strategies, not uniform over the whole discharge but tuned to the operational region of the plasma at any given time, can be very competitive and should be investigated systematically
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P; Goffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can fit the data as well as the commonly used existing models for overdispersed data sets while outperforming these commonly used models for underdispersed data sets. PMID:18304118
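The property that makes the Conway-Maxwell Poisson distribution attractive here can be illustrated numerically. The sketch below uses the standard parameterisation P(Y = y) proportional to lambda^y / (y!)^nu, not the article's reformulation, and truncates the infinite normalising sum at a hypothetical `max_count`; the function names are illustrative.

```python
import math


def com_poisson_pmf(lam, nu, max_count=200):
    """Truncated COM-Poisson probabilities for y = 0..max_count.

    Terms are computed on the log scale to avoid overflow in y!.
    """
    log_terms = [y * math.log(lam) - nu * math.lgamma(y + 1)
                 for y in range(max_count + 1)]
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, variance


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        print(nu, mean_and_variance(com_poisson_pmf(1.5, nu)))
```

With nu = 1 the distribution reduces to the ordinary Poisson (variance equal to the mean); nu > 1 gives underdispersion and nu < 1 gives overdispersion, so a single extra parameter covers both cases mentioned in the abstract.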
Evaluating sediment chemistry and toxicity data using logistic regression modeling
Field, L.J.; MacDonald, D.D.; Norton, S.B.; Severn, C.G.; Ingersoll, C.G.
1999-01-01
This paper describes the use of logistic-regression modeling for evaluating matching sediment chemistry and toxicity data. Contaminant-specific logistic models were used to estimate the percentage of samples expected to be toxic at a given concentration. These models enable users to select the probability of effects of concern corresponding to their specific assessment or management objective or to estimate the probability of observing specific biological effects at any contaminant concentration. The models were developed using a large database (n = 2,524) of matching saltwater sediment chemistry and toxicity data for field-collected samples compiled from a number of different sources and geographic areas. The models for seven chemicals selected as examples showed a wide range in goodness of fit, reflecting high variability in toxicity at low concentrations and limited data on toxicity at higher concentrations for some chemicals. The models for individual test endpoints (e.g., amphipod mortality) provided a better fit to the data than the models based on all endpoints combined. A comparison of the relative sensitivity of two amphipod species to specific contaminants illustrated an important application of the logistic model approach.
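The two uses of a contaminant-specific logistic model mentioned in this abstract (probability of toxicity at a given concentration, and the concentration corresponding to a chosen probability) can be sketched as follows. The use of log10 concentration as the predictor and the coefficient values in the example are assumptions for illustration only; they are not taken from the paper.

```python
import math


def probability_toxic(concentration, b0, b1):
    """Logistic model: probability that a sample is toxic at this concentration.

    Assumes the linear predictor is b0 + b1 * log10(concentration).
    """
    z = b0 + b1 * math.log10(concentration)
    return 1.0 / (1.0 + math.exp(-z))


def concentration_for_probability(p, b0, b1):
    """Invert the model: concentration at which the toxicity probability is p."""
    logit = math.log(p / (1.0 - p))
    return 10.0 ** ((logit - b0) / b1)


if __name__ == "__main__":
    b0, b1 = -1.0, 2.0  # hypothetical coefficients
    print(probability_toxic(10.0, b0, b1))
    print(concentration_for_probability(0.2, b0, b1))
```

The inverse function corresponds to the abstract's point that users can pick a probability of effect matching their management objective and read off the associated concentration.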
Approximation by randomly weighting method in censored regression model
无
2009-01-01
Censored regression ("Tobit") models have been in common use, and their linear hypothesis tests have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to the null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, the conditional asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by the randomly weighting method without estimating the nuisance parameters. At the same time, we also establish the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in the censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of the central chi-square distribution under the null hypothesis.
Regression Models for Predicting Force Coefficients of Aerofoils
Mohammed ABDUL AKBAR
2015-09-01
Renewable sources of energy are attractive and advantageous in many ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, vertical axis wind turbines (VAWTs) have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of the lift and drag coefficients, which are functions of the shape of the aerofoil, the angle of attack of the wind and the Reynolds number of the flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for Reynolds numbers in the range experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three regression analysis models with which lift and drag coefficients can be found using simple formulas, without having to deal with the bulk of the data. Drag and lift coefficients were successfully estimated by the regression models, with R2 values as high as 0.98.
Approximation by randomly weighting method in censored regression model
WANG ZhanFeng; WU YaoHua; ZHAO LinCheng
2009-01-01
Censored regression ("Tobit") models have been in common use, and their linear hypothesis testings have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model. Simulation studies illustrate that the performance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.
Empirical likelihood ratio tests for multivariate regression models
WU Jianhong; ZHU Lixing
2007-01-01
This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models, including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method is well known to be a powerful tool for constructing tests and confidence regions. For model checking, however, the naive empirical likelihood (EL) based tests do not exhibit Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by combining the advantages of both the EL and the score test method, the EL-based score tests share many desirable features: they are self-scale invariant and can detect alternatives that converge to the null at rate n^{-1/2}, the fastest possible rate for lack-of-fit testing; they involve weight functions, which provide the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application to a real dataset is analyzed.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Multiple Linear Regression Model Used in Economic Analyses
Constantin ANGHELACHE; Madalina Gabriela ANGHEL; Ligia PRODAN; Cristina SACALA; Marius POPOVICI
2014-01-01
The multiple regression is a tool that offers the possibility to analyze the correlations between more than two variables, situation which account for most cases in macro-economic studies. The best known method of estimation for multiple regression is the method of least squares. As in the two-variable regression, we choose the regression function of sample and minimize the sum of squared residual values. Another method that allows us to take into account the number of variables factor when d...
Many regression algorithms, one unified model — A review
Stulp, Freek; Sigaud, Olivier
2015-01-01
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representation whose parameters they regress fall into two classes: a weighted sum of basis ...
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval arithm
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-01-01
Background: The purpose of this study was to draw a regression model of the organizational climate of central libraries of Iran’s universities. Methods: This study is applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran’s public universities, selected by stratified sampling from among the 117 universities affiliated to the Ministry of Health (510 people). A localized ClimateQual questionnaire was used as the research tool. Multivariate linear regression and a track diagram were used to predict the organizational climate pattern of the libraries. Results: Of the 9 variables affecting organizational climate, 5 variables (innovation, teamwork, customer service, psychological safety and deep diversity) play a major role in predicting the organizational climate of Iran’s libraries. The results also indicate that each of these variables, with a different coefficient, has the power to predict organizational climate, but the climate score of psychological safety (0.94) plays a crucial role in predicting the organizational climate. The track diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, and that the contribution of teamwork to this influence is greater than that of any other variable. Conclusions: Among the ClimateQual indicators of organizational climate, teamwork contributes more than any other variable, so reinforcing teamwork in academic libraries can be more effective in improving the organizational climate of this type of library. PMID:26622203
A Gompertz regression model for fern spores germination
Gabriel y Galán, Jose María
2015-06-01
Germination is one of the most important biological processes for both seed and spore plants, and also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for the species Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei the Gompertz growth model describes cumulative germination satisfactorily. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns. The paper also discusses the physiological and ecological meaning of the model.
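As a rough illustration of the kind of curve the abstract refers to, the sketch below evaluates a Gompertz growth function for cumulative germination. The three-parameter form, the parameter names (upper, displacement, rate) and all numeric values are assumptions made for illustration; the abstract does not give the authors' parameterization or their fitted values.

```python
from math import exp


def gompertz(t, upper, displacement, rate):
    """Cumulative germination (e.g. in percent) at time t.

    Assumed parameterization (not taken from the paper):
        upper * exp(-displacement * exp(-rate * t))
    upper        : asymptotic germination level
    displacement : shifts the curve along the time axis
    rate         : growth rate
    """
    return upper * exp(-displacement * exp(-rate * t))


# Hypothetical days after sowing and hypothetical parameter values.
days = [0, 5, 10, 20, 40]
curve = [gompertz(d, upper=100.0, displacement=5.0, rate=0.3) for d in days]
```

With these made-up values the curve starts near zero, rises in a sigmoid shape and levels off just below the upper asymptote, which is the qualitative behaviour expected of cumulative germination.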
Conceptual Model of User Adaptive Enterprise Application
Inese Šūpulniece
2015-07-01
The user adaptive enterprise application is a software system, which adapts its behavior to an individual user on the basis of nontrivial inferences from information about the user. The objective of this paper is to elaborate a conceptual model of the user adaptive enterprise applications. In order to conceptualize the user adaptive enterprise applications, their main characteristics are analyzed, the meta-model defining the key concepts relevant to these applications is developed, and the user adaptive enterprise application and its components are defined in terms of the meta-model. Modeling of the user adaptive enterprise application incorporates aspects of enterprise modeling, application modeling, and design of adaptive characteristics of the application. The end-user and her expectations are identified as two concepts of major importance not sufficiently explored in the existing research. Understanding these roles improves the adaptation result in the user adaptive applications.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data. PMID:17610294
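The final testing step can be illustrated with the unnormalized adaptive Neyman statistic in the form commonly attributed to Fan (1996). This is a minimal sketch, assuming the transformed-domain coefficients have already been standardized to be approximately N(0, 1) under the null and are ordered from low to high frequency. The function name is hypothetical, and the final normalization and critical values used in the paper are not reproduced here.

```python
from math import sqrt


def adaptive_neyman_raw(z_scores):
    """Return max over m of (1 / sqrt(2m)) * sum_{j<=m} (z_j^2 - 1).

    z_scores: standardized coefficients, lowest frequency first (assumed).
    """
    if not z_scores:
        raise ValueError("need at least one coefficient")
    best = float("-inf")
    running = 0.0
    for m, z in enumerate(z_scores, start=1):
        running += z * z - 1.0              # partial sum of centered squares
        best = max(best, running / sqrt(2.0 * m))
    return best


# A single large low-frequency coefficient dominates the statistic.
stat = adaptive_neyman_raw([3.0, 0.0, 0.0])
```

Taking the maximum over the truncation point m is what makes the test adaptive: the number of leading coefficients that carry signal does not have to be chosen in advance.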
The R Package threg to Implement Threshold Regression Models
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates). The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly, a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed, which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized likelihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regularization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
In this paper, we present the use of different mathematical models to forecast electricity price under deregulated power. A successful prediction tool of electricity price can help both power producers and consumers plan their bidding strategies. Motivated by the fact that the support vector regression (SVR) model, with its ε-insensitive loss function, admits residuals within the boundary values of the ε-tube, we propose a hybrid model, called SVRARIMA, that combines SVR and auto-regressive integrated moving average (ARIMA) models to take advantage of their respective strengths in nonlinear and linear modeling. A nonlinear analysis of the time series indicates the convenience of nonlinear modeling, so the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied in solving the residuals regression estimation problems. The experimental results demonstrate that the proposed model outperforms the existing neural-network approaches, the traditional ARIMA models and other hybrid models based on the root mean square error and mean absolute percentage error.
Application of Partial Least-Squares Regression Model on Temperature Analysis and Prediction of RCCD
Yuqing Zhao; Zhenxian Xing
2013-01-01
This study, based on the temperature monitoring data of Jiangya RCCD, uses the principle and method of partial least-squares regression to analyze and predict temperature variation of RCCD. By establishing a partial least-squares regression model, multiple correlation among the independent variables is overcome, and an organic combination of multiple linear regressions, multiple linear regression and canonical correlation analysis is achieved. Compared with general least-squares regression model result, it is more ...
Modeling Lateral and Longitudinal Control of Human Drivers with Multiple Linear Regression Models
Lenk, Jan; M, Claus
2011-01-01
In this paper, we describe results to model lateral and longitudinal control behavior of drivers with simple linear multiple regression models. This approach fits into the Bayesian Programming (BP) approach (Bessi
Knol, Mirjam J.; van der Tweel, Ingeborg; Grobbee, Diederick E.; Numans, Mattijs E.; Geerlings, Mirjam I.
2007-01-01
Background To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers
Robust repeated median regression in moving windows with data-adaptive width selection
Borowski, Matthias; Fried, Roland
2011-01-01
Online (also 'real-time' or 'sequential') signal extraction from noisy and outlier-interfered data streams is a basic but challenging goal. Fitting a robust Repeated Median (Siegel, 1982) regression line in a moving time window has turned out to be a promising approach (Davies et al., 2004; Gather et al., 2006; Schettlinger et al., 2006). The level of the regression line at the rightmost window position, which equates to the current time point in an online application, is then...
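The moving-window idea can be sketched as follows. This is a minimal illustration of a Repeated Median line evaluated at the rightmost window position, assuming a fixed window width and equally spaced time points; the data-adaptive width selection that is the subject of the paper is not implemented, and the function names are hypothetical.

```python
from statistics import median


def repeated_median_line(times, values):
    """Fit a Repeated Median line; return (slope, intercept).

    slope     = median over i of the median over j != i of pairwise slopes
    intercept = median over i of (value_i - slope * time_i)
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two points")
    per_point = []
    for i in range(n):
        slopes_i = [
            (values[j] - values[i]) / (times[j] - times[i])
            for j in range(n)
            if j != i
        ]
        per_point.append(median(slopes_i))
    slope = median(per_point)
    intercept = median(v - slope * t for t, v in zip(times, values))
    return slope, intercept


def online_level(stream, width):
    """Level of the fitted line at the rightmost point of each full window."""
    local_times = list(range(width))
    levels = []
    for end in range(width, len(stream) + 1):
        window = stream[end - width:end]
        slope, intercept = repeated_median_line(local_times, window)
        levels.append(intercept + slope * (width - 1))
    return levels


# A linear trend 2t + 1 with one gross outlier at t = 3.
slope, intercept = repeated_median_line(
    [0, 1, 2, 3, 4, 5, 6], [1, 3, 5, 100, 9, 11, 13]
)
```

In this toy example the single outlier does not move the fit away from the underlying trend, which is the robustness property that makes the estimator attractive for streams with outliers.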
Bayesian auxiliary variable models for binary and multinomial regression
Holmes, C C; HELD, L.
2006-01-01
In this paper we discuss auxiliary variable approaches to Bayesian binary and multinomial regression. These approaches are ideally suited to automated Markov chain Monte Carlo simulation. In the first part we describe a simple technique using joint updating that improves the performance of the conventional probit regression algorithm. In the second part we discuss auxiliary variable methods for inference in Bayesian logistic regression, including covariate set uncertainty. Fina...
Kernel Averaged Predictors for Spatio-Temporal Regression Models.
Heaton, Matthew J; Gelfand, Alan E
2012-12-01
In applications where covariates and responses are observed across space and time, a common goal is to quantify the effect of a change in the covariates on the response while adequately accounting for the spatio-temporal structure of the observations. The most common approach for building such a model is to confine the relationship between a covariate and response variable to a single spatio-temporal location. However, oftentimes the relationship between the response and predictors may extend across space and time. In other words, the response may be affected by levels of predictors in spatio-temporal proximity to the response location. Here, a flexible modeling framework is proposed to capture such spatial and temporal lagged effects between a predictor and a response. Specifically, kernel functions are used to weight a spatio-temporal covariate surface in a regression model for the response. The kernels are assumed to be parametric and non-stationary with the data informing the parameter values of the kernel. The methodology is illustrated on simulated data as well as a physical data set of ozone concentrations to be explained by temperature. PMID:24010051
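The core idea of weighting a covariate surface with a kernel can be sketched in one (temporal) dimension. This is only an illustration under simplifying assumptions: a stationary Gaussian kernel with a fixed bandwidth, whereas the abstract describes parametric, non-stationary kernels whose parameters are informed by the data. The function name and the numbers are hypothetical.

```python
from math import exp


def kernel_averaged_covariate(times, covariate, target_time, bandwidth):
    """Gaussian-kernel weighted average of a covariate around target_time."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [exp(-0.5 * ((t - target_time) / bandwidth) ** 2) for t in times]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, covariate)) / total


# Hypothetical daily temperatures; the averaged value at day 2 would then
# enter a regression for the response observed at day 2.
temperature = [18.0, 21.0, 25.0, 24.0, 19.0]
smoothed = kernel_averaged_covariate(
    [0, 1, 2, 3, 4], temperature, target_time=2, bandwidth=1.0
)
```

The averaged covariate replaces the single-location value in the regression, so the response can depend on predictor levels at nearby times as well as at its own time.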
Ultracentrifuge separative power modeling with multivariate regression using covariance matrix
In this work, the least-squares methodology with covariance matrix is applied to determine a data curve fitting to obtain a performance function for the separative power δU of an ultracentrifuge as a function of variables that are experimentally controlled. The experimental data refer to 460 experiments on the ultracentrifugation process for uranium isotope separation. The experimental uncertainties associated with these independent variables are considered in the calculation of the experimental separative power values, determining an experimental data input covariance matrix. The process variables that significantly influence the δU values are chosen in order to give information on the ultracentrifuge behaviour when submitted to several levels of feed flow rate F, cut θ and product line pressure Pp. After the model goodness-of-fit validation, a residual analysis is carried out to verify the assumed basis concerning its randomness and independence and mainly the existence of residual heteroscedasticity with any explained regression model variable. The surface curves are made relating the separative power with the control variables F, θ and Pp to compare the fitted model with the experimental data and finally to calculate their optimized values. (author)
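For orientation, least squares with an input covariance matrix is usually written in the generalized least-squares form below. The notation is an assumption, not the authors': y collects the experimental δU values, X is a design matrix built from terms in F, θ and Pp, and V is the covariance matrix of the experimental data. The abstract does not state the exact estimator used.

```latex
\hat{\beta} = \arg\min_{\beta}\,(y - X\beta)^{\top} V^{-1} (y - X\beta)
            = \left(X^{\top} V^{-1} X\right)^{-1} X^{\top} V^{-1} y,
\qquad
\operatorname{Cov}(\hat{\beta}) = \left(X^{\top} V^{-1} X\right)^{-1}
```

When V is diagonal this reduces to ordinary weighted least squares, with each experiment weighted by the inverse of its variance.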
Al-Khalaf, Adnan; Gustafsson, Steve Oskar
2015-01-01
This thesis consists of two parts. The first part of the thesis conducts a multiple regression on a data set obtained from Ocean Tomo’s auction results from 2006 to 2008, with the purpose of identifying key value indicators and investigating to what extent it is possible to predict the value of a patent. The final regression model consists of the following covariates: average number of citings per year, share of active family members, age of the patent, average invested USD per year, and ni...
Encoding through patterns: regression tree-based neuronal population models.
Haslinger, Robert; Pipa, Gordon; Lewis, Laura D; Nikolić, Danko; Williams, Ziv; Brown, Emery
2013-08-01
Although the existence of correlated spiking between neurons in a population is well known, the role such correlations play in encoding stimuli is not. We address this question by constructing pattern-based encoding models that describe how time-varying stimulus drive modulates the expression probabilities of population-wide spike patterns. The challenge is that large populations may express an astronomical number of unique patterns, and so fitting a unique encoding model for each individual pattern is not feasible. We avoid this combinatorial problem using a dimensionality-reduction approach based on regression trees. Using the insight that some patterns may, from the perspective of encoding, be statistically indistinguishable, the tree divisively clusters the observed patterns into groups whose member patterns possess similar encoding properties. These groups, corresponding to the leaves of the tree, are much smaller in number than the original patterns, and the tree itself constitutes a tractable encoding model for each pattern. Our formalism can detect an extremely weak stimulus-driven pattern structure and is based on maximizing the data likelihood, not making a priori assumptions as to how patterns should be grouped. Most important, by comparing pattern encodings with independent neuron encodings, one can determine if neurons in the population are driven independently or collectively. We demonstrate this method using multiple unit recordings from area 17 of anesthetized cat in response to a sinusoidal grating and show that pattern-based encodings are superior to those of independent neuron models. The agnostic nature of our clustering approach allows us to investigate encoding by the collective statistics that are actually present rather than those (such as pairwise) that might be presumed. PMID:23607564
Gurudeo Anand Tularam; Siti Amri
2012-01-01
House price prediction continues to be important for government agencies, insurance companies and the real estate industry. This study investigates the performance of house sales price models based on linear and non-linear approaches to study the effects of selected variables. Linear stepwise Multivariate Regression (MR) and nonlinear models of Neural Network (NN) and Adaptive Neuro-Fuzzy (ANFIS) are developed and compared. The GIS methods are used to integrate the data for the study area (Bathurs...
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
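The split-half evaluation design described above can be sketched for the linear regression side. This is an illustration only: it uses a single predictor, synthetic data and mean squared error as the fit measure, none of which come from the paper, and the fuzzy logic model is not reproduced. All function names are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, xs, ys):
    slope, intercept = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def split_half(rows, seed=0):
    """Randomly divide the rows into an estimation half and a hold-out half."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Synthetic (predictor, outcome) pairs standing in for the survey data.
rows = [(x, 3.0 * x + 1.0) for x in range(10)]
train, holdout = split_half(rows)
model = fit_line([x for x, _ in train], [y for _, y in train])
in_sample_mse = mean_squared_error(model, [x for x, _ in train], [y for _, y in train])
holdout_mse = mean_squared_error(model, [x for x, _ in holdout], [y for _, y in holdout])
```

Reporting both the in-sample and the hold-out error mirrors the two comparisons in the abstract: fit on the estimation half, then predictive accuracy on cases not used to build the model.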
Application of regression model on stream water quality parameters
Statistical analysis was conducted to evaluate the effect of solid waste leachate from the open solid waste dumping site of Salhad on the stream water quality. Five sites were selected along the stream: two sites prior to the mixing of leachate with the surface water, one of the leachate itself, and two sites affected by leachate. Samples were analyzed for pH, water temperature, electrical conductivity (EC), total dissolved solids (TDS), biological oxygen demand (BOD), chemical oxygen demand (COD), dissolved oxygen (DO) and total bacterial load (TBL). In this study the correlation coefficient r among different water quality parameters at the various sites was calculated using the Pearson model, and the average of each correlation between two parameters was then calculated, which shows that TDS and EC, and pH and BOD, have significantly increasing r values, while temperature and TDS, temperature and EC, DO and BL, and DO and COD have decreasing r values. Single-factor ANOVA at the 5% level of significance showed that EC, TDS, TCL and COD differed significantly among the sites. Under both statistical approaches, TDS and EC show a strongly positive correlation, because the ions from the dissolved solids in water influence the ability of that water to conduct an electrical current. These two parameters vary significantly among the 5 sites, which is further confirmed by linear regression. (author)
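The Pearson correlation coefficient used in the study can be computed as in the sketch below. The TDS and EC readings are made-up values chosen only to illustrate a strongly positive relationship; they are not data from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical readings at five sites (TDS in mg/L, EC in microsiemens/cm).
tds = [210.0, 230.0, 900.0, 620.0, 540.0]
ec = [330.0, 355.0, 1400.0, 980.0, 850.0]
r_tds_ec = pearson_r(tds, ec)
```

A value of r close to +1 for TDS and EC would be consistent with the physical explanation in the abstract, since dissolved ions carry the current that EC measures.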
Proteomics Improves the Prediction of Burns Mortality: Results from Regression Spline Modeling
Finnerty, Celeste C.; Ju, Hyunsu; Spratt, Heidi; Victor, Sundar; Jeschke, Marc G.; Hegde, Sachin; Bhavnani, Suresh K.; Luxon, Bruce A.; Brasier, Allan R.; Herndon, David N.
2012-01-01
Prediction of mortality in severely burned patients remains unreliable. Although clinical covariates and plasma protein abundance have been used with varying degrees of success, the triad of burn size, inhalation injury, and age remains the most reliable predictor. We investigated the effect of combining proteomics variables with these three clinical covariates on prediction of mortality in burned children. Serum samples were collected from 330 burned children (burns covering >25% of the total body surface area) between admission and the time of the first operation for clinical chemistry analyses and proteomic assays of cytokines. Principal component analysis revealed that serum protein abundance and the clinical covariates each provided independent information regarding patient survival. To determine whether combining proteomics with clinical variables improves prediction of patient mortality, we used multivariate adaptive regression splines, since the relationships between analytes and mortality were not linear. Combining these factors increased overall outcome prediction accuracy from 52% to 81% and area under the receiver operating characteristic curve from 0.82 to 0.95. Thus, the predictive accuracy of burns mortality is substantially improved by combining protein abundance information with clinical covariates in a multivariate adaptive regression splines classifier, a model currently being validated in a prospective study. PMID:22686201
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
ANALISYS OF GLOBAL FDI AND GDP – LINEAR REGRESSION MODEL
Ahmad Subagyo
2016-01-01
The aim of this study was to examine theoretically the relationship of global Foreign Direct Investments (FDI) to the global GDP of all countries in the world. This study emphasizes the relationship between global GDP and global FDI in all countries in the world, and whether it is coherent with theoretical expectations. Multiple linear regression analysis was applied in this study. According to the results of linear regression test the evolution of global FDI in terms of changes i...
The Classical Linear Regression Model with one Incomplete Binary Variable
Toutenburg, Helge; Nittner, T.
1999-01-01
We present three different methods based on the conditional mean imputation when binary explanatory variables are incomplete. Apart from the single imputation and multiple imputation especially the so-called pi imputation is presented as a new procedure. Seven procedures are compared in a simulation experiment when missing data are confined to one independent binary variable: complete case analysis, zero order regression, categorical zero order regression, pi imputation, single imputation, mu...
Boosting the partial least square algorithm for regression modelling
Ling YU; Tiejun WU
2006-01-01
Boosting algorithms are a class of general methods used to improve the general performance of regression analysis. The main idea is to maintain a distribution over the training set. In order to use the given distribution directly, a modified PLS algorithm is proposed and used as the base learner to deal with nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm gives better general performance than the PLS algorithm.
Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.
2015-09-01
One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy to capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, which is an ensemble of decision trees and trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.
ALMA, Özlem GÜRÜNLÜ; KURT, Serdar; UĞUR, Aybars
2010-01-01
Multiple linear regression models are widely used applied statistical techniques and are among the most useful devices for extracting and understanding the essential features of datasets. However, in multiple linear regression models problems arise when a serious outlier observation or multicollinearity is present in the data. In regression, however, the situation is somewhat more complex in the sense that some outlying points will have more influence on the regression than others. An important proble...
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina. PMID:15128931
Model-Based Software Regression Testing for Software Components
Batra, Gagandeep; Arora, Yogesh Kumar; Sengupta, Jyotsna
This paper presents a novel approach of generating regression test cases from UML design diagrams. Regression testing can be systematically applied at the software components architecture level so as to reduce the effort and cost of retesting modified systems. Our approach consists of transforming a UML sequence diagram of a component into a graphical structure called the control flow graph (CFG) and its revised version into an Extended control flow graph (ECFG). The nodes of the two graphs are augmented with information necessary to compose test suites in terms of test case scenarios. This information is collected from use case templates and class diagrams. The graphs are traversed in depth-first order to generate test scenarios. Further, the two are compared for change identification. Based on change information, test cases are identified as reusable, obsolete or newly added. The regression test suite thus generated is suitable to detect any interaction and scenario faults.
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
None
2006-01-01
A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those of the nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents prediction models that analyze and compute CO2 emissions in Malaysia. Each prediction model is analyzed for three main groups: transportation; electricity and heat production; and residential buildings together with commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. The best subset method is used to remove irrelevant data, followed by multiple linear regression to produce the prediction models. The high R-square (prediction) values obtained imply that the models are reliable for predicting CO2 emissions from specific data. In addition, the CO2 emissions from these three groups are forecast using trend analysis plots for observation purposes.
Buja, A.; Berk, R.; Brown, L; George, E.; Pitkin, E.; Traskin, M.; Zhan, K.; Zhao, L.
2014-01-01
We review and interpret the early insights of Halbert White who over thirty years ago inaugurated a form of statistical inference for regression models that is asymptotically correct even under "model misspecification," that is, under the assumption that models are approximations rather than generative truths. This form of inference, which is pervasive in econometrics, relies on the "sandwich estimator" of standard error. Whereas linear models theory in statistics assumes models to be true an...
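For the simplest case, a straight-line fit with one predictor, the sandwich standard error of the slope can be written out directly. The sketch below uses the basic (HC0) form and compares it with the classical model-based standard error; the function name and the five data points are made up for illustration and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def slope_standard_errors(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (slope, classical SE, sandwich SE) for y = a + b*x by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance: assumes the linear model and constant error variance.
    classical_var = sum(e * e for e in resid) / (n - 2) / sxx
    # Sandwich (HC0) variance: "bread" 1/sxx on both sides, "meat" sum(dx^2 * e^2).
    sandwich_var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(sandwich_var)


# Illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, se_classical, se_sandwich = slope_standard_errors(x, y)
```

The two standard errors differ whenever the squared residuals vary with the predictor, which is exactly the situation in which the classical formula is not trustworthy.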
A generalized additive regression model for survival times
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
Khorami, M. Tayebi [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of); Chelgani, S. Chehreh [Surface Science Western, Research Park, University of Western Ontario, London (Canada); Hower, James C. [Center for Applied Energy Research, University of Kentucky, Lexington (United States); Jorjani, E. [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of)
2011-01-01
The results of proximate, ultimate, and petrographic analysis for a wide range of Kentucky coal samples were used to predict Free Swelling Index (FSI) using multivariable regression and Adaptive Neuro Fuzzy Inference System (ANFIS). Three different input sets were applied for both methods: (a) moisture, ash, and volatile matter; (b) carbon, hydrogen, nitrogen, oxygen, sulfur, and mineral matter; and (c) group-maceral analysis, mineral matter, moisture, sulfur, and R_max. Non-linear regression achieved correlation coefficients (R^2) of 0.38, 0.49, and 0.70 for input sets (a), (b), and (c), respectively. Using the same input sets, ANFIS predicted FSI with higher R^2 values of 0.46, 0.82 and 0.95, respectively. Results show that input set (c) is the best predictor of FSI in both prediction methods, and that ANFIS can be used to predict FSI when regression results are not sufficiently accurate. (author)
RCR: Robust Compound Regression for Robust Estimation of Errors-in-Variables Model
Han, Hao; Wei ZHU
2015-01-01
The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach - the robust compound regression (RCR) analysis method for the robust estimation of EIV models. W...
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
The Linear Regression Model for setting up the Futures Price
Mario G.R. PAGLIACC; Janusz GRABARA; Madalina Gabriela ANGHEL; Cristina SACALA; Vasile Lucian ANTON
2015-01-01
To realize a linear regression, we have considered the computation method for futures prices that, according to economic culture, is based on the rate of the supporting asset and internal/external interest ratios, and also on the time period until maturity. The market price of a futures instrument is influenced by the demand and supply, that is the number of units traded within a certain period.
Simone Becker Lopes
2014-04-01
Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trip generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included). This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Noise Reduction and Gap Filling of fAPAR Time Series Using an Adapted Local Regression Filter
Álvaro Moreno
2014-08-01
Time series of remotely sensed data are an important source of information for understanding land cover dynamics. In particular, the fraction of absorbed photosynthetic active radiation (fAPAR) is a key variable in the assessment of vegetation primary production over time. However, the fAPAR series derived from polar orbit satellites are not continuous and consistent in space and time. Filtering methods are thus required to fill in gaps and produce high-quality time series. This study proposes an adapted (iteratively reweighted) local regression filter (LOESS) and performs a benchmarking intercomparison with four popular and generally applicable smoothing methods: Double Logistic (DLOG), smoothing spline (SSP), Interpolation for Data Reconstruction (IDR) and adaptive Savitzky-Golay (ASG). This paper evaluates the main advantages and drawbacks of the considered techniques. The results have shown that ASG and the adapted LOESS perform better in recovering fAPAR time series over multiple controlled noisy scenarios. Both methods can robustly reconstruct the fAPAR trajectories, reducing the noise by up to 80% in the worst simulation scenario, which might be attributed to the quality control (QC) MODIS information incorporated into these filtering algorithms, their flexibility and adaptation to the upper envelope. The adapted LOESS is particularly resistant to outliers. This method clearly outperforms the other considered methods in dealing with the high presence of gaps and noise in satellite data records. The low RMSE and biases obtained with the LOESS method (|rMBE| < 8%; rRMSE < 20%) reveal an optimal reconstruction even in the most extreme situations with long seasonal gaps. An example of application of the LOESS method to fill in invalid values in real MODIS images presenting persistent cloud and snow coverage is also shown. The LOESS approach is recommended in most remote sensing applications, such as gap-filling, cloud-replacement, and observing temporal
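An iteratively reweighted local regression filter can be sketched in a few lines. The version below is a generic robust LOESS (local linear fits with a tricube kernel and bisquare robustness weights); it does not include the QC weighting or the upper-envelope adaptation described in the abstract, and the function names, the `span` parameter and the example series are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def _local_fit(t0: float, ts: Sequence[float], ys: Sequence[float],
               ws: Sequence[float], span: float) -> float:
    """Weighted local linear fit at t0 using a tricube kernel of half-width span."""
    sw = swt = swy = swtt = swty = 0.0
    for t, y, w in zip(ts, ys, ws):
        d = abs(t - t0) / span
        if d >= 1.0:
            continue
        k = w * (1.0 - d ** 3) ** 3
        sw += k
        swt += k * t
        swy += k * y
        swtt += k * t * t
        swty += k * t * y
    if sw == 0.0:
        raise ValueError("no usable observations inside the window")
    denom = sw * swtt - swt ** 2
    if abs(denom) < 1e-12:  # a single effective point: fall back to its weighted mean
        return swy / sw
    slope = (sw * swty - swt * swy) / denom
    intercept = (swy - slope * swt) / sw
    return intercept + slope * t0


def loess_fill(times: Sequence[float], values: Sequence[Optional[float]],
               span: float, iterations: int = 2) -> List[float]:
    """Smooth the observed values and fill gaps (None) at every time step."""
    ts = [t for t, v in zip(times, values) if v is not None]
    ys = [v for v in values if v is not None]
    ws = [1.0] * len(ts)
    for _ in range(iterations):
        resid = [abs(y - _local_fit(t, ts, ys, ws, span)) for t, y in zip(ts, ys)]
        scale = 6.0 * sorted(resid)[len(resid) // 2]  # six times the median residual
        if scale <= 1e-9:  # fit is already numerically exact
            break
        # Bisquare weights: large residuals (likely outliers) get weight zero.
        ws = [(1.0 - (r / scale) ** 2) ** 2 if r < scale else 0.0 for r in resid]
    return [_local_fit(t, ts, ys, ws, span) for t in times]


# Illustrative series with one missing observation at t = 4.
times = [float(t) for t in range(10)]
values: List[Optional[float]] = [2.0 * t + 1.0 for t in times]
values[4] = None
filled = loess_fill(times, values, span=4.0)
```

Because every estimate comes from a local fit over the neighbouring observations, the same call both smooths the observed points and interpolates across the gap.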
Joyce P. Jacobsen; Laurence M. Levin; Zachary Tausanovitch
2014-01-01
Economists’ wariness of data mining may be misplaced, even in cases where economic theory provides a well-specified model for estimation. We discuss how new data mining/ensemble modeling software, for example the program TreeNet, can be used to create predictive models. We then show how for a standard labor economics problem, the estimation of wage equations, TreeNet outperforms standard OLS regression in terms of lower prediction error. Ensemble modeling also resists the tendency to overfit ...
von Davier, Matthias; Sinharay, Sandip
2009-01-01
This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…
Jokar Arsanjani, J.; Helbich, M.; Kainz, W.; Boloorani, A.
2013-01-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-eco
Development of regression model for uncertainty analysis by response surface method in HANARO
The feasibility of uncertainty analysis with a regression model in a reactor physics problem was investigated. A regression model was developed using the Response Surface Method as an alternative to the MCNP/ORIGEN2 code system, which is the uncertainty analysis tool for fission-produced molybdenum production. It was shown that developing a regression model for the reactor physics problem was possible by introducing the burnup parameter. The most important parameter affecting the uncertainty of the 99Mo yield ratio in the regression model was fuel thickness. These results agree well with those of the Crude Monte Carlo Method for each parameter. The regression model developed in this research was shown to be suitable as an alternative model, because the coefficient of determination was 0.99.
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
A nonparametric dynamic additive regression model for longitudinal data
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Completing and adapting models of biological processes
Margaria, Tiziana; Hinchey, Michael G.; Raffelt, Harald; Rash, James L.; Rouff, Christopher A.; Steffen, Bernhard
2006-01-01
We present a learning-based method for model completion and adaptation, which is based on the combination of two approaches: 1) R2D2C, a technique for mechanically transforming system requirements via provably equivalent models to running code, and 2) automata learning-based model extrapolation. The intended impact of this new combination is to make model completion and adaptation accessible to experts of the field, like biologists or engineers. The principle is briefly illustrated by gene...
Grajeda, LM; Ivanescu, A; Saito, M; Crainiceanu, C; Jaganath, D; Gilman, RH; Crabtree, JE; Kelleher, D; Cabrera, L.; Cama, V; Checkley, W
2016-01-01
Background: Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and accelerati...
An adaptive distance measure for use with nonparametric models
Distance measures perform a critical task in nonparametric, locally weighted regression. Locally weighted regression (LWR) models are a form of 'lazy learning' which construct a local model 'on the fly' by comparing a query vector to historical, exemplar vectors according to a three step process. First, the distance of the query vector to each of the exemplar vectors is calculated. Next, these distances are passed to a kernel function, which converts the distances to similarities or weights. Finally, the model output or response is calculated by performing locally weighted polynomial regression. To date, traditional distance measures, such as the Euclidean, weighted Euclidean, and L1-norm have been used as the first step in the prediction process. Since these measures do not take into consideration sensor failures and drift, they are inherently ill-suited for application to 'real world' systems. This paper describes one such LWR model, namely auto associative kernel regression (AAKR), and describes a new, Adaptive Euclidean distance measure that can be used to dynamically compensate for faulty sensor inputs. In this new distance measure, the query observations that lie outside of the training range (i.e. outside the minimum and maximum input exemplars) are dropped from the distance calculation. This allows for the distance calculation to be robust to sensor drifts and failures, in addition to providing a method for managing inputs that exceed the training range. In this paper, AAKR models using the standard and Adaptive Euclidean distance are developed and compared for the pressure system of an operating nuclear power plant. It is shown that using the standard Euclidean distance for data with failed inputs, significant errors in the AAKR predictions can result. By using the Adaptive Euclidean distance it is shown that high fidelity predictions are possible, in spite of the input failure. In fact, it is shown that with the Adaptive Euclidean distance prediction
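The three-step process and the adaptive distance can be sketched as below. This is an illustration under stated assumptions rather than the paper's implementation: the kernel is taken to be Gaussian with a hand-picked `bandwidth`, the local model is a zero-order (weighted-average) fit, and the exemplar values are invented.

```python
import math
from typing import List, Sequence


def adaptive_distance(query: Sequence[float], exemplar: Sequence[float],
                      lo: Sequence[float], hi: Sequence[float]) -> float:
    """Euclidean distance that skips query components outside the training range."""
    total = 0.0
    for q, x, a, b in zip(query, exemplar, lo, hi):
        if a <= q <= b:  # out-of-range inputs are dropped from the distance
            total += (q - x) ** 2
    return math.sqrt(total)


def aakr_predict(query: Sequence[float], exemplars: Sequence[Sequence[float]],
                 bandwidth: float = 1.0) -> List[float]:
    """Auto-associative kernel regression: return a corrected copy of the query."""
    n_vars = len(query)
    lo = [min(row[j] for row in exemplars) for j in range(n_vars)]
    hi = [max(row[j] for row in exemplars) for j in range(n_vars)]
    # Step 1: distances. Step 2: Gaussian kernel turns distances into weights.
    weights = [
        math.exp(-adaptive_distance(query, row, lo, hi) ** 2 / (2.0 * bandwidth ** 2))
        for row in exemplars
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    # Step 3: locally weighted (zero-order) regression on the exemplars.
    return [
        sum(w * row[j] for w, row in zip(weights, exemplars)) / total
        for j in range(n_vars)
    ]


# Invented exemplars for two correlated sensors; the second query value has failed high.
exemplars = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
query = [2.0, 500.0]
prediction = aakr_predict(query, exemplars, bandwidth=0.5)
```

In this toy case the failed second input (500.0) lies outside the training range, so only the first sensor determines the weights and the prediction for the second sensor is reconstructed from the exemplars.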
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Theoretical Aspects Regarding the Use of the Multiple Linear Regression Model in Economic Analyses
Constantin ANGHELACHE; Ioan PARTACHI; Adina Mihaela DINU; Ligia PRODAN; Georgeta BARDAŞU (LIXANDRU)
2013-01-01
In this paper we have studied the dependence between GDP, final consumption and net investments. To analyze this correlation, the article proposes a multiple regression model, an extremely useful tool in economic analysis. The regression model described in the article considers GDP as the outcome variable and final consumption and net investment as factorial variables.
Analysis for Regression Model Behavior by Sampling Strategy for Annual Pollutant Load Estimation.
Park, Youn Shik; Engel, Bernie A
2015-11-01
Water quality data are typically collected less frequently than streamflow data due to the cost of collection and analysis, and therefore water quality data may need to be estimated for additional days. Regression models are applicable to interpolate water quality data associated with streamflow data and have come to be extensively used, requiring relatively small amounts of data. There is a need to evaluate how well the regression models represent pollutant loads from intermittent water quality data sets. Both the specific regression model and water quality data frequency are important factors in pollutant load estimation. In this study, nine regression models from the Load Estimator (LOADEST) and one regression model from the Web-based Load Interpolation Tool (LOADIN) were evaluated with subsampled water quality data sets from daily measured water quality data sets for N, P, and sediment. Each water quality parameter had different correlations with streamflow, and the subsampled water quality data sets had various proportions of storm samples. The behaviors of the regression models differed not only by water quality parameter but also by proportion of storm samples. The regression models from LOADEST provided accurate and precise annual sediment and P load estimates using the water quality data of 20 to 40% storm samples. LOADIN provided more accurate and precise annual N load estimates than LOADEST. In addition, the results indicate that avoidance of water quality data extrapolation and availability of water quality data from storm events were crucial in annual pollutant load estimation using pollutant regression models. PMID:26641336
Technology diffusion in hospitals: A log odds random effects regression model
J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)
2015-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describ
Unobtrusive user modeling for adaptive hypermedia
H.J. Holz; K. Hofmann; C. Reed
2008-01-01
We propose a technique for user modeling in Adaptive Hypermedia (AH) that is unobtrusive at both the level of observable behavior and that of cognition. Unobtrusive user modeling is complementary to transparent user modeling. Unobtrusive user modeling induces user models appropriate for Educational
Plackett-Luce regression: A new Bayesian model for polychotomous data
Archambeau, Cedric; Caron, Francois
2012-01-01
Multinomial logistic regression is one of the most popular models for modelling the effect of explanatory variables on a subject's choice among a set of specified options. This model has found numerous applications in machine learning, psychology and economics. Bayesian inference in this model is nontrivial and requires resorting either to a Metropolis-Hastings algorithm or to rejection sampling within a Gibbs sampler. In this paper, we propose an alternative model to multinomial logistic regress...
Beta Regression Finite Mixture Models of Polarization and Priming
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Suhartono; Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two-level ARIMAX and regression. Two-level ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly sales of men's jeans and women's trousers in a retail company for the period January 2002 to September 2009 are used as a case study. In general, the two-level calendar variation model yields two models: the first reconstructs the sales pattern that has already occurred, and the second forecasts the increase in sales due to Eid ul-Fitr, which affects sales in the same and the previous months. The results show that the proposed two-level calendar variation model based on ARIMAX and regression methods yields better forecasts than the seasonal ARIMA model and neural networks.
Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model
Jixue LIU
2006-01-01
Though the EV model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of the LS estimate in the EV model. In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.
Multiple models adaptive feedforward decoupling controller
Wang Xin; Li Shaoyuan; Wang Zhongjie
2005-01-01
When the parameters of the system change abruptly, a new multivariable adaptive feedforward decoupling controller using multiple models is presented to improve the transient response. The system models are composed of multiple fixed models, one free-running adaptive model and one re-initialized adaptive model. The fixed models are used to provide initial control to the process. The re-initialized adaptive model can be reinitialized as the selected model to improve the adaptation speed. The free-running adaptive controller is added to guarantee the overall system stability. At each instant, the best system model is selected according to the switching index and the corresponding controller is designed. During the controller design, the interaction is viewed as the measurable disturbance and eliminated by the choice of the weighting polynomial matrix. It not only eliminates the steady-state error but also decouples the system dynamically. The global convergence is obtained and several simulation examples are presented to illustrate the effectiveness of the proposed controller.
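The model-selection step can be illustrated with a small sketch. The abstract does not give the form of the switching index, so the code assumes a common choice, an exponentially forgotten sum of squared prediction errors; the function name, the `forgetting` factor and the three candidate gains are hypothetical.

```python
from typing import Callable, List, Sequence


def select_model(models: Sequence[Callable[[float], float]],
                 inputs: Sequence[float], outputs: Sequence[float],
                 forgetting: float = 0.9) -> int:
    """Return the position of the model with the smallest switching index."""
    index: List[float] = [0.0] * len(models)
    for u, y in zip(inputs, outputs):
        for i, model in enumerate(models):
            error = y - model(u)
            # Assumed index: discounted accumulation of squared prediction errors.
            index[i] = forgetting * index[i] + error ** 2
    return min(range(len(models)), key=index.__getitem__)


# Three fixed candidate models (static gains) and data produced by a gain of 2.
models = [lambda u: 0.5 * u, lambda u: 2.0 * u, lambda u: 3.0 * u]
inputs = [1.0, 2.0, 3.0]
outputs = [2.0, 4.0, 6.0]
best = select_model(models, inputs, outputs)
```

In a controller of the kind described, this selection would run at every sampling instant and the controller matched to the selected model would then be applied.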
Pawlus, Witold; Robbersmyr, Kjell G.; Karimi, Hamid Reza
2011-01-01
In this paper we present the application of regressive models to simulation of car-to-pole impacts. Three models were investigated: RARMAX, ARMAX and AR. Their suitability to estimate physical system parameters as well as to reproduce car kinematics was examined. It was found that they not only estimate the one quantity which was used for their creation (car acceleration) but also describe the car's acceleration, velocity and crush. A virtual experiment was performed to obtain another set ...
Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models
Elliott, Michael R.
2009-01-01
In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample d...
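The role of inverse-probability weights, and the effect of trimming them, can be shown with a toy calculation. The hard cap used here is the conventional simple form of weight trimming, chosen only for illustration; it is not the model-averaging approach named in the title, and the values, probabilities and cap are invented.

```python
from typing import List, Sequence


def inverse_probability_weights(probs: Sequence[float]) -> List[float]:
    """Weight each sampled unit by the inverse of its inclusion probability."""
    return [1.0 / p for p in probs]


def trim(weights: Sequence[float], cap: float) -> List[float]:
    """Hard trimming: no weight may exceed the cap."""
    return [min(w, cap) for w in weights]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Invented sample: the third unit had a very small chance of being selected.
values = [10.0, 20.0, 30.0]
probs = [0.5, 0.25, 0.05]

weights = inverse_probability_weights(probs)           # 2, 4 and 20
full = weighted_mean(values, weights)                  # dominated by the third unit
trimmed = weighted_mean(values, trim(weights, 8.0))    # third weight capped at 8
```

Trimming lowers the influence of the unit with the extreme weight, which reduces variance at the price of reintroducing some of the bias the weights were meant to remove.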
Flexible competing risks regression modeling and goodness-of-fit
Thomas H. Scheike; Zhang, Mei-Jie
2008-01-01
In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how c...
Correlated Component Regression: Application On Model To Determination Of Dna Damage
Sadi ELASAN; Keskin, Sıddık; Ari, Elif
2016-01-01
Objective: When the number of explanatory variables approaches or exceeds the sample size, in other words in high-dimensional data sets, an important question is how to increase the reliability of the estimated regression model. One of the new methods that can be used is Correlated Component Regression. This study aims to introduce Correlated Component Regression along with an application. Material and ...
Process Design and Optimization (MLS-S03): a Journey in Modeling, Optimization and Regression
Billeter, Julien
2015-01-01
This lecture describes the following topics: • Preamble on Linear Algebra • Dynamic and Static Models • Solving Dynamic and Static Models • Solving Optimization Problems • Solving Regression Problems
On relationship between coefficients of the different dimensions linear regression models
Panov, V. G.
2011-01-01
Two linear regression models of a given response variable are considered, one with some predictor set and one with a subset of it. It is shown that there is a linear relationship between the coefficients of these models. Some corollaries of the proved theorem are considered.
Reduction of the curvature of a class of nonlinear regression models
吴翊; 易东云
2000-01-01
It is proved that, for a class of nonlinear regression models, the curvature of the nonlinear model can be reduced to zero by increasing the amount of measured data. The result is important for practical problems and has produced satisfactory results in data fusion.
A generalized exponential time series regression model for electricity prices
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a...
Teacher training through the Regression Model in foreign language education
Jesús García Laborda
2011-01-01
In the last few years, Spain has seen dramatic changes in its educational system. Many of these changes have been rejected by most teachers after their implementation (LOGSE), while others have shown potential drawbacks even before coming into operation (LOCE, LOE). To face these changes, schools need well-qualified instructors. All schools want the best teachers but, because teachers' salaries are regulated by the state, few schools can actually offer incentives to their teachers, and consequently schools never have the instructors they wish for. Apart from this, state schools pay their teachers a fixed salary and private institutions offer no additional bonuses for things like additional training or diplomas (for example, master's or postgraduate courses); therefore, teachers are rarely interested in pursuing further studies in methodology or related fields such as education or applied linguistics. Although many teachers acknowledge their love of teaching, the current situation in schools (school violence, low salaries, depression, loss of social prestige, legal changes and so on) has made teaching one of the most complicated and least appreciated jobs in Spain. It is not unusual for a school to have a couple of instructors off sick due to depression or other psychological illnesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one's experience both as a teacher and as a learner.
Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City
Amiruddin Ismail
2011-01-01
Transport modes are very important to the residents of Tripoli, Libya, for their daily trips. However, the total number of private cars and of private transport vehicles, namely taxis and microbuses, on the road is increasing and causes many problems such as traffic congestion, accidents, and air and noise pollution. These problems in turn cause other travel-related phenomena such as trip delays, stress and frustration for motorists, which may affect the productivity and efficiency of both workers and students. Delay may also increase travel cost and make trips less efficient compared with public transport users in some other Arab cities. Switching to public transport (PT) alternatives such as buses, light rail transit and underground trains could improve travel time and travel costs. A transport study was carried out in the Tripoli City Authority areas among private car users who live in areas with inadequate private transport and poor public transportation services. The relations between factors such as travel time, travel cost, trip purpose and parking cost were analysed to answer the research questions. Logistic regression was used to analyse the factors that influence users to switch their trips to public transport alternatives.
Modeling and Control with Local Linearizing Nadaraya Watson Regression
Kühn, Steffen
2008-01-01
Black box models of technical systems are purely descriptive. They do not explain why a system works the way it does. Thus, black box models are insufficient for some problems. But there are numerous applications, for example, in control engineering, for which a black box model is absolutely sufficient. In this article, we describe a general stochastic framework with which such models can be built easily and fully automated by observation. Furthermore, we give a practical example and show how this framework can be used to model and control a motorcar powertrain.
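For orientation, the classical Nadaraya-Watson estimator named in the title can be sketched in a few lines. This is the textbook kernel-weighted average with a Gaussian kernel, not the local linearizing variant or the stochastic framework of the article; the function name and the bandwidth default are assumptions.

```python
import math

def nadaraya_watson(x_query, xs, ys, bandwidth=0.5):
    """Kernel-weighted average of ys, with Gaussian weights centred on x_query."""
    weights = [math.exp(-0.5 * ((x_query - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)  # may underflow to zero if x_query is far from all xs
    return sum(w * yi for w, yi in zip(weights, ys)) / total

# Hypothetical observations of a smooth input-output relation.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 0.4, 1.1, 1.4, 2.1]
print(nadaraya_watson(1.2, xs, ys))
```

The model is built purely from observed pairs, which is what makes it a black box model in the sense of the abstract.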
Preobrazhenskii, M. P.; Rudakov, O. B.
2016-01-01
A regression model for calculating the boiling point isobars of tetrachloromethane-organic solvent binary homogeneous systems is proposed. The parameters of the model proposed were calculated for a series of solutions. The correlation between the nonadditivity parameter of the regression model and the hydrophobicity criterion of the organic solvent is established. The parameter value of the proposed model is shown to allow prediction of the potential formation of azeotropic mixtures of solvents with tetrachloromethane.
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Daniel Rece; Ion Stancu
2009-01-01
This study provides an overview of the relationship between economic growth and money laundering, modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, yielding a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is “explained” by the linear regression model. In our opinion, this model will provide critical...
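The share of variance "explained" by a least squares line is the coefficient of determination R². A minimal sketch of how such a figure is computed for a single regressor follows; the numbers are made up for illustration and have no connection to the money laundering data.

```python
def ols_r_squared(xs, ys):
    """Fit y = a + b*x by least squares and return R^2 = 1 - SSE/SST."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst

# Hypothetical data: a weak linear relation, so R^2 is well below 1.
print(ols_r_squared([0, 1, 2, 3], [0, 1, 0, 1]))
```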
N. Diodato
2010-12-01
To reconstruct sub-regional European climate over the past centuries, several efforts have been made using historical datasets. However, only scattered information at low spatial and temporal resolution has been produced to date for the Mediterranean area. This paper exploits, for Southern and Central Italy (Mediterranean Sub-Regional Area), an unprecedented historical dataset in an attempt to model seasonal (winter and summer) air temperatures in pre-instrumental times (back to 1500). Combining information derived from proxy documentary data and large-scale simulation, a statistical methodology in the form of a multiscale-temperature regression (MTR) model was developed to adapt larger-scale estimations to the sub-regional temperature pattern. The modelled response is essentially free of autocorrelation among the residuals (marginal or no significance in the Durbin-Watson statistic), and agrees well with the independent data from the validation sample (Nash-Sutcliffe efficiency coefficient >0.60). The advantage of the approach is not merely increased accuracy in estimation. Rather, it relies on the ability to extract (and exploit) the right information to replicate coherent temperature series in historical times.
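The two diagnostics quoted in this abstract have simple standard definitions, sketched below. The function names and the sample values are illustrative assumptions; nothing here reproduces the MTR model itself.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def nash_sutcliffe(observed, modelled):
    """1 - SSE / variance of the observations about their mean.
    1 is a perfect fit; 0 means no better than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Hypothetical validation sample of seasonal temperatures (degrees C).
obs = [5.1, 6.0, 4.2, 5.5, 6.3]
mod = [5.0, 5.8, 4.6, 5.4, 6.0]
print(nash_sutcliffe(obs, mod))
print(durbin_watson([o - m for o, m in zip(obs, mod)]))
```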
Maximum likelihood polynomial regression for robust speech recognition
LU Yong; WU Zhenyang
2011-01-01
The linear hypothesis is the main disadvantage of maximum likelihood linear regression (MLLR). This paper applies the polynomial regression method to model adaptation and establishes a nonlinear model adaptation algorithm using maximum likelihood polyno
J. Martínez-Fernández
2013-02-01
Humans are responsible for most forest fires in Europe, but the anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of the factors behind human-caused fire occurrence.
The number of human-caused fires occurring within a 25-yr period (1983–2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R^{2} = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related to agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence.
For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AIC_{c}), from 3451.19 to 3321.19. The results from
STATISTICAL INFERENCES FOR VARYING-COEFFICIENT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fitted by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed, and a formal testing approach, which is essential to exploring spatial non-stationarity in geographical science, is suggested.
Tan, Qihua; Bathum, L; Christiansen, L;
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
Schilling, K.E.; Wolter, C.F.
2005-01-01
Nineteen variables, including precipitation, soils and geology, land use, and basin morphologic characteristics, were evaluated to develop Iowa regression models to predict total streamflow (Q), base flow (Qb), storm flow (Qs) and base flow percentage (%Qb) in gauged and ungauged watersheds in the state. Discharge records from a set of 33 watersheds across the state for the 1980 to 2000 period were separated into Qb and Qs. Multiple linear regression found that 75.5 percent of long term average Q was explained by rainfall, sand content, and row crop percentage variables, whereas 88.5 percent of Qb was explained by these three variables plus permeability and floodplain area variables. Qs was explained by average rainfall and %Qb was a function of row crop percentage, permeability, and basin slope variables. Regional regression models developed for long term average Q and Qb were adapted to annual rainfall and showed good correlation between measured and predicted values. Combining the regression model for Q with an estimate of mean annual nitrate concentration, a map of potential nitrate loads in the state was produced. Results from this study have important implications for understanding geomorphic and land use controls on streamflow and base flow in Iowa watersheds and similar agriculture dominated watersheds in the glaciated Midwest. (JAWRA) (Copyright © 2005).
Prediction of the result in race walking using regularized regression models
Krzysztof Przednowek
2013-04-01
The following paper presents the use of regularized linear models as tools to optimize the training process. The models were calculated using data collected from race-walkers' training events. The models predict the outcome of a 3 km race following a prescribed training plan. The material included a total of 122 training patterns from 21 athletes. The methods of analysis include the classical OLS regression model, ridge regression, LASSO regression and elastic net regression. In order to compare the methods and choose the best one, leave-one-out cross-validation was used. All models were calculated using the R language with additional packages. The best model was obtained by the LASSO method, which generates an error of about 26 seconds. The method simplified the structure of the model by eliminating 5 out of 18 predictors.
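The comparison procedure described above (fit on all patterns but one, predict the held-out pattern, repeat) can be sketched as follows. The study itself used R with 18 predictors; this sketch is in Python, uses a single hypothetical predictor, and shows only ridge regression, for which the one-predictor solution has a simple closed form.

```python
def fit_ridge_1d(xs, ys, lam):
    """Ridge fit of y = b0 + b1*x with penalty lam on the slope only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)  # lam = 0 gives ordinary least squares
    return my - slope * mx, slope

def loo_rmse(xs, ys, lam):
    """Leave-one-out cross-validation: root mean squared prediction error."""
    squared_errors = []
    for i in range(len(xs)):
        x_train, y_train = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        b0, b1 = fit_ridge_1d(x_train, y_train, lam)
        squared_errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Hypothetical data: weekly training volume (km) vs. 3 km race time (s).
volume = [40.0, 55.0, 60.0, 70.0, 85.0, 90.0]
race_time = [830.0, 812.0, 801.0, 795.0, 770.0, 768.0]
for lam in (0.0, 10.0, 100.0):
    print(lam, loo_rmse(volume, race_time, lam))
```

The penalty value with the smallest leave-one-out error would be the one retained, which mirrors how the abstract describes choosing among OLS, ridge, LASSO and elastic net.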
Analysis of dental caries using generalized linear and count regression models
Javali M. Phil
2013-11-01
Generalized linear models (GLM) are a generalization of linear regression models that allow regression models to be fitted to response data following a general exponential family, in all the sciences and especially in the medical and dental sciences. They are a flexible and widely used class of models that can accommodate a variety of response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero-inflated count regression models such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework for the suitability of applying the GLM, Poisson, NB, ZIP and ZINB models to a dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness of fit measure for dental caries data, the NB and ZINB regression models perform better than the other count regression models.
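The two-process mixture described in this abstract has a compact probability mass function. The sketch below writes it out for the zero-inflated Poisson case without covariates; the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero) and the example counts are assumptions for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural zero,
    otherwise it is drawn from Poisson(lam)."""
    if k == 0:
        return pi + (1.0 - pi) * math.exp(-lam)
    return (1.0 - pi) * poisson_pmf(k, lam)

def zip_loglik(counts, lam, pi):
    """Log-likelihood that maximum likelihood estimation would maximise."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)

# Hypothetical caries counts with many zeros.
counts = [0, 0, 0, 0, 0, 1, 0, 3, 0, 2, 0, 5]
print(zip_loglik(counts, lam=2.0, pi=0.5))
print(zip_loglik(counts, lam=2.0, pi=0.0))  # plain Poisson with the same mean parameter
```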
A Negative Binomial Regression Model for Accuracy Tests
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
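The overdispersion mentioned here can be checked with the variance-to-mean ratio, which is 1 for a Poisson distribution. The sketch below also gives the variance of the common NB2 parameterization of the negative binomial, mu + mu^2/r; the function names and the error counts are hypothetical and not taken from the study.

```python
def dispersion_index(counts):
    """Sample variance divided by sample mean; values well above 1 indicate overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def nb_variance(mu, r):
    """Variance of a negative binomial with mean mu and size parameter r (NB2 form).
    It exceeds the Poisson variance mu and approaches it as r grows."""
    return mu + mu ** 2 / r

# Hypothetical numbers of reading errors per examinee.
errors = [0, 0, 1, 1, 2, 8]
print(dispersion_index(errors))
print(nb_variance(2.0, 1.0))
```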
Additive Intensity Regression Models in Corporate Default Analysis
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo;
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS
Liu Jixue; Chen Xiru
2005-01-01
Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.
Misspecified poisson regression models for large-scale registry data
Grøn, Randi; Gerds, Thomas A; Andersen, Per K
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
A New Mobile Learning Adaptation Model
Mohamd Hassan Hassan; Jehad Al-Sadi
2009-01-01
This paper introduces a new model for m-Learning context adaptation, motivated by the need to utilize mobile technology in education. Mobile learning, m-Learning for short, is considered to be one of the hottest topics in the educational community, and much research has been done to conceptualize this new form of learning. We present a promising design for a model to adapt the learning content in mobile learning applications in order to match the learner context, preferences and the educatio...
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail how the theoretical model and its stru... ...demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial-general equilibrium distinction is related to the CVAR as well... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations), are left for future papers.
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL
Constantin Anghelache; Ioan Partachi
2011-01-01
The information obtained through simple linear regression is not always enough to characterize the evolution of an economic phenomenon or, furthermore, to identify its possible future evolution. To remedy these drawbacks, the specialized literature includes multiple regression models, in which the evolution of the dependent variable is defined as a function of two or more factorial variables.
Testing normality in bivariate probit models : a simple artificial regression based LM test
Anthony Murphy
1994-01-01
A simple and convenient LM test of normality in the bivariate probit model is derived. The alternative hypothesis is based on a form of truncated Gram Charlier Type series. The LM test may be calculated as an artificial regression. However, the proposed artificial regression does not use the outer product gradient form. Thus it is likely to perform reasonably well in small samples.
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
Image segmentation based on adaptive mixture model
As an important research field, image segmentation has attracted considerable attention. The classical geodesic active contour (GAC) model tends to produce fake edges in smooth regions, while the Chan–Vese (CV) model cannot effectively detect images with holes and obtain the precise boundary. To address the above issues, this paper proposes an adaptive mixture model synthesizing the GAC model and the CV model by a weight function. According to image characteristics, the proposed model can adaptively adjust the weight function. In this way, the model exploits the advantages of the GAC model in regions with rich textures or edges, while exploiting the advantages of the CV model in smooth local regions. Moreover, the proposed model is extended to vector-valued images. Through experiments, it is verified that the proposed model obtains better results than the traditional models. (paper)
Estimation of Mediation Effects for Zero-inflated Regression Models
Wang, Wei; Albert, Jeffrey M.
2012-01-01
The goal of mediation analysis is to identify and explicate the mechanism that underlies a relationship between a risk factor and an outcome via an intermediate variable (mediator). In this paper, we consider the estimation of mediation effects in zero-inflated (ZI) models intended to accommodate `extra' zeros in count data. Focusing on the ZI negative binomial (ZINB) models, we provide a mediation formula approach to estimate the (overall) mediation effect in the standard two-stage mediation...
A Multi-objective Exploratory Procedure for Regression Model Selection
Sinha, Ankur; Malo, Pekka; Kuosmanen, Timo
2012-01-01
Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge on the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-...
Highlights: • Developed hourly-indexed ARX models for robust cooling-load forecasting. • Proposed a two-stage weighted least-squares regression approach. • Considered the effect of outliers as well as trend of cooling load and weather patterns. • Included higher order terms and day type patterns in the forecasting models. • Demonstrated better accuracy compared with some ARX and ANN models. - Abstract: This paper presents a robust hourly cooling-load forecasting method based on time-indexed autoregressive with exogenous inputs (ARX) models, in which the coefficients are estimated through a two-stage weighted least squares regression. The prediction method includes a combination of two separate time-indexed ARX models to improve prediction accuracy of the cooling load over different forecasting periods. The two-stage weighted least-squares regression approach in this study is robust to outliers and suitable for fast and adaptive coefficient estimation. The proposed method is tested on a large-scale central cooling system in an academic institution. The numerical case studies show the proposed prediction method performs better than some ANN and ARX forecasting models for the given test data set
Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele
2015-01-01
A general framework for dealing with both linear regression and clustering problems is described. It includes Gaussian clusterwise linear regression analysis with random covariates and cluster analysis via Gaussian mixture models with variable selection. It also admits a novel approach for detecting multiple clusterings from possibly correlated sub-vectors of variables, based on a model defined as the product of conditionally independent Gaussian mixture models. A necessary condition for the ...
The Non-linear Ripple Effect of Housing Prices in Taiwan: A Smooth Transition Regressive Model
M.S. Chien
2013-01-01
Unlike past research on regional housing prices, this paper employs the smooth transition regression model, derived in Teräsvirta (1998), to investigate ripple effects among four regional house prices in Taiwan. The aim of this paper is to test whether a smooth transition regression model, which is capable of capturing this non-linear behaviour, can give a better characterisation of regional housing prices than a linear model. This empirical analysis applies the four regional house ...
Kürşad ÖZKAN
2009-01-01
The purpose of the paper is to determine a model of soil field water capacity as a function of soil texture. At first, multiple regression analysis was used to determine a model. However, a multicollinearity problem was found in the model because of strong relationships among the independent variables. Therefore, principal component regression analysis was applied and the problem was solved. It is known that sand, dust and clay contents play important roles in field water capacity. But...
An Adaptive-to-Model Test for Parametric Single-Index Errors-in-Variables Models
Koul, Hira L.; Xie, Chuanlong; Zhu, Lixing
2016-01-01
This paper provides some useful tests for fitting a parametric single-index regression model when covariates are measured with error and validation data is available. We propose two tests whose consistency rates do not depend on the dimension of the covariate vector when an adaptive-to-model strategy is applied. One of these tests has a bias term that becomes arbitrarily large with increasing sample size but its asymptotic variance is smaller, and the other is asymptotically unbiased with lar...
Evaporation modeling with multiple linear regression techniques– a review
Parameshwar Sidramappa Shirgure
2013-01-01
Evaporation is influenced by a number of agro-meteorological parameters and is one of the integral components of the hydrological cycle. Estimates of evaporation are usually needed in a wide array of problems in agriculture, hydrology, agronomy, forestry and land resources planning, such as water balance computation, irrigation management, crop yield forecasting, river flow forecasting and ecosystem modeling. Irrigation can substantially increase crop yields, but the scheduling of the water application is usually based on evaporation estimates. Numerous investigators have developed models for the estimation of evaporation. The interrelated meteorological factors having a major influence on evaporation have been incorporated into various formulae for estimating evaporation. Unfortunately, reliable estimates of evaporation are extremely difficult to obtain because of complex interactions between the components of the land-plant-atmosphere system. In hot climates, the loss of water by evaporation from rivers, canals and open-water bodies is a vital factor, as evaporation takes a significant portion of all water supplies. Even in humid areas, evaporation loss is significant, although cumulative precipitation tends to mask it, so that it is ordinarily not recognized except during rainless periods. Therefore, the need for reliable models for quantifying evaporation losses from increasingly scarce water resources is greater than ever before. Accurate estimation of evaporation is fundamental for effective management of water resources. Evaporation models using MLR techniques are discussed here in detail.
Deoclécio Domingos Garbuglio
2007-02-01
The objective of this work was to verify possible divergences among the results obtained in adaptability evaluations of 27 maize genotypes (Zea mays L.) and in the stratification of 22 environments in Paraná State, Brazil, through techniques based on factor analysis and bi-segmented regression. The environmental stratifications were made through the traditional methodology and by factor analysis, combined with the percentage of the simple portion of the GxE interaction (PS%). Adaptability analyses were carried out through bi-segmented regression and factor analysis. According to the bi-segmented regression analysis, the studied genotypes showed high productive performance; however, no genotype considered ideal was found. The adaptability of the genotypes, analyzed through graphical plots, gave different responses when compared with bi-segmented regression. Factor analysis was efficient in the processes of environmental stratification and adaptability of the maize genotypes.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
Sykas, Dimitris; Karathanassi, Vassilia
2015-06-01
This paper presents a new method for automatically determining the optimum regression model, which enables the estimation of a parameter. The concept relies on the combination of k spectral pre-processing algorithms (SPPAs) that enhance spectral features correlated to the desired parameter. Initially, a pre-processing algorithm takes a single spectral signature as input and transforms it according to the SPPA function. A k-step combination of SPPAs uses k pre-processing algorithms serially. The result of each SPPA is used as input to the next SPPA, and so on until the k desired pre-processed signatures are reached. These signatures are then used as input to three different regression methods: the Normalized band Difference Regression (NDR), the Multiple Linear Regression (MLR) and the Partial Least Squares Regression (PLSR). Three Simple Genetic Algorithms (SGAs) are used, one for each regression method, for the selection of the optimum combination of k SPPAs. The performance of the SGAs is evaluated based on the RMS error of the regression models. The evaluation indicates not only the optimum SPPA combination but also the regression method that produces the optimum prediction model. The proposed method was applied to soil spectral measurements in order to predict Soil Organic Matter (SOM). In this study, the maximum value assigned to k was 3. PLSR yielded the highest accuracy, while NDR's accuracy was satisfactory given its low complexity. The MLR method showed severe drawbacks due to the presence of noise in the form of collinearity among the spectral bands. Most of the regression methods required a 3-step combination of SPPAs to achieve the highest performance. The selected pre-processing algorithms were different for each regression method, since each regression method handles the explanatory variables in a different way.
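The serial chaining of pre-processing steps described above can be sketched as plain function composition. The three transforms below are generic stand-ins chosen for illustration, not the SPPAs used in the paper, and the candidate chains are listed exhaustively here, whereas the paper selects among them with genetic algorithms.

```python
from itertools import product

def mean_center(signature):
    m = sum(signature) / len(signature)
    return [v - m for v in signature]

def first_difference(signature):
    # Shortens the signature by one band per application.
    return [b - a for a, b in zip(signature, signature[1:])]

def max_normalize(signature):
    peak = max(abs(v) for v in signature) or 1.0  # avoid dividing by zero
    return [v / peak for v in signature]

# Hypothetical pool of spectral pre-processing algorithms.
SPPAS = {
    "mean_center": mean_center,
    "first_difference": first_difference,
    "max_normalize": max_normalize,
}

def apply_chain(signature, names):
    """Apply the named steps serially: each output feeds the next step."""
    for name in names:
        signature = SPPAS[name](signature)
    return signature

def all_chains(k):
    """Every ordered k-step combination (with repetition) of the pool."""
    return list(product(SPPAS, repeat=k))

reflectance = [0.10, 0.12, 0.18, 0.30, 0.33, 0.31]  # made-up spectral signature
print(apply_chain(reflectance, ("first_difference", "max_normalize")))
print(len(all_chains(3)))
```

With only three candidate steps and k = 3 an exhaustive list is feasible; a search heuristic such as the genetic algorithms mentioned in the abstract becomes useful as the pool grows.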
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
BAHADUR ASYMPTOTIC EFFICIENCY IN A SEMIPARAMETRIC REGRESSION MODEL
LIANGHUA; CHENGPING
1994-01-01
The authors give the MLE θ1ML of θ1 in the model Y=θ1+g(T)-σ, then consider the Bahadur asymptotic efficiency of θ1ML, where T and ε are independent, g is unknown, and ε ~ φ(·) is known with mean 0 and variance σ2.
Vatcheva, KP; Lee, M; McCormick, JB; Rahbar, MH
2016-01-01
Objective: To demonstrate the adverse impact of ignoring statistical interactions in regression models used in epidemiologic studies. Study design and setting: Based on different scenarios that involved known values for the coefficient of the interaction term in Cox regression models, we generated 1000 samples of size 600 each. The simulated samples and a real-life data set from the Cameron County Hispanic Cohort were used to evaluate the effect of ignoring statistical interactions in these models. Results: Compared to correctly specified Cox regression models with interaction terms, misspecified models without interaction terms resulted in up to 8.95-fold bias in estimated regression coefficients. In contrast, when data were generated from a perfectly additive Cox proportional hazards regression model, the inclusion of the interaction between the two covariates resulted in only 2% estimated bias in the main effect regression coefficient estimates and did not alter the main finding of no significant interactions. Conclusions: When the effects are synergistic, the failure to account for an interaction effect could lead to bias and misinterpretation of the results, and in some instances to incorrect policy decisions. Best practices in regression analysis must include identification of interactions, including for analysis of data from epidemiologic studies.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Jaber Almedeij
2012-01-01
Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and D...
Estimation of mediation effects for zero-inflated regression models.
Wang, Wei; Albert, Jeffrey M
2012-11-20
The goal of mediation analysis is to identify and explicate the mechanism that underlies a relationship between a risk factor and an outcome via an intermediate variable (mediator). In this paper, we consider the estimation of mediation effects in zero-inflated (ZI) models intended to accommodate 'extra' zeros in count data. Focusing on the ZI negative binomial models, we provide a mediation formula approach to estimate the (overall) mediation effect in the standard two-stage mediation framework under a key sequential ignorability assumption. We also consider a novel decomposition of the overall mediation effect for the ZI context using a three-stage mediation model. Estimation of the components of the overall mediation effect requires an assumption involving the joint distribution of two counterfactuals. Simulation study results demonstrate low bias of mediation effect estimators and close-to-nominal coverage probability of confidence intervals. We also modify the mediation formula method by replacing 'exact' integration with a Monte Carlo integration method. The method is applied to a cohort study of dental caries in very low birth weight adolescents. For overall mediation effect estimation, sensitivity analysis was conducted to quantify the degree to which key assumption must be violated to reverse the original conclusion. PMID:22714572
Bayesian regression model for seasonal forecast of precipitation over Korea
Jo, Seongil; Lim, Yaeji; Lee, Jaeyong; Kang, Hyun-Suk; Oh, Hee-Seok
2012-08-01
In this paper, we apply three different Bayesian methods to the seasonal forecasting of the precipitation in a region around Korea (32.5°N-42.5°N, 122.5°E-132.5°E). We focus on the precipitation of summer season (June-July-August; JJA) for the period of 1979-2007 using the precipitation produced by the Global Data Assimilation and Prediction System (GDAPS) as predictors. Through cross-validation, we demonstrate improvement for seasonal forecast of precipitation in terms of root mean squared error (RMSE) and linear error in probability space score (LEPS). The proposed methods yield RMSE of 1.09 and LEPS of 0.31 between the predicted and observed precipitations, while the prediction using GDAPS output only produces RMSE of 1.20 and LEPS of 0.33 for CPC Merged Analyzed Precipitation (CMAP) data. For station-measured precipitation data, the RMSE and LEPS of the proposed Bayesian methods are 0.53 and 0.29, while GDAPS output is 0.66 and 0.33, respectively. The methods seem to capture the spatial pattern of the observed precipitation. The Bayesian paradigm incorporates the model uncertainty as an integral part of modeling in a natural way. We provide a probabilistic forecast integrating model uncertainty.
Correlated Component Regression: Application On Model To Determination Of DNA Damage
Sadi ELASAN
2016-04-01
Objective: In high-dimensional data sets, where the number of explanatory variables approaches or exceeds the sample size, an important question is how to increase the reliability of the estimated regression model. One of the new methods that can be used is Correlated Component Regression (CCR). This study aims to provide information about Correlated Component Regression and to introduce it along with an application. Material and Methods: When the number of explanatory variables approaches or exceeds the sample size (that is, in high-dimensional data sets), the coefficients estimated by standard regression analysis become unstable because of multicollinearity (the covariance matrix becomes singular). As an alternative, Correlated Component Regression (CCR) can help solve this problem. CCR-linear regression is used for continuous response variables, CCR-logistic regression for binary responses, and CCR-Cox regression for survival data. The method uses K correlated components. The number of components can be set by the researcher or determined by the program. Results: In this study the sample size is small, the correlation coefficients are moderate, and the number of variables is high. Therefore, we performed an m-fold cross-validation test because of the extreme overfitting of the saturated regression model. Conclusion: CCR can be used to address problems encountered in regression analysis such as multicollinearity and under- or overfitting, and is said to achieve higher power.
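The m-fold cross-validation mentioned in the Results can be sketched generically. The code below is not the CCR procedure: the helper names `m_fold_splits` and `cv_mse_of_mean_predictor` are hypothetical, the folds are contiguous for simplicity (in practice the samples would usually be shuffled first), and the "model" is a baseline that predicts the training mean.

```python
def m_fold_splits(n_samples, m):
    """Yield (train_indices, test_indices) for m contiguous folds."""
    fold_sizes = [n_samples // m + (1 if i < n_samples % m else 0) for i in range(m)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

def cv_mse_of_mean_predictor(y, m):
    """Cross-validated MSE of a baseline that predicts the training-fold mean."""
    squared_errors = []
    for train, test in m_fold_splits(len(y), m):
        guess = sum(y[i] for i in train) / len(train)
        squared_errors.extend((y[i] - guess) ** 2 for i in test)
    return sum(squared_errors) / len(squared_errors)

# Made-up responses, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
splits = list(m_fold_splits(len(y), 3))
score = cv_mse_of_mean_predictor(y, 3)
```

Because every sample is predicted by a model that never saw it, the resulting error is less optimistic than the in-sample fit of a saturated model.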
An adaptive model-free fuzzy controller
In this paper, we present an adaptive, stable fuzzy controller whose parameters are optimized via a genetic algorithm. The controller model is capable of building itself on the basis of measured plant data and then of adapting to new dynamics. The stability of the overall system, made up of the plant and the controller, is guaranteed by Lyapunov's theory. As a case study, the stable adaptive fuzzy controller is employed to drive the narrow water level of a simulated Steam Generator (SG) to a desired reference trajectory. The numerical results confirm that the controller bears good performances in terms of small oscillations and fast settling time even in presence of external disturbances. (authors)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for light weight and crashworthiness enhancement design of automotive parts in frontal car crash. The ULSAB-AVC model is employed for the crash analysis and effective parts are selected based on the amount of energy absorption during the crash. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of the main energy absorption parts. Based on the simulation results, a regression analysis is performed to construct a regression model utilized for light weight and crashworthiness enhancement design of automotive parts. An example for weight reduction of the main energy absorption parts demonstrates the validity of the constructed regression model.
Estimating transmitted waves of floating breakwater using support vector regression model
Mandal, S.; Hegde, A.V.; Kumar, V.; Patil, S.G.
is to obtain the optimum width of the breakwater (relative breakwater width) along the direction of wave propagation. In this present work the performance of support vector regression model for predicting the wave transmission coefficient of floating pipe...
Tao Hu; Heng-jian Cui; Xing-wei Tong
2009-01-01
This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The model is a generalization of the partially linear regression model and the varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.
U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...
Aboveground biomass and carbon stocks modelling using non-linear regression model
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests because of the diversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world, with a vast diversity of trees and layered canopies. Using optical sensors integrated with empirical models is a common way to assess AGB. Using regression, a link between remote sensing and a biophysical parameter of the forest can be established. Therefore, this paper illustrates the accuracy of a non-linear regression equation of quadratic form for estimating the AGB and carbon stocks of the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and the remotely sensed data using a nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p order to quantify the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.
Shahab Karimi
2014-01-01
In this study, the effects of ratios of dolomite, base/acid, silica, SiO2/Al2O3, and Fe2O3/CaO, base and acid oxides, and 11 oxides (SiO2, Al2O3, CaO, MgO, MnO, Na2O, K2O, Fe2O3, TiO2, P2O5, and SO3) on ash fusion temperatures for 1040 US coal samples from 12 states were evaluated using regression and adaptive neurofuzzy inference system (ANFIS) methods. Different combinations of independent variables were examined to predict ash fusion temperatures in the multivariable procedure. The combination of the “11 oxides + (Base/Acid) + Silica ratio” was the best predictor. Correlation coefficients (R2) of 0.891, 0.917, and 0.94 were achieved using nonlinear equations for the prediction of initial deformation temperature (IDT), softening temperature (ST), and fluid temperature (FT), respectively. The mentioned “best predictor” was used as input to the ANFIS system as well, and the correlation coefficients (R2) of the prediction were enhanced to 0.97, 0.98, and 0.99 for IDT, ST, and FT, respectively. The prediction precision that was achieved in this work exceeded that reported in previously published works.
Use of Pollutant Load Regression Models with Various Sampling Frequencies for Annual Load Estimation
Youn Shik Park; Bernie A. Engel
2014-01-01
Water quality data are collected by various sampling frequencies, and the data may not be collected at a high frequency nor over the range of streamflow conditions. Therefore, regression models are used to estimate pollutant data for days on which water quality data were not measured. Pollutant load regression models were evaluated with six sampling frequencies for daily nitrogen, phosphorus, and sediment data. Annual pollutant load estimates exhibited various behaviors by sampling frequency...
Influence diagnostics in exponentiated-Weibull regression models with censored data
Edwin M. M. Ortega; Cancho, Vicente G.; Bolfarine, Heleno
2006-01-01
Diagnostic methods have been an important tool in regression analysis to detect anomalies, such as departures from the error assumptions and the presence of outliers and influential observations with the fitted models. The literature provides plenty of approaches for detecting outlying or influential observations in data sets. In this paper, we follow the local influence approach (Cook 1986) in detecting influential observations with exponentiated-Weibull regression models. The relevance o...
REGRESSION MODELS ON DESIGN AND OPERATIONAL PARAMETERS OF SLOW SAND FILTERS
Aydin, Mehmet Emin
1996-01-01
The aim of this research was to obtain a regression model that relates the design and operational parameters and inflow water quality for slow sand filters. Therefore, three laboratory scale slow sand filters with sands of different effective diameters were operated at three different temperatures and at five flow rates. Stream water was used as inflow. Small quantities of settled sewage were added to the feed water. From the data produced, 72 regression models were developed relating inflow ...
A note on the estimation of asset pricing models using simple regression betas
Kan, Raymond; Robotti, Cesare
2009-01-01
Since Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973), the two-pass cross-sectional regression (CSR) methodology has become the most popular tool for estimating and testing beta asset pricing models. In this paper, we focus on the case in which simple regression betas are used as regressors in the second-pass CSR. Under general distributional assumptions, we derive asymptotic standard errors of the risk premia estimates that are robust to model misspecification. When testing whe...
On asymptotics of t-type regression estimation in multiple linear model
CUI Hengjian
2004-01-01
We consider a robust estimator (t-type regression estimator) of the multiple linear regression model by maximizing the marginal likelihood of a scaled t-type error t-distribution. The marginal likelihood can also be applied to the de-correlated response when the within-subject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
The empirical likelihood goodness-of-fit test for regression model
Li-xing ZHU; Yong-song QIN; Wang-li XU
2007-01-01
Goodness-of-fit testing for regression models has received much attention in the literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models including classical parametric and autoregressive (AR) time series models are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.
BUDIMAN; ENDANG ARISOESILANINGSIH
2012-01-01
Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine the multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume in some age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research method was conducted by systematic random sampling at five agroforestrie...
A Financial Distress Pre-Warning Study by Fuzzy Regression Model of TSE-Listed Companies
Wen-Ying Cheng; Ender Su; Sheng-Jung Li
2006-01-01
The purpose of this paper is to construct a financial distress pre-warning model for investors and risk supervisors. Through the Securities and Futures Institute Network, we collect the financial data of the electronic companies listing on the Taiwan Security Exchange (TSE) from 1998 to 2005. By binary logistic regression test, we found that financial statement ratios show significant difference in different financial stages. On the other hand, using fuzzy regression model, we construct a rat...
A multiple regression model for the Ft. Calhoun reactor coolant pump system
Multiple regression analysis is one of the most widely used of all statistical tools. In this research paper, we introduce an application of fitting a multiple regression model on reactor coolant pump (RCP) data. The primary purpose of this research is to correlate the results obtained by Design of Experiments (DOE) and regression model fitting. Also, the idea behind using a regression model is to gain more detailed information from the RCP data than is provided by DOE. In engineering science, statistical quality control techniques have traditionally been applied to control manufacturing processes. An application to commercial nuclear power plant maintenance and control is presented that can greatly improve plant safety and reliability. The results obtained show that six out of ten parameters are within control specification limits and four parameters are not in a state of statistical control. The four out-of-control parameters adversely affect the regression model fitting, and the final prediction equation therefore does not accurately predict future responses. The analysis concludes that, in order to fit the best regression model, one has to remove all out-of-control points from the data set, including dropping a variable from the model, to obtain better prediction of the response variable. (author)
An adaptive stochastic model for financial markets
An adaptive stochastic model is introduced to simulate the behavior of real asset markets. The model adapts itself by changing its parameters automatically on the basis of the recent historical data. The basic idea underlying the model is that a random variable uniformly distributed within an interval with variable extremes can replicate the histograms of asset returns. These extremes are calculated according to the arrival of new market information. This adaptive model is applied to the daily returns of three well-known indices: Ibex35, Dow Jones and Nikkei, for three complete years. The model reproduces the histograms of the studied indices as well as their autocorrelation structures. It produces the same fat tails and the same power laws, with exactly the same exponents, as in the real indices. In addition, the model shows a great adaptation capability, anticipating the volatility evolution and showing the same volatility clusters observed in the assets. This approach provides a novel way to model asset markets with internal dynamics which changes quickly with time, making it impossible to define a fixed model to fit the empirical observations.
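The core idea, a uniform random variable whose interval extremes follow recent data, can be sketched as below. The abstract does not state how the extremes are computed, so the rule used here (minimum and maximum of a short rolling window of past returns) is purely an assumption, as are the function name `one_step_draws`, the window length, and the sample returns.

```python
import random

def one_step_draws(returns, window=3, seed=0):
    """For each new day, draw a return uniformly from an interval whose
    extremes are set by the previous `window` observed returns.
    The min/max rule is a hypothetical stand-in for the paper's update rule."""
    rng = random.Random(seed)
    draws = []
    for t in range(window, len(returns)):
        recent = returns[t - window:t]          # information available before day t
        low, high = min(recent), max(recent)    # adaptive interval extremes
        draws.append((low, high, rng.uniform(low, high)))
    return draws

# Made-up daily returns, for illustration only.
returns = [0.01, -0.02, 0.005, 0.03, -0.01, 0.0]
draws = one_step_draws(returns)
```

As each observed return enters the window, the interval widens or narrows, which is one simple way a model of this kind could track changing volatility.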
Adaptive Partially Hidden Markov Models
Forchhammer, Søren Otto; Rasmussen, Tage
1996-01-01
Partially Hidden Markov Models (PHMM) have recently been introduced. The transition and emission probabilities are conditioned on the past. In this report, the PHMM is extended with a multiple token version. The different versions of the PHMM are applied to bi-level image coding.
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Cristinel CONSTANTIN
2015-12-01
This paper presents instrumental research on the use of the Logistic Regression model for data analysis in marketing research. Decision makers inside different organisations need relevant information to support their decisions regarding marketing strategies. The data provided by marketing research can be processed in various ways, but multivariate data analysis models can enhance the utility of the information. Among these models is the Logistic Regression model, which is used for dichotomous variables. Our research explains the utility of this model and the interpretation of the resulting information in order to help practitioners and researchers use it in their future investigations.
Demenais, F M; Laing, A E; Bonney, G E
1992-01-01
Segregation analysis of discrete traits can be conducted by the classical mixed model and the recently introduced regressive models. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affected persons have a liability exceeding a threshold. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype effects, the phenotypes of older relatives, and other covariates. A formulation of the regressive models, based on an underlying liability model, has been recently proposed. The regression coefficients on antecedents are expressed in terms of the relevant familial correlations and a one-to-one correspondence with the parameters of the mixed model can thus be established. Computer simulations are conducted to evaluate the fit of the two formulations of the regressive models to the mixed model on nuclear families. The two forms of the class D regressive model provide a good fit to a generated mixed model, in terms of both hypothesis testing and parameter estimation. The simpler class A regressive model, which assumes that the outcomes of children depend solely on the outcomes of parents, is not robust against a sib-sib correlation exceeding that specified by the model, emphasizing testing class A against class D. The studies reported here show that if the true state of nature is that described by the mixed model, then a regressive model will do just as well. Moreover, the regressive models, allowing for more patterns of family dependence, provide a flexible framework to understand gene-environment interactions in complex diseases. PMID:1487139
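The regressive logistic model described above can be written schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: Y_i is the affection status of person i, g_i the major genotype, ant(i) the set of older relatives (antecedents) conditioned on, and x_i the vector of other covariates.

```latex
\operatorname{logit} P\bigl(Y_i = 1 \mid g_i,\ \{Y_j\}_{j \in \mathrm{ant}(i)},\ x_i\bigr)
  = \alpha_{g_i} + \sum_{j \in \mathrm{ant}(i)} \gamma_j\, Y_j + \beta^{\top} x_i
```

In this sketch, a class A model would restrict ant(i) to the parents, while a class D model would also include older sibs, which is what lets it absorb additional sib-sib correlation.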
Regression models tolerant to massively missing data: a case study in solar radiation nowcasting
I. Žliobaitė
2014-07-01
Statistical models for environmental monitoring strongly rely on automatic data acquisition systems, using various physical sensors. Often, sensor readings are missing for extended periods of time while model outputs need to be continuously available in real time. With a case study in solar radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable) in such situations. Our goal is to analyze the characteristics of missing data and recommend a strategy for deploying regression models that are robust to missing data in situations where data are massively missing. We are after one model that performs well at all times, with and without data gaps. Due to the need to provide instantaneous outputs with minimum energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods, and resort to a simple mean replacement. We use an established strategy for comparing different regression models, with the possibility of determining how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze accuracies and robustness to missing data of seven linear regression models and recommend using regularized PCA regression. We recommend using our established guideline in training regression models, which themselves are robust to missing data.
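The simple mean replacement mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: missing readings are encoded as `None`, the helper names are hypothetical, and the linear model's intercept and coefficients are made up. It does not implement the regularized PCA regression that the authors recommend.

```python
def column_means(rows):
    """Per-sensor means over the training rows, skipping missing (None) readings."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen))
    return means

def impute(row, means):
    """Replace each missing reading with the training mean of that sensor."""
    return [m if v is None else v for v, m in zip(row, means)]

def predict(row, intercept, coefs):
    """Evaluate a fixed linear regression model on a complete row."""
    return intercept + sum(c * v for c, v in zip(coefs, row))

# Made-up training readings from two sensors; None marks a gap.
train = [[1.0, 10.0], [3.0, None], [None, 30.0], [5.0, 20.0]]
means = column_means(train)

# A streaming observation arrives with the first sensor missing.
incoming = [None, 25.0]
filled = impute(incoming, means)
y_hat = predict(filled, intercept=0.5, coefs=[2.0, 0.1])  # hypothetical model
```

The replacement costs one lookup per missing value at prediction time, which fits the low-computation streaming setting the abstract describes.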
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Intelligent CAD Methodology Research of Adaptive Modeling
ZHANG Weibo; LI Jun; YAN Jianrong
2006-01-01
The key to implementing ICAD technology is to establish a knowledge-based product model that covers a wide range of domains. This paper puts forward a knowledge-based methodology of adaptive modeling. It follows an ontology-based approach, uses object-oriented technology, and provides a knowledge-based model framework. It involves the diverse domains in product design and realizes multi-domain modeling, embedding relevant information including standards, rules and expert experience. To test the feasibility of the methodology, the research is applied to automotive diaphragm spring clutch design, and an adaptive clutch design model is established using the knowledge-based modeling language AML.
Reference model decomposition in direct adaptive control
Butler, H.; Honderd, G.; Amerongen, van, W.E.
1991-01-01
This paper introduces the method of reference model decomposition as a way to improve the robustness of model reference adaptive control systems (MRACs) with respect to unmodelled dynamics with a known structure. Such unmodelled dynamics occur when some of the nominal plant dynamics are purposely neglected in the controller design with the aim of keeping the controller order low. One of the effects of such undermodelling of the controller is a violation of the perfect model-matching condition...
S. Goyal
2012-03-01
This paper highlights the significance of computational intelligence models for predicting the shelf life of processed cheese stored at 7-8 °C. Linear Layer and Generalized Regression models were developed with the input parameters soluble nitrogen, pH, standard plate count, yeast and mould count, and spores, and with sensory score as the output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash-Sutcliffe Coefficient were used to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 °C.
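The four comparison measures named in this abstract have standard definitions, sketched below. The function name `fit_metrics` and the sample scores are illustrative assumptions rather than anything from the paper, and the coefficient of determination is computed here as the squared Pearson correlation, which is one common convention among several.

```python
import math

def fit_metrics(observed, predicted):
    """Return MSE, RMSE, R^2 (squared Pearson correlation) and the
    Nash-Sutcliffe efficiency for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))   # squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)               # spread of observations
    spp = sum((p - mean_pred) ** 2 for p in predicted)             # spread of predictions
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    mse = sse / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "r2": cov ** 2 / (sst * spp),
        "nse": 1.0 - sse / sst,
    }

# Hypothetical sensory scores, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = fit_metrics(observed, predicted)
```

The Nash-Sutcliffe value penalizes systematic offsets between predictions and observations, whereas the squared correlation does not, so the two can disagree for a biased model.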
Profile-driven regression for modeling and runtime optimization of mobile networks
McClary, Dan; Syrotiuk, Violet; Kulahci, Murat
2010-01-01
Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at...
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Daniel Rece
2009-09-01
This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report statistically analyzes data collected from the USA, Russia, Romania and eleven other European countries, yielding a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
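The "share of variance explained" quoted in this abstract is the R-squared of a least squares fit. A minimal sketch of how such a figure is obtained for a one-regressor model follows; the function name `simple_ols` and all numbers are made up for illustration and are not the study's data.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical regressor (growth) and regressand (laundering level).
growth = [1.0, 2.0, 3.0, 4.0, 5.0]
laundering = [2.0, 1.0, 4.0, 3.0, 5.0]
a, b, r2 = simple_ols(growth, laundering)
```

In this toy data set the fit leaves 36% of the variance unexplained; an R-squared of 23.7%, as reported in the abstract, would leave 76.3% unexplained.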
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Adaptive System Modeling for Spacecraft Simulation
Thomas, Justin
2011-01-01
This invention introduces a methodology and associated software tools for automatically learning spacecraft system models without any assumptions regarding system behavior. Data stream mining techniques were used to learn models for critical portions of the International Space Station (ISS) Electrical Power System (EPS). Evaluation on historical ISS telemetry data shows that adaptive system modeling reduces simulation error anywhere from 50 to 90 percent over existing approaches. The purpose of the methodology is to outline how someone can create accurate system models from sensor (telemetry) data. The purpose of the software is to support the methodology. The software provides analysis tools to design the adaptive models. The software also provides the algorithms to initially build system models and continuously update them from the latest streaming sensor data. The main strengths are as follows: Creates accurate spacecraft system models without in-depth system knowledge or any assumptions about system behavior. Automatically updates/calibrates system models using the latest streaming sensor data. Creates device specific models that capture the exact behavior of devices of the same type. Adapts to evolving systems. Can reduce computational complexity (faster simulations).
Sample- and segment-size specific Model Selection in Mixture Regression Analysis
Sarstedt, Marko
2006-01-01
As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to an under- or oversegmentation, thus resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal ...
Testing normality in bivariate probit models : a simple artificial regression based LM test
Murphy, Anthony
1994-01-01
A simple and convenient LM test of normality in the bivariate probit model is derived. The alternative hypothesis is based on a form of truncated Gram Charlier Type series. The LM test may be calculated as an artificial regression. However, the proposed artificial regression does not use the outer product gradient form. Thus it is likely to perform reasonably well in small samples. non-peer-reviewed
Regression models based on new local strategies for near infrared spectroscopic data.
Allegrini, F; Fernández Pierna, J A; Fragoso, W D; Olivieri, A C; Baeten, V; Dardenne, P
2016-08-24
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied for Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large data bases. PMID:27496996
Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner
Luciano Fanton; Christian Paravan; Luigi T. De Luca
2012-01-01
Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models ...
Regression model for daily passenger volume of high-speed railway line under capacity constraint
LUO Yong-ji; LIU Jun; SUN Xun; LAI Qing-ying
2015-01-01
A non-linear regression model is proposed to forecast the aggregated passenger volume of Beijing−Shanghai high-speed railway (HSR) line in China. Train services and temporal features of passenger volume are studied to have a prior knowledge about this high-speed railway line. Then, based on a theoretical curve that depicts the relationship among passenger demand, transportation capacity and passenger volume, a non-linear regression model is established with consideration of the effect of capacity constraint. Through experiments, it is found that the proposed model can perform better in both forecasting accuracy and stability compared with linear regression models and back-propagation neural networks. In addition to the forecasting ability, with a definite formation, the proposed model can be further used to forecast the effects of train planning policies.
Structured Additive Regression Models: An R Interface to BayesX
Nikolaus Umlauf
2015-02-01
Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Hybrid adaptive control of a dragonfly model
Couceiro, Micael S.; Ferreira, Nuno M. F.; Machado, J. A. Tenreiro
2012-02-01
Dragonflies show unique flight performance, superior to that of most other insect species and birds. They are equipped with two pairs of independently controlled wings, granting unmatched flying performance and robustness. This paper presents an adaptive scheme for controlling a nonlinear model inspired by a dragonfly-like robot. A hybrid adaptive (HA) law is proposed for adjusting the parameters by analyzing the tracking error. At the current stage of the project, it is considered essential to develop computational simulation models based on the dynamics in order to test control strategies or algorithms, parts of the system (such as different wing configurations or the tail), as well as the complete system. The performance analysis proves the superiority of the HA law over the direct adaptive (DA) method in terms of faster and improved tracking and parameter convergence.
Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils
For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region dominated by dairy farming. No data from this area were used for calibrating the regression models. The models were validated by additional probability sampling. This sample was used to estimate errors in 1) the predicted areal fractions where the EU standard of 50 mg/l is exceeded for farms with low N surpluses (ALT) and farms with higher N surpluses (REF); 2) predicted cumulative frequency distributions of nitrate concentration for both groups of farms. Both the errors in the predicted areal fractions and the errors in the predicted cumulative frequency distributions indicate that the regression models are invalid for the sandy soils of this study area. This study indicates that linear regression models that predict nitrate concentrations in the upper groundwater using residual soil N contents should be applied with care.
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
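As a minimal sketch of the logit relationship described in the abstract above (not the study's fitted model), the following Python snippet maps a linear predictor to a probability and converts a coefficient into an odds ratio. The coefficient values and the meaning attached to each predictor are hypothetical and chosen only for illustration.

```python
import math


def predicted_probability(intercept, coefs, x):
    """Probability of the event when the log odds are linear in the predictors."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the predictor with coefficient coef."""
    return math.exp(coef)


# Hypothetical coefficients (assumptions, not estimates from the study).
intercept = -1.0
coefs = [0.8, -0.3]  # e.g. a 0/1 family-history indicator and one other covariate

p_without = predicted_probability(intercept, coefs, [0, 0])
p_with = predicted_probability(intercept, coefs, [1, 0])

print(round(p_without, 3), round(p_with, 3))
print(round(odds(p_with) / odds(p_without), 3), round(odds_ratio(coefs[0]), 3))
```

The two numbers on the last line agree because exponentiating a logit coefficient gives the ratio of the odds at two predictor values that differ by one unit.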
Modelling and (adaptive) control of greenhouse climates
Udink ten Cate, A.J.
1983-01-01
The material presented in this thesis can be grouped around four themes: system concepts, modeling, control and adaptive control. In this summary these themes will be treated separately. System concepts: In Chapters 1 and 2 an overview of the problem formulation is presented. It is suggested that there
Semantic models for adaptive interactive systems
Hussein, Tim; Lukosch, Stephan; Ziegler, Jürgen; Calvary, Gaëlle
2013-01-01
Providing insights into methodologies for designing adaptive systems based on semantic data, and introducing semantic models that can be used for building interactive systems, this book showcases many of the applications made possible by the use of semantic models. Ontologies may enhance the functional coverage of an interactive system as well as its visualization and interaction capabilities in various ways. Semantic models can also contribute to bridging gaps; for example, between user models, context-aware interfaces, and model-driven UI generation. There is considerable potential for using
Ünvan, Yüksel Akay; Gamze ÖZEL
2010-01-01
Abstract: Multinomial logistic (ML) and multinomial conditional logistic (MCL) regression models are used for modeling the relationships between a polytomous response variable and a set of explanatory variables. In this study, key factors affecting the European Union (EU) membership process are determined using ML and MCL models. We compare the ML and MCL models and argue that MCL is preferable to the more complex ML model. Then for each candidate or potential candidate ...
Meuwissen Theo HE; Ødegård Jørgen; Lillehammer Marie
2009-01-01
Abstract The combination of a sire model and a random regression term describing genotype by environment interactions may lead to biased estimates of genetic variance components because of heterogeneous residual variance. In order to test different models, simulated data with genotype by environment interactions, and dairy cattle data assumed to contain such interactions, were analyzed. Two animal models were compared to four sire models. Models differed in their ability to handle heterogeneo...
Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression
Green, Martin J.; Medley, Graham F; Browne, William J.
2009-01-01
Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three level logistic regression model for clinical mastitis in dairy cattle; cross validation, a prediction using the full post...
Pradhan, B.; Buchroithner, M. F.; Mansor, S.
2009-04-01
This paper presents the assessment results of three spatially based probabilistic models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study area were identified by interpreting aerial photographs and satellite images, supported by field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. Nine landslide-related factors were extracted from the spatial database, and the neural network, frequency ratio and logistic regression coefficients of each factor were computed. Landslide susceptibility maps were drawn for the study area using neural network, frequency ratio and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in the study area. The verification results show that the frequency ratio model provides higher prediction accuracy than the ANN and regression models.
Use of Pollutant Load Regression Models with Various Sampling Frequencies for Annual Load Estimation
Youn Shik Park
2014-06-01
Water quality data are collected at various sampling frequencies, and the data may be collected neither at a high frequency nor over the full range of streamflow conditions. Therefore, regression models are used to estimate pollutant data for days on which water quality data were not measured. Pollutant load regression models were evaluated with six sampling frequencies for daily nitrogen, phosphorus, and sediment data. Annual pollutant load estimates exhibited various behaviors by sampling frequency and also by the regression model used. Several distinct sampling frequency features were observed in the study. The first was that more frequent sampling did not necessarily lead to more accurate and precise annual pollutant load estimates. The second was that use of water quality data collected from storm events improved both accuracy and precision in annual pollutant load estimates for all water quality parameters. The third was that the pollutant regression model automatically selected by LOADEST did not necessarily lead to more accurate and precise annual pollutant load estimates. The fourth was that pollutant regression models displayed different behaviors for different water quality parameters in annual pollutant load estimation.
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;
2014-01-01
power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...... conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in...
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
Quantile estimation methods are proposed for the functional-coefficient partially linear regression (FCPLR) model by combining nonparametric and functional-coefficient regression (FCR) models. The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model. These resulting estimators are asymptotically normal, but each of them has a large variance. To reduce the variances of these quantile estimators, the one-step backfitting technique is used to obtain efficient quantile estimators of all unknown functions, and their asymptotic normality is derived. Two simulated examples are carried out to illustrate the proposed estimation methodology.
Automated adaptive inference of phenomenological dynamical models
Daniels, Bryan C.; Nemenman, Ilya
2015-08-01
Dynamics of complex systems is often driven by large and intricate networks of microscopic interactions, whose sheer size obfuscates understanding. With limited experimental data, many parameters of such dynamics are unknown, and thus detailed, mechanistic models risk overfitting and making faulty predictions. At the other extreme, simple ad hoc models often miss defining features of the underlying systems. Here we develop an approach that instead constructs phenomenological, coarse-grained models of network dynamics that automatically adapt their complexity to the available data. Such adaptive models produce accurate predictions even when microscopic details are unknown. The approach is computationally tractable, even for a relatively large number of dynamical variables. Using simulated data, it correctly infers the phase space structure for planetary motion, avoids overfitting in a biological signalling system and produces accurate predictions for yeast glycolysis with tens of data points and over half of the interacting species unobserved.
Adaptive Modeling for Security Infrastructure Fault Response
CUI Zhong-jie; YAO Shu-ping; HU Chang-zhen
2008-01-01
Based on the analysis of inherent limitations in existing security response decision-making systems, a dynamic adaptive model of fault response is presented. Several security fault levels were established, which comprise the basic level, equipment level and mechanism level. Fault damage cost is calculated using the analytic hierarchy process. Meanwhile, the model evaluates the impact of different responses upon fault repair and normal operation. Response operation cost and response negative cost are introduced through quantitative calculation. This model adopts a comprehensive response decision for security faults based on three principles: the maximum and minimum principle, the timeliness principle, and the acquiescence principle, which ensure that the optimal response countermeasure is selected for different situations. Experimental results show that the proposed model has good self-adaptation ability, timeliness and cost-sensitiveness.
Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner
Luciano Fanton
2012-01-01
Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.
Adaptive Cruise Control and Driver Modeling
Bengtsson, Johan
2001-01-01
Many vehicle manufacturers have lately introduced advanced driver support in some of their automobiles. One of those new features is Adaptive Cruise Control (ACC), which extends the conventional cruise control system to control of relative speed and distance to other vehicles. In order to design an ACC controller it is suitable to have a model of driver behavior. The approach in the thesis is to use system identification methodology to obtain dynamic models of driver behavior useful for ACC ap...
On pseudo-values for regression analysis in competing risks models
Gerds, Thomas Alexander; Graw, F; Schumacher, M
2009-01-01
For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding their...
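The jackknife pseudo-value construction mentioned in the abstract above can be sketched in Python as follows. For an estimator computed on all n observations and on each leave-one-out sample, the i-th pseudo-value is n times the full-sample estimate minus (n - 1) times the estimate without observation i. The function names and the data are illustrative assumptions. In a competing risks analysis one would plug in a cumulative incidence estimator that handles censoring, whereas this sketch uses a plain proportion of uncensored event indicators.

```python
def pseudo_values(sample, estimator):
    """Jackknife pseudo-values: n * theta(all) - (n - 1) * theta(all but i)."""
    n = len(sample)
    full_estimate = estimator(sample)
    values = []
    for i in range(n):
        leave_one_out = sample[:i] + sample[i + 1:]
        values.append(n * full_estimate - (n - 1) * estimator(leave_one_out))
    return values


def proportion(indicators):
    """Share of subjects with the event of interest by a fixed time point."""
    return sum(indicators) / len(indicators)


# Hypothetical event indicators without censoring (1 = event of interest occurred).
events = [1, 0, 0, 1, 1, 0, 1, 0]
pv = pseudo_values(events, proportion)
print([round(v, 6) for v in pv])
```

Without censoring the pseudo-values of a proportion reduce to the individual indicators, which is why they can stand in for the unobserved individual outcomes as responses in a regression model.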
Application of Fuzzy Regression Model to the Prediction of Field Mouse Occurrence Rate
XU Fei
2009-01-01
Expressions were given to describe the closeness between the estimated value and the observed value for two asymmetric exponential fuzzy numbers. Based on these expressions, a model was given to solve the problem of fuzzy multivariable regression with fuzzy input, fuzzy output and crisp coefficients. Finally, the model was used to predict the field mouse occurrence rate, and a satisfactory result was obtained.
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Sample Size Determination for Regression Models Using Monte Carlo Methods in R
Beaujean, A. Alexander
2014-01-01
A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
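A Monte Carlo approach to sample size determination can be sketched as follows, assuming a simple linear regression with standard normal predictor and errors. The sketch is written in Python rather than R, and the effect size, the candidate sample sizes, and the use of a normal critical value (1.96) in place of the exact t quantile are all assumptions made for illustration.

```python
import math
import random


def simulate_power(n, slope, reps=500, z_crit=1.96, seed=0):
    """Estimate the power of the slope test by repeated simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        b = sxy / sxx                      # least-squares slope
        a = mean_y - b * mean_x            # least-squares intercept
        rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        if abs(b / se) > z_crit:           # normal approximation to the t test
            rejections += 1
    return rejections / reps


def smallest_n(slope, target=0.8, candidates=range(10, 201, 10)):
    """Return the first candidate sample size whose simulated power reaches the target."""
    for n in candidates:
        if simulate_power(n, slope) >= target:
            return n
    return None


if __name__ == "__main__":
    print(smallest_n(0.3))
```

The same loop extends to other models by replacing the data-generating lines and the test statistic, which is the main appeal of simulation over closed-form sample size formulae.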
de Vries, SO; Fidler, V; Kuipers, WD; Hunink, MGM
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six
de Vries, S.O.; Fidler, V.; Kuipers, W.D.; Hunink, M.G.
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six
Multiple regression models for energy use in air-conditioned office buildings in different climates
An attempt was made to develop multiple regression models for office buildings in the five major climates in China: severe cold, cold, hot summer and cold winter, mild, and hot summer and warm winter. A total of 12 key building design variables were identified through parametric and sensitivity analysis, and considered as inputs in the regression models. The coefficient of determination R² varies from 0.89 in Harbin to 0.97 in Kunming, indicating that 89-97% of the variations in annual building energy use can be explained by the changes in the 12 parameters. A pseudo-random number generator based on three simple multiplicative congruential generators was employed to generate random designs for evaluation of the regression models. The differences between regression-predicted and DOE-simulated annual building energy use are largely within 10%. It is envisaged that the regression models developed can be used to estimate the likely energy savings/penalty during the initial design stage when different building schemes and design concepts are being considered.
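The two summary measures quoted in the abstract above, the coefficient of determination and the relative difference between regression-predicted and simulated energy use, can be computed as in the following Python sketch. The energy-use figures are hypothetical values invented for illustration and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_difference(reference, estimate):
    """Absolute difference between estimate and reference, as a fraction of the reference."""
    return abs(estimate - reference) / abs(reference)


# Hypothetical annual energy use: simulated reference versus regression prediction.
simulated = [120.0, 135.0, 150.0, 110.0, 160.0]
predicted = [118.0, 139.0, 147.0, 114.0, 158.0]

print(round(r_squared(simulated, predicted), 3))
print([round(relative_difference(s, p), 3) for s, p in zip(simulated, predicted)])
```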
Grégoire, G.
2014-12-01
Logistic regression was originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratios, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihood ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effects, testing individual effects, selecting variables to build a model, measuring the fit of the model, predicting new values, etc. The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
Longitudinal beta regression models for analyzing health-related quality of life scores over time
Hunger Matthias
2012-09-01
Background: Health-related quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies; however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods: We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results: At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from the marginal beta model and the linear mixed model were nearly identical, but differences could be observed with respect to standard errors. Conclusions: Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models; however, if the focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Adaptive-network models of swarm dynamics
Huepe, Cristian [614 N Paulina Street, Chicago, IL 60622-6062 (United States); Zschaler, Gerd; Do, Anne-Ly; Gross, Thilo, E-mail: cristian@northwestern.edu [Max-Planck-Institut fuer Physik komplexer Systeme, Noethnitzer Strasse 38, 01187 Dresden (Germany)
2011-07-15
We propose a simple adaptive-network model describing recent swarming experiments. Exploiting an analogy with human decision making, we capture the dynamics of the model using a low-dimensional system of equations permitting analytical investigation. We find that the model reproduces several characteristic features of swarms, including spontaneous symmetry breaking, noise- and density-driven order-disorder transitions that can be of first or second order, and intermittency. Reproducing these experimental observations using a non-spatial model suggests that spatial geometry may have less of an impact on collective motion than previously thought.
Adaptive Behaviour Assessment System: Indigenous Australian Adaptation Model (ABAS: IAAM)
du Plessis, Santie
2015-01-01
The study objectives were to develop, trial and evaluate a cross-cultural adaptation of the Adaptive Behavior Assessment System-Second Edition Teacher Form (ABAS-II TF) ages 5-21 for use with Indigenous Australian students ages 5-14. This study introduced a multiphase mixed-method design with semi-structured and informal interviews, school…
Time-varying parameter auto-regressive models for autocovariance nonstationary time series
[No author listed]
2009-01-01
In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.
Time-varying parameter auto-regressive models for autocovariance nonstationary time series
FEI WanChun; BAI Lun
2009-01-01
In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.
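Minimum-AIC order selection can be illustrated with a much simpler stand-in than the TVPAR models of the abstract: an ordinary constant-parameter AR model fitted by the Yule-Walker equations through the Levinson-Durbin recursion. This Python sketch is an assumption-laden simplification and not the authors' estimation procedure. The AIC form n * log(variance) + 2 * order and all function names are choices made here for illustration.

```python
import math


def autocovariances(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a series x."""
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, max_order):
    """Return [(coefficients, innovation variance)] for AR orders 0..max_order."""
    coefs, variance = [], r[0]
    fits = [([], variance)]
    for k in range(1, max_order + 1):
        acc = r[k] - sum(coefs[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / variance
        coefs = [coefs[j] - reflection * coefs[k - 2 - j] for j in range(k - 1)]
        coefs.append(reflection)
        variance *= 1.0 - reflection ** 2
        fits.append((coefs[:], variance))
    return fits


def select_order_by_aic(r, n, max_order):
    """Pick the AR order that minimises AIC = n * log(variance) + 2 * order."""
    fits = levinson_durbin(r, max_order)
    aic = [n * math.log(var) + 2 * p for p, (_, var) in enumerate(fits)]
    best = min(range(len(aic)), key=aic.__getitem__)
    return best, fits[best][0], aic


# Theoretical autocorrelations of an AR(1) process with coefficient 0.5.
r_theory = [0.5 ** k for k in range(4)]
order, coefficients, aic_values = select_order_by_aic(r_theory, n=100, max_order=3)
print(order, coefficients)
```

With real data one would pass `autocovariances(series, max_order)` instead of the theoretical values. A time-varying variant would additionally have to let the coefficients, and possibly the order, change over time.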
Modeling interactions in count-data regression: Principles and implementation in Stata
Heinz Leitgöb
2014-01-01
During the past decades, count-data models (in particular, Poisson and negative-binomial-based regression models) have gained relevance in empirical social research. While identifying and interpreting main effects is relatively straightforward for this class of models, the integration of interactions between predictors proves to be complex. As a consequence of the exponential mean function implemented in count-data models (which restricts the possible range of the conditional expected count t...
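The complication described in the abstract above can be seen in a small numeric sketch. With an exponential mean function, the product-term coefficient equals the log of a ratio of rate ratios, while the interaction on the count scale (the cross difference of expected counts) is generally nonzero even when that coefficient is zero. The example is written in Python rather than Stata, and the coefficient values are hypothetical.

```python
import math


def expected_count(x1, x2, b0, b1, b2, b12):
    """Conditional mean of a log-link count model with a product term."""
    return math.exp(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)


def cross_difference(b0, b1, b2, b12):
    """Interaction on the count scale for two binary predictors."""
    m = lambda x1, x2: expected_count(x1, x2, b0, b1, b2, b12)
    return (m(1, 1) - m(0, 1)) - (m(1, 0) - m(0, 0))


def ratio_of_rate_ratios(b0, b1, b2, b12):
    """Interaction on the multiplicative scale; equals exp(b12)."""
    m = lambda x1, x2: expected_count(x1, x2, b0, b1, b2, b12)
    return (m(1, 1) / m(0, 1)) / (m(1, 0) / m(0, 0))


# Hypothetical coefficients with no product term (b12 = 0).
no_product_term = cross_difference(0.5, 0.4, 0.3, 0.0)
print(round(no_product_term, 4))                                 # nonzero on the count scale
print(round(ratio_of_rate_ratios(0.5, 0.4, 0.3, 0.0), 4))        # 1.0 on the ratio scale
```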
Bivariate Poisson and Diagonal Inflated Bivariate Poisson Regression Models in R
Dimitris Karlis; Ioannis Ntzoufras
2005-01-01
In this paper we present an R package called bivpois for maximum likelihood estimation of the parameters of bivariate and diagonal inflated bivariate Poisson regression models. An Expectation-Maximization (EM) algorithm is implemented. Inflated models allow for modelling both over-dispersion (or under-dispersion) and negative correlation and thus they are appropriate for a wide range of applications. Extensions of the algorithms for several other models are also discussed. Detailed guidance a...
Effects of model sensitivity and nonlinearity on nonlinear regression of ground water flow
Yager, R.M.
2004-01-01
Nonlinear regression is increasingly applied to the calibration of hydrologic models through the use of perturbation methods to compute the Jacobian or sensitivity matrix required by the Gauss-Newton optimization method. Sensitivities obtained by perturbation methods can be less accurate than those obtained by direct differentiation, however, and concern has arisen that the optimal parameter values and the associated parameter covariance matrix computed by perturbation could also be less accurate. Sensitivities computed by both perturbation and direct differentiation were applied in nonlinear regression calibration of seven ground water flow models. The two methods gave virtually identical optimum parameter values and covariances for the three models that were relatively linear and two of the models that were relatively nonlinear, but gave widely differing results for two other nonlinear models. The perturbation method performed better than direct differentiation in some regressions with the nonlinear models, apparently because approximate sensitivities computed for an interval yielded better search directions than did more accurately computed sensitivities for a point. The method selected to avoid overshooting minima on the error surface when updating parameter values with the Gauss-Newton procedure appears for nonlinear models to be more important than the method of sensitivity calculation in controlling regression convergence.
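The contrast between sensitivities from perturbation and from direct differentiation can be sketched on a toy problem. The Python example below fits a two-parameter exponential decay with Gauss-Newton, once with a forward-difference Jacobian and once with the analytic Jacobian, and uses simple step halving to avoid overshooting. The model, the starting values, and the perturbation size are assumptions for illustration and have nothing to do with the ground water flow models of the study.

```python
import math


def model(params, t):
    a, b = params
    return a * math.exp(-b * t)


def residuals(params, data):
    return [y - model(params, t) for t, y in data]


def sse(params, data):
    return sum(r * r for r in residuals(params, data))


def jacobian_analytic(params, data):
    """Sensitivities by direct differentiation of the model."""
    a, b = params
    return [[math.exp(-b * t), -a * t * math.exp(-b * t)] for t, _ in data]


def jacobian_perturbation(params, data, rel_step=1e-6):
    """Sensitivities by forward differences (one extra model run per parameter)."""
    base = [model(params, t) for t, _ in data]
    columns = []
    for j in range(len(params)):
        h = rel_step * max(abs(params[j]), 1.0)
        bumped = list(params)
        bumped[j] += h
        columns.append([(model(bumped, t) - f0) / h for (t, _), f0 in zip(data, base)])
    return [list(row) for row in zip(*columns)]


def gauss_newton(params, data, jacobian, iterations=50):
    """Gauss-Newton for two parameters with step halving against overshooting."""
    params = list(params)
    for _ in range(iterations):
        J, r = jacobian(params, data), residuals(params, data)
        a11 = sum(row[0] * row[0] for row in J)
        a12 = sum(row[0] * row[1] for row in J)
        a22 = sum(row[1] * row[1] for row in J)
        g1 = sum(row[0] * ri for row, ri in zip(J, r))
        g2 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) d = J^T r.
        d = [(a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det]
        step, current = 1.0, sse(params, data)
        while step > 1e-8:
            trial = [p + step * di for p, di in zip(params, d)]
            if sse(trial, data) <= current:
                params = trial
                break
            step /= 2.0
    return params


# Noise-free synthetic observations from hypothetical true parameters (2.0, 0.5).
data = [(float(t), model([2.0, 0.5], float(t))) for t in range(10)]
fit_direct = gauss_newton([1.5, 0.4], data, jacobian_analytic)
fit_perturbed = gauss_newton([1.5, 0.4], data, jacobian_perturbation)
print(fit_direct, fit_perturbed)
```

On a well-behaved problem like this one the two Jacobians lead to essentially the same estimates, which matches the abstract's finding for the relatively linear models. The differences it reports arise for strongly nonlinear models.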
2013-01-01
Background Integrase inhibitors (INI) form a new drug class in the treatment of HIV-1 patients. We developed a linear regression modeling approach to make a quantitative raltegravir (RAL) resistance phenotype prediction, as Fold Change in IC50 against a wild type virus, from mutations in the integrase genotype. Methods We developed a clonal genotype-phenotype database with 991 clones from 153 clinical isolates of INI naïve and RAL treated patients, and 28 site-directed mutants. We did the development of the RAL linear regression model in two stages, employing a genetic algorithm (GA) to select integrase mutations by consensus. First, we ran multiple GAs to generate first order linear regression models (GA models) that were stochastically optimized to reach a goal R2 accuracy, and consisted of a fixed-length subset of integrase mutations to estimate INI resistance. Secondly, we derived a consensus linear regression model in a forward stepwise regression procedure, considering integrase mutations or mutation pairs by descending prevalence in the GA models. Results The most frequently occurring mutations in the GA models were 92Q, 97A, 143R and 155H (all 100%), 143G (90%), 148H/R (89%), 148K (88%), 151I (81%), 121Y (75%), 143C (72%), and 74M (69%). The RAL second order model contained 30 single mutations and five mutation pairs (p INI naïve patients. Conclusions We describe a systematic approach to derive a model for predicting INI resistance from a limited amount of clonal samples. Our RAL second order model is made available as an Additional file for calculating a resistance phenotype as the sum of integrase mutations and mutation pairs. PMID:23282253
Schmid, Maximilian P.; Fidarova, Elena [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria)], e-mail: maximilian.schmid@akhwien.at; Poetter, Richard [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria); Christian Doppler Lab. for Medical Radiation Research for Radiation Oncology, Medical Univ. of Vienna (Austria)] [and others
2013-10-15
Purpose: To investigate the impact of magnetic resonance imaging (MRI)-morphologic differences in parametrial infiltration on tumour response during primary radiochemotherapy in cervical cancer. Material and methods: Eighty-five consecutive cervical cancer patients with FIGO stages IIB (n = 59) and IIIB (n = 26), treated by external beam radiotherapy (±chemotherapy) and image-guided adaptive brachytherapy, underwent T2-weighted MRI at the time of diagnosis and at the time of brachytherapy. MRI patterns of parametrial tumour infiltration at the time of diagnosis were assessed with regard to predominant morphology and maximum extent of parametrial tumour infiltration and were stratified into five tumour groups (TG): 1) expansive with spiculae; 2) expansive with spiculae and infiltrating parts; 3) infiltrative into the inner third of the parametrial space (PM); 4) infiltrative into the middle third of the PM; and 5) infiltrative into the outer third of the PM. MRI at the time of brachytherapy was used for identifying presence (residual vs. no residual disease) and signal intensity (high vs. intermediate) of residual disease within the PM. Left and right PM of each patient were evaluated separately at both time points. The impact of the TG on tumour remission status within the PM was analysed using χ²-test and logistic regression analysis. Results: In total, 170 PM were analysed. The TG 1, 2, 3, 4, 5 were present in 12%, 11%, 35%, 25% and 12% of the cases, respectively. Five percent of the PM were tumour-free. Residual tumour in the PM was identified in 19%, 68%, 88%, 90% and 85% of the PM for the TG 1, 2, 3, 4, and 5, respectively. The TG 3 - 5 had significantly higher rates of residual tumour in the PM in comparison to TG 1 + 2 (88% vs. 43%, p < 0.01). Conclusion: MRI-morphologic features of PM infiltration appear to allow for prediction of tumour response during external beam radiotherapy and chemotherapy. A predominantly infiltrative tumour spread at the
Some New Methods for the Comparison of Two Linear Regression Models
Liu, Wei; Jamshidian, Mortaza; Zhang, Ying; Bretz, Frank; Han, Xiaoliang
2006-01-01
The frequently used approach to the comparison of two linear regression models is to use the partial F test. It is pointed out in this paper that the partial F test has in fact a naturally associated two-sided simultaneous confidence band, which is much more informative than the test itself. But this confidence band is over the entire range of all the covariates. As regression models are true or of interest often only over a restricted region of the covariates, the part of this confidence ban...
Blind identification of threshold auto-regressive model for machine fault diagnosis
LI Zhinong; HE Yongyong; CHU Fulei; WU Zhaotong
2007-01-01
A blind identification method was developed for the threshold auto-regressive (TAR) model. The method had good identification accuracy and rapid convergence, especially for higher order systems. The proposed method was then combined with the hidden Markov model (HMM) to determine the auto-regressive (AR) coefficients for each interval used for feature extraction, with the HMM as a classifier. The fault diagnoses during the speed-up and speed-down processes for rotating machinery have been successfully completed. The result of the experiment shows that the proposed method is practical and effective.
Small Area Estimation of Poverty Proportions under Random Regression Coefficient Models
Hobza, Tomáš; Morales, D.
Berlin: Springer, 2011 - (Pardo, L.; Balakrishnan, N.; Gil, M.), s. 315-328. (Understanding Complex Systems Springer Complexity). ISBN 978-3-642-20852-2 R&D Projects: GA MŠk 1M0572 Institutional research plan: CEZ:AV0Z10750506 Keywords : small area estimation * random regression coefficient model * EBLUP estimates Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2011/SI/hobza-small area estimation of poverty proportions under random regression coefficient models.pdf
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition: "An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics. A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R
Friedrich Leisch
2004-10-01
FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment: three variants of the EM algorithm can be used for parameter estimation, regressors and responses may be multivariate with arbitrary dimension, data may be grouped, e.g., to account for multiple observations per individual, the usual formula interface of the S language is used for convenient model specification, and a modular concept of driver functions allows to interface many different types of regression models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary conditions of the process. Hundreds of optimised cases with varied design parameters are used as observations in four multiple regression analyses. We analyse the model assumptions, prediction abilities and extrapolations, and compare the results with recent studies in the literature. The models are in agreement with the literature, and they present an opportunity for accurate prediction of the potential of an ORC to convert heat sources with temperatures from 80 to 360 °C, without detailed knowledge or need for simulation of the process. - Highlights: • The maximum thermal efficiency of ORCs in hundreds of cases was analysed. • Multiple regression models were derived to predict the maximum obtainable efficiency of ORCs. • Using only key design parameters, the maximum obtainable efficiency can be evaluated. • The regression models decrease the resources needed to evaluate the maximum potential. • The models are statistically strong and in good agreement with the literature
Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm
Ulbrich, Norbert Manfred
2013-01-01
A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. It is a major international concern and is primarily attributed to different fossil fuels. In this paper, a regression model is used to analyze the causal relationship between CO2 emissions and energy consumption in Malaysia using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions, such as non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The data were used partly to fit the regression model (1980-2000) and partly to validate it (2001-2010). Comparison of the model predictions with the measured data showed a high correlation (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. PMID:26720396
Adaptive numerical algorithms in space weather modeling
Tóth, Gábor; van der Holst, Bart; Sokolov, Igor V.; De Zeeuw, Darren L.; Gombosi, Tamas I.; Fang, Fang; Manchester, Ward B.; Meng, Xing; Najib, Dalal; Powell, Kenneth G.; Stout, Quentin F.; Glocer, Alex; Ma, Ying-Juan; Opher, Merav
2012-02-01
Space weather describes the various processes in the Sun-Earth system that present danger to human health and technology. The goal of space weather forecasting is to provide an opportunity to mitigate these negative effects. Physics-based space weather modeling is characterized by disparate temporal and spatial scales as well as by different relevant physics in different domains. A multi-physics system can be modeled by a software framework comprising several components. Each component corresponds to a physics domain, and each component is represented by one or more numerical models. The publicly available Space Weather Modeling Framework (SWMF) can execute and couple together several components distributed over a parallel machine in a flexible and efficient manner. The framework also allows resolving disparate spatial and temporal scales with independent spatial and temporal discretizations in the various models. Several of the computationally most expensive domains of the framework are modeled by the Block-Adaptive Tree Solarwind Roe-type Upwind Scheme (BATS-R-US) code that can solve various forms of the magnetohydrodynamic (MHD) equations, including Hall, semi-relativistic, multi-species and multi-fluid MHD, anisotropic pressure, radiative transport and heat conduction. Modeling disparate scales within BATS-R-US is achieved by a block-adaptive mesh both in Cartesian and generalized coordinates. Most recently we have created a new core for BATS-R-US: the Block-Adaptive Tree Library (BATL) that provides a general toolkit for creating, load balancing and message passing in a 1, 2 or 3 dimensional block-adaptive grid. We describe the algorithms of BATL and demonstrate its efficiency and scaling properties for various problems. BATS-R-US uses several time-integration schemes to address multiple time-scales: explicit time stepping with fixed or local time steps, partially steady-state evolution, point-implicit, semi-implicit, explicit/implicit, and fully implicit
INVESTIGATION OF E-MAIL TRAFFIC BY USING ZERO-INFLATED REGRESSION MODELS
Yılmaz KAYA
2012-06-01
In count data, the number of zero values may be greater than anticipated. Such data sets should be analyzed with regression methods that take the zero values into account. Zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), Poisson hurdle (PH) and negative binomial hurdle (NBH) models are the more common approaches for modeling dependent variables that contain more zeros than expected. In the present study, the e-mail traffic of Yüzüncü Yıl University in the 2009 spring semester was investigated. The ZIP, ZINB, PH and NBH regression methods were applied to the data set because it contained more zero counts (78.9%) than expected. ZINB and NBH regression, which account for both zero inflation and overdispersion, gave more accurate results because both were present in the e-mail sending data. ZINB was determined to be the best model according to Vuong statistics and information criteria.
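As a brief illustration of why excess zeros call for a dedicated model, the zero-inflated Poisson probability mass function can be written out directly. This is a minimal sketch using the textbook ZIP definition; the parameter names `lam` (Poisson mean) and `pi` (probability of a structural zero) and the numerical values are assumptions for this example, not estimates from the e-mail data.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero, otherwise it is drawn from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

# Hypothetical parameters chosen only to show the effect of zero inflation.
lam, pi = 2.0, 0.3
p_zero_poisson = poisson_pmf(0, lam)
p_zero_zip = zip_pmf(0, lam, pi)
total = sum(zip_pmf(k, lam, pi) for k in range(60))
```

With these values the ZIP model assigns noticeably more probability to zero than the plain Poisson model with the same `lam`, while the probabilities still sum to one.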
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
LIN Lu; CUI Xia
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models. A Stahel-Donoho kernel estimation is introduced, in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points. Based on a local approximation, a computational technique is given to approximate to the incomputable depths of the errors. As a result the new estimator is computationally efficient. The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error. Unlike the depth-weighted estimator for parametric regression models, this depth-weighted nonparametric estimator has a simple variance structure and then we can compare its efficiency with the original one. Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects, e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models, may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R² values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. PMID:26530819
Wen-Cheng Wang
2014-01-01
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Logistic regression model and its application
常振海; 刘薇
2012-01-01
To improve the forecasting accuracy for a multinomial qualitative dependent variable using the logistic model, a three-category logistic model is established for actual statistical data on the basis of the binary logistic regression model. The significance of the independent variables is tested using the likelihood ratio test, and the non-significant variables are removed. A linear regression function is determined for each category of the dependent variable, and the models are tested. The analysis results show that the logistic regression model has good predictive accuracy and practical value in regression analysis with a qualitative dependent variable.
The absorbed dose to soft-tissue-equivalent material imparted by ophthalmologic applicators (90Sr/90Y, 1850 MBq) is determined using an extrapolation chamber with variable electrodes. When the slope of the extrapolation curve is estimated using a simple linear regression model, the dose values are underestimated by 17.7 percent to 20.4 percent relative to the dose estimated with a second-degree polynomial regression model; at the same time, the standard error improves by up to 50% for the quadratic model. Finally, the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. In conclusion, in experimental arrangements where the source is in contact with the extrapolation chamber, it is recommended to replace the linear regression model with the quadratic regression model when determining the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
Liu, Dawei; Lin, Xihong; Ghosh, Debashis
2007-01-01
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene e...
Extensions of linear regression models based on set arithmetic for interval data
Blanco-Fernández, Angela; García-Bárzana, Marta; Colubi, Ana; Kontoghiorghes, Erricos J.
2012-01-01
Extensions of previous linear regression models for interval data are presented. A more flexible simple linear model is formalized. The new model may express cross-relationships between mid-points and spreads of the interval data in a unique equation based on the interval arithmetic. Moreover, extensions to the multiple case are addressed. The associated least-squares estimation problem are solved. Empirical results and a real-life application are presented in order to show the applicability ...
Pricing model performance and the two-pass cross-sectional regression methodology
Kan, Raymond; Robotti, Cesare; Shanken, Jay
2009-01-01
Since Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973), the two-pass cross-sectional regression (CSR) methodology has become the most popular approach for estimating and testing asset pricing models. Statistical inference with this method is typically conducted under the assumption that the models are correctly specified, that is, expected returns are exactly linear in asset betas. This assumption can be a problem in practice since all models are, at best, approximations of reali...
Additive Hazard Regression Models: An Application to the Natural History of Human Papillomavirus
Xianhong Xie; STRICKLER, Howard D.; Xiaonan Xue
2013-01-01
There are several statistical methods for time-to-event analysis, among which is the Cox proportional hazards model that is most commonly used. However, when the absolute change in risk, instead of the risk ratio, is of primary interest or when the proportional hazard assumption for the Cox proportional hazards model is violated, an additive hazard regression model may be more appropriate. In this paper, we give an overview of this approach and then apply a semiparametric as well as a nonpara...
Adaptive Learning Models of Consumer Behavior
Hopkins, Ed
2007-01-01
In a model of dynamic duopoly, optimal price policies are characterized assuming consumers learn adaptively about the relative quality of the two products. A contrast is made between belief-based and reinforcement learning. Under reinforcement learning, consumers can become locked into the habit of purchasing inferior goods. Such lock-in permits the existence of multiple history-dependent asymmetric steady states in which one firm dominates. In contrast, belief-based learning rules must lead ...
Bayesian Network Models for Adaptive Testing
Plajner, Martin; Vomlel, Jiří
Achen: Sun SITE Central Europe, 2016 - (Agosta, J.; Carvalho, R.), s. 24-33. (CEUR Workshop Proceedings. Vol 1565). ISSN 1613-0073. [The Twelfth UAI Bayesian Modeling Applications Workshop (BMAW 2015). Amsterdam (NL), 16.07.2015] R&D Projects: GA ČR GA13-20012S Institutional support: RVO:67985556 Keywords : Bayesian networks * Computerized adaptive testing Subject RIV: JD - Computer Applications, Robotics http://library.utia.cas.cz/separaty/2016/MTR/plajner-0458062.pdf
Reinforcement Learning Using Local Adaptive Models
Borga, Magnus
1995-01-01
In this thesis, the theory of reinforcement learning is described and its relation to learning in biological systems is discussed. Some basic issues in reinforcement learning, the credit assignment problem and perceptual aliasing, are considered. The methods of temporal difference are described. Three important design issues are discussed: information representation and system architecture, rules for improving the behaviour and rules for the reward mechanisms. The use of local adaptive models...
Liu, Dawei; Lin, Xihong; Ghosh, Debashis
2007-12-01
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations. PMID:18078480
Adaptive human behavior in epidemiological models.
Fenichel, Eli P; Castillo-Chavez, Carlos; Ceddia, M G; Chowell, Gerardo; Parra, Paula A Gonzalez; Hickling, Graham J; Holloway, Garth; Horan, Richard; Morin, Benjamin; Perrings, Charles; Springborn, Michael; Velazquez, Leticia; Villalobos, Cristina
2011-04-12
The science and management of infectious disease are entering a new stage. Increasingly public policy to manage epidemics focuses on motivating people, through social distancing policies, to alter their behavior to reduce contacts and reduce public disease risk. Person-to-person contacts drive human disease dynamics. People value such contacts and are willing to accept some disease risk to gain contact-related benefits. The cost-benefit trade-offs that shape contact behavior, and hence the course of epidemics, are often only implicitly incorporated in epidemiological models. This approach creates difficulty in parsing out the effects of adaptive behavior. We use an epidemiological-economic model of disease dynamics to explicitly model the trade-offs that drive person-to-person contact decisions. Results indicate that including adaptive human behavior significantly changes the predicted course of epidemics and that this inclusion has implications for parameter estimation and interpretation and for the development of social distancing policies. Acknowledging adaptive behavior requires a shift in thinking about epidemiological processes and parameters. PMID:21444809
Lukianenko Iryna H.
2014-01-01
The article considers the possibilities and specific features of modelling economic phenomena with a category of models that combines elements of econometric regressions and artificial neural networks. This category contains auto-regressive neural networks (AR-NN), smooth transition regressions (STR/STAR), multi-regime smooth transition regressions (MRSTR/MRSTAR) and smooth transition regressions with neural coefficients (NCSTR/NCSTAR). The neural network component allows models of this category to achieve high empirical accuracy, including the reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands the possibilities for interpreting the obtained results. An example of a multi-regime monetary rule is used to show one case of specification and interpretation of such a model. In particular, the article models and interprets the principles of management of the UAH exchange rate that come into force when the economy passes from a relatively stable state into a crisis state.
Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media
Cooley, R.L.; Christensen, Steen
2006-01-01
to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical...... nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction......, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(β) and f(γθ*) are small, then most of the biases are small and the...
TIAN Lin-ya; HUA Xi-sheng
2007-01-01
To ensure the safety of buildings surrounding foundation pits, a study was made of a settlement monitoring and trend prediction method. A statistical testing method for analyzing the stability of a settlement monitoring datum is discussed. Following a comprehensive survey, data from 16 stages at the operating control point were verified by a standard t test to determine the stability of the operating control point. A stationary auto-regression model, AR(p), for predicting settlement at an observation point was investigated. Given the 16 stages of settlement data at an observation point, the applicability of this model was analyzed. Settlement in the last four stages was predicted using the stationary auto-regression model AR(1); the maximum difference between predicted and measured values was 0.6 mm, indicating good prediction results. Hence, this model can be applied to settlement prediction for buildings surrounding foundation pits.
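The AR(1) prediction step described above can be sketched as follows. This is a minimal illustration, assuming a least-squares fit of x[t] = c + phi * x[t-1] and one-step-ahead forecasts for the last four stages; the function names and the settlement values are hypothetical and are not taken from the paper.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    return c, phi


def predict_next(c, phi, last_value):
    """One-step-ahead AR(1) forecast from the most recent observation."""
    return c + phi * last_value


# Hypothetical settlement increments (mm) for 16 stages; NOT data from the paper.
settlement = [1.2, 1.1, 1.0, 0.9, 0.9, 0.8, 0.7, 0.7,
              0.6, 0.6, 0.5, 0.5, 0.4, 0.4, 0.3, 0.3]
train, test = settlement[:12], settlement[12:]

c, phi = fit_ar1(train)
predictions = []
previous = train[-1]
for actual in test:
    predictions.append(predict_next(c, phi, previous))
    previous = actual  # use the measured value once it becomes available

max_diff = max(abs(p - a) for p, a in zip(predictions, test))
print(f"c={c:.3f}, phi={phi:.3f}, max |predicted - measured| = {max_diff:.3f} mm")
```

The maximum absolute difference printed at the end plays the same role as the 0.6 mm figure quoted in the abstract, although the numbers here are purely illustrative.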
On the recursive estimation using copula function in the regression model
Djamila Bennafla
2016-01-01
The main aim of this paper is to study recursive estimation of the regression model via the transformed copula and to give its asymptotic properties, with the goal of improving the performance of nonparametric kernel predictors and reducing their computation time by using recursive kernels.
Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation
Alan, Sule; Honore, Bo E.; Hu, Luojia;
2014-01-01
This paper constructs estimators for panel data regression models with individual specific heterogeneity and two–sided censoring and truncation. Following Powell (1986), the estimation strategy is based on moment conditions constructed from re–censored or re–truncated residuals. While these moment...
Fitting a linear regression model by combining least squares and least absolute value estimation
Allende, Sira; Bouza, Carlos; Romero, Isidro
1995-01-01
Robust estimation of the multiple regression is modeled by using a convex combination of the Least Squares and Least Absolute Value criteria. A Bicriterion Parametric algorithm is developed for computing the corresponding estimates. The proposed procedure should be especially useful when outliers are expected. Its behavior is analyzed using some examples.
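One plausible way to write the convex combination of the two criteria is given below. The weight symbol \lambda, the response y_i and the regressor vector x_i are notation assumed here for illustration; the abstract does not give the authors' exact formulation.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}\;
    \lambda \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + (1-\lambda) \sum_{i=1}^{n} \bigl|\,y_i - x_i^{\top}\beta\,\bigr|,
  \qquad 0 \le \lambda \le 1 .
```

Under this reading, \lambda = 1 recovers ordinary least squares and \lambda = 0 recovers least absolute value estimation; intermediate values trade efficiency against robustness to outliers.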
CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering
Jiejun Shi; Li-Xuan Qin
2014-01-01
We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org.
The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.
Fanning, Fred; Newman, Isadore
Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Petersen, Jørgen Holm
2016-01-01
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied. For...
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
(none listed)
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model. This model is a generalization of the partially linear regression model and the varying-coefficient regression model, and allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Our main objective is to estimate the nonparametric component and the unknown parameters simultaneously. The method is easier to compute, and the required computational burden is much less than that of the existing two-stage estimation method. Furthermore, the sieve M-estimation is robust in the presence of outliers if we choose an appropriate ρ(·). Under some mild conditions, the estimators are shown to be strongly consistent; the convergence rate of the estimator for the unknown nonparametric component is obtained and the estimator for the unknown parameter is shown to be asymptotically normally distributed. Numerical experiments are carried out to investigate the performance of the proposed method.
Modeling protein tandem mass spectrometry data with an extended linear regression strategy.
Liu, Han; Bonner, Anthony J; Emili, Andrew
2004-01-01
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. The intensity patterns presented in mass spectra are useful information for the identification of peptides and proteins. However, widely used algorithms cannot predict the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large-scale protein expression profile modeling. By proving an important technical result, namely that the regression coefficient vector is the eigenvector corresponding to the smallest eigenvalue of a space-transformed version of the original data, we reduce this extended regression problem to a singular value decomposition (SVD) problem, thereby gaining robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra we chose 2,859 with high-confidence, non-redundant matches as training data. Based on this specific problem, we derived some measures of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes. PMID:17270923
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...... eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. We briefly mention asymptotic results...
Comparison of regression methods for modeling intensive care length of stay.
Ilona W M Verburg
Intensive care units (ICUs) are increasingly interested in assessing and improving their performance. ICU Length of Stay (LoS) could be seen as an indicator of efficiency of care. However, little consensus exists on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models when predicting ICU LoS. We included data from 32,667 unplanned ICU admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE) in the year 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS, LoS truncated at 30 days, and log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV model, for all patients together and for survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping and the squared Pearson correlation coefficient (R^2), root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE) and bias. The distribution of ICU LoS was skewed to the right with a median of 1.7 days (interquartile range 0.8 to 4.0) and a mean of 4.2 days (standard deviation 7.9). The predictive performance of the models was between 0.09 and 0.20 for R^2, between 7.28 and 8.74 days for RMSPE, between 3.00 and 4.42 days for MAPE and between -2.99 and 1.64 days for bias. The predictive performance was slightly better for survivors than for non-survivors. We were disappointed in the predictive performance of the regression models and conclude that it is difficult to predict LoS of unplanned ICU admissions using patient characteristics at admission time only.
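The four performance measures named above can be computed as in the following sketch. It assumes that bias is defined as the mean of (predicted minus observed), a sign convention the abstract does not state; the function name and the sample values are hypothetical.

```python
import math


def prediction_metrics(observed, predicted):
    """Return squared Pearson correlation, RMSPE, MAPE and bias.

    MAPE here means mean absolute prediction error (in the units of the
    outcome), as in the abstract, not a percentage error.
    Bias is taken as mean(predicted - observed); this sign is an assumption.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "r2": cov ** 2 / (var_o * var_p),
        "rmspe": math.sqrt(sum(e * e for e in errors) / n),
        "mape": sum(abs(e) for e in errors) / n,
        "bias": sum(errors) / n,
    }


# Hypothetical lengths of stay in days; NOT data from the study.
observed_los = [0.8, 1.7, 4.0, 12.5, 2.1]
predicted_los = [2.0, 2.5, 3.8, 6.0, 3.1]
print(prediction_metrics(observed_los, predicted_los))
```

Because R^2 here is a squared correlation, it is insensitive to systematic over- or under-prediction, which is why the study reports bias and the error measures alongside it.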
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...
Genetic Parameters for Number of Piglets Born Alive Using a Random Regression Model
Zoran Luković
2003-06-01
A random regression model (RRM) was applied to estimate dispersion parameters for number of piglets born alive (NBA) from first to tenth parity. Random regressions on Legendre polynomials of standardized parity were included for common litter environmental, permanent environmental and additive genetic effects. Estimated phenotypic variance and variance components (ratios) for NBA changed over parities and differed between farms. Eigenvalues for the additive genetic effect were calculated in order to detect the proportion of additive genetic variability explained by individual production curves of animals. The existence of 10-20% genetic variability in the shape of the curves confirms the possibility of selection on persistency in litter size.
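The covariates used in such a random regression can be sketched as follows. This illustration assumes that parity 1 to 10 is mapped linearly onto [-1, 1] and that unnormalized Legendre polynomials are used; the paper may use a normalized variant, and the function names are hypothetical.

```python
def standardize_parity(parity, first=1, last=10):
    """Map parity in [first, last] linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (parity - first) / (last - first)


def legendre_covariates(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence:
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x).
    """
    values = [1.0, x]
    for k in range(1, order):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values[: order + 1]


# Covariate rows for parities 1..10 with a quadratic (order 2) random regression.
for parity in range(1, 11):
    x = standardize_parity(parity)
    print(parity, [round(v, 3) for v in legendre_covariates(x, order=2)])
```

Each animal's random regression coefficients multiply these covariates, so the additive genetic value becomes a curve over parities rather than a single number.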
Exploratory regression analysis: a tool for selecting models and determining predictor importance.
Braun, Michael T; Oswald, Frederick L
2011-06-01
Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits. PMID:21298571
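The count of 2^p - 1 submodels corresponds to every non-empty subset of the p predictors. The enumeration can be sketched as below; this is an illustration in Python, not the authors' Excel program, and the predictor names are hypothetical.

```python
from itertools import combinations


def all_submodels(predictors):
    """Return every non-empty subset of the predictors as a tuple."""
    subsets = []
    for size in range(1, len(predictors) + 1):
        subsets.extend(combinations(predictors, size))
    return subsets


predictors = ["x1", "x2", "x3"]  # hypothetical predictor names
submodels = all_submodels(predictors)
print(len(submodels))  # equals 2**3 - 1
for subset in submodels:
    print(subset)
```

Each subset would then be fitted as its own regression, which is why this approach becomes expensive quickly as p grows.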
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
A note on constrained M-estimation and its recursive analog in multivariate linear regression models
RAO; Calyampudi; R
2009-01-01
In this paper, the constrained M-estimation of the regression coefficients and scatter parameters in a general multivariate linear regression model is considered. Since the constrained M-estimation is not easy to compute, an updating recursion procedure is proposed to simplify the computation of the estimators when a new observation is obtained. We show that, under mild conditions, the recursion estimates are strongly consistent. In addition, the asymptotic normality of the recursive constrained M-estimators of regression coefficients is established. A Monte Carlo simulation study of the recursion estimates is also provided. Besides, robustness and asymptotic behavior of constrained M-estimators are briefly discussed.
Age estimation based on pelvic ossification using regression models from conventional radiography.
Zhang, Kui; Dong, Xiao-Ai; Fan, Fei; Deng, Zhen-Hua
2016-07-01
To establish regression models for age estimation from the combination of the ossification of iliac crest and ischial tuberosity. One thousand three hundred and seventy-nine conventional pelvic radiographs at the West China Hospital of Sichuan University between January 2010 and June 2012 were evaluated retrospectively. The receiver operating characteristic analysis was performed to measure the value of estimation of 18 years of age with the classification scheme for the iliac crest and ischial tuberosity. Regression analysis was performed, and formulas for calculating approximate chronological age according to the combined developmental status of the ossification of the iliac crest and ischial tuberosity were developed. The areas under the receiver operating characteristic (ROC) curves were above 0.9 (p systems, and the cubic regression model was found to have the highest R-square value (R^2 = 0.744 for female and R^2 = 0.753 for male). The present classification scheme for apophyseal iliac crest ossification and the ischial tuberosity may be used for age estimation. The cubic regression model established here, based on the combined developmental status of the ossification of the iliac crest and ischial tuberosity, can also be used for age estimation. PMID:27169673
Kupek Emil
2006-03-01
Background: Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle to its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. Methods: A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of the odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. The percentage of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. Results: SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would not have been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. Conclusion: The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into an estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios, which are the most frequently used measure of effect in medical statistics.
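Yule's transformation quoted above, Q = (OR - 1)/(OR + 1), can be illustrated with a 2x2 table. The sketch below uses hypothetical cell counts and function names; how the resulting Q values are assembled into a matrix for SEM is not shown.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with cells
    a = both present, b = first only, c = second only, d = neither."""
    return (a * d) / (b * c)


def yule_q(or_value):
    """Yule's Q: maps an odds ratio in (0, inf) to (-1, 1)."""
    return (or_value - 1.0) / (or_value + 1.0)


# Hypothetical counts for two binary variables; NOT data from the paper.
a, b, c, d = 30, 10, 10, 30
or_value = odds_ratio(a, b, c, d)
print(or_value, yule_q(or_value))
```

An odds ratio of 1 (no association) maps to Q = 0, and reciprocal odds ratios map to Q values of equal size and opposite sign, which is what makes Q behave like a correlation coefficient.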
The applicability of linear regression models in working environments' thermal evaluation.
Pablo Adamoglu de Oliveira
2006-04-01
The simultaneous analysis of normally distributed thermal variables, with the aim of checking whether there is any significant correlation among them or whether the values of some can be predicted from the values of others, is considered a problem of great importance in statistical studies. The aim of this paper is to study the applicability of linear regression models in thermal comfort studies of working environments, thus contributing to the understanding of possible environmental cooling, heating or ventilation needs. It starts with bibliographical research, followed by field research, data collection and statistical-mathematical treatment of the data in software. Data analysis was then performed and linear regression models were constructed, using the t and F tests to determine the consistency of the models and their parameters, and conclusions were drawn based on the information obtained and on the significance of the mathematical models built.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed on Beijing expressways. Using the historical traffic data collected from the detectors on Beijing expressways, a specifically designed database was developed via processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series, collected from the floating car data (FCD) system, were used to validate the model. The results indicate that, using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
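The core of a K-NN regression with a weighted Euclidean distance can be sketched as follows. The weights, the feature layout (two recent speed readings) and all values are hypothetical; the paper derives its weights from a relativity measure that the abstract does not specify, and it also preprocesses the data in ways not reproduced here.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(history, query, weights, k):
    """Average the targets of the k historical series closest to the query.

    history is a list of (features, target) pairs.
    """
    ranked = sorted(history, key=lambda item: weighted_distance(item[0], query, weights))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical records: ([speed two steps ago, speed one step ago], next speed) in km/h.
history = [
    ([60.0, 58.0], 57.0),
    ([30.0, 28.0], 27.0),
    ([62.0, 59.0], 60.0),
    ([31.0, 30.0], 29.0),
]
weights = [0.5, 0.5]  # assumed weights, not taken from the paper
print(knn_predict(history, [61.0, 58.0], weights, k=2))
```

Because the prediction is an average of observed outcomes, the method needs no parametric model of traffic flow, at the cost of requiring a representative historical database.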
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
林路; 张润楚
2004-01-01
This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
Menon Carlo
2011-09-01
Background: Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods: Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on the accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results: It was shown that mean adjusted coefficient of determination (R_a^2) values decreased by 20%-35% for different models after one hour, while altering arm posture decreased mean R_a^2 values by 64% to 74% for different models. Conclusions: Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times.
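For reference, the adjusted coefficient of determination is commonly defined as below. The symbols n (number of samples) and p (number of predictors) are notation assumed here; the abstract does not state which variant the authors used.

```latex
R_a^{2} \;=\; 1 - \bigl(1 - R^{2}\bigr)\,\frac{n - 1}{n - p - 1},
\qquad
R^{2} \;=\; 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} .
```

The correction penalizes models with more predictors, which matters when comparing regression models of different complexity on the same SEMG data.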
Goyal, S; Goyal, G. K.
2012-01-01
This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 °C. Linear Layer and Generalized Regression models were developed with input parameters: Soluble nitrogen, pH, Standard plate count, Yeast & mould count, Spores, and sensory score as output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash-Sutcliffe Coefficient were used in order to compare the prediction ability o...
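Three of the comparison measures listed above can be computed as in this sketch, which uses the usual definition of the Nash-Sutcliffe coefficient, 1 - SSE/SST. The function name and the sensory scores are hypothetical and are not taken from the paper.

```python
import math


def fit_statistics(observed, predicted):
    """Return mean square error, root mean square error and the
    Nash-Sutcliffe coefficient (1 - SSE / SST)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    mse = sse / n
    return {"mse": mse, "rmse": math.sqrt(mse), "nse": 1.0 - sse / sst}


# Hypothetical sensory scores; NOT data from the paper.
observed_scores = [8.5, 7.9, 7.2, 6.4, 5.8]
predicted_scores = [8.3, 8.0, 7.0, 6.6, 5.9]
print(fit_statistics(observed_scores, predicted_scores))
```

A Nash-Sutcliffe value of 1 indicates a perfect fit, while 0 means the model predicts no better than the mean of the observations.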
Regression spline bivariate probit models: a practical approach to testing for exogeneity
Marra, G.; Radice, Rosalba; Filippou, P
2015-01-01
Bivariate probit models can deal with a problem usually known as endogeneity. This issue is likely to arise in observational studies when confounders are unobserved. We are concerned with testing the hypothesis of exogeneity (or absence of endogeneity) when using regression spline recursive and sample selection bivariate probit models. Likelihood ratio and gradient tests are discussed in this context and their empirical properties investigated and compared with those of the Lagrange multiplie...
Interpretation of Linear Regression Coefficients under Mean Model Miss-Specification
Brannath, Werner; Scharpenberg, Martin
2014-01-01
Linear regression is a frequently used tool in statistics; however, its validity and interpretability rely on strong model assumptions. While robust estimates of the coefficients' covariance extend the validity of hypothesis tests and confidence intervals, a clear interpretation of the coefficients is lacking if the mean structure of the model is misspecified. We therefore suggest a new intuitive and mathematically rigorous interpretation of the coefficients that is independent from specifi...
Determining the number of breaks in a piecewise linear regression model
Strikholm, Birgit
2006-01-01
In this paper we propose a sequential method for determining the number of breaks in piecewise linear structural break models. An advantage of the method is that it is based on standard statistical inference. Tests available for testing linearity against switching regression type nonlinearity are applied sequentially to determine the number of regimes in the structural break model. A simulation study is performed in order to investigate the finite-sample behaviour of the procedure and to comp...
Inferring Preferences in Multiple Criteria Decision Analysis Using a Logistic Regression Model
Theodor J Stewart
1984-01-01
A method is proposed for the analysis of multiple criteria decision making problems in an interactive environment, when decision-maker preferences are inconsistent with a simple utility model and/or are self-inconsistent (e.g., showing intransitivities). A maximum likelihood estimation procedure is invoked which is based on a logistic regression model relating the probability of selecting one decision option over another to a linear function of attribute values. The method is illustrated by a...
General model selection estimation of a periodic regression with a Gaussian noise
Konev, Victor; Pergamenchtchikov, Serguei
2010-01-01
This paper considers the problem of estimating a periodic function in a continuous time regression model with an additive stationary Gaussian noise having unknown correlation function. A general model selection procedure on the basis of arbitrary projective estimates, which does not need the knowledge of the noise correlation function, is proposed. A non-asymptotic upper bound for quadratic risk (oracle inequality) has been derived under mild conditions on the noise. For the Ornstein-Uhlenbec...
Modelling Scale Effect in Cross-section Data: The Case of Hedonic Price Regression
Duo Qin; Yimeng Liu
2013-01-01
An innovative and simple experiment with cross-section data ordering is carried out to exploit a basic and common feature between many economic variables–nonlinear scale dependence. The experiment is tried on hedonic price regression models using two data sets, one for automobiles and the other computers. The key findings are: (a) Hedonic price indices can be significantly biased if they are constructed using models which disregard possible nonlinear scale effects latent in random data samp...
Kupek Emil
2006-01-01
Abstract Background Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. Methods A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (O...
Rachna Aggarwal; Maneek Kumar; Sharma, R K; M. K. Sharma
2014-01-01
This paper presents Reliability Based Design Optimization (RBDO) model to deal with uncertainties involved in concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subjected to concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR), Traditiona...
Management parameters from the random regression test-day model to advise farmers on cow nutrition
Caccamo, M.
2012-01-01
Accurate monitoring and adequate planning of activities at modern dairy farms are important to improve farm profitability. The aim of this study was to investigate the use of test-day information to support farmers in management of Sicilian dairy herds. To this purpose, a test-day random regression model was developed for the analysis of production data of Sicilian dairy herds. Highest between-herd variation found in the variance components analysis using the test day model showed clear evide...
APPLICATION OF REGRESSION MODELLING TECHNIQUES IN DESALINATION OF SEA WATER BY MEMBRANE DISTILLATION
SELVI S. R
2015-08-01
The objective of this work is to assess the statistical significance of experimental parameters for the performance of membrane distillation. A raw sea water sample, without pretreatment, was collected from Puducherry and desalinated using the direct contact membrane distillation method. The experimental data, which cover the effects of feed temperature, feed flow rate and feed concentration on the permeate flux, were analysed using statistical methods. A regression model was developed to relate the input parameters (feed temperature, feed concentration and feed flow rate) to the output parameter (permeate flux). Since the performance of membrane distillation in the desalination of water is characterised by permeate flux, a simple linear regression model was fitted. Because the goodness of fit of a model should always be validated, the regression model was validated using ANOVA. ANOVA estimates for the parameter study are given, and the coefficients obtained by regression analysis are specified in the regression equation; it is concluded that the input parameter with the highest coefficient is significant and strongly influences the response. Feed flow rate and feed temperature have a greater influence on permeate flux than feed concentration. The coefficient of feed concentration was found to be negative, indicating that it is a less significant factor for permeate flux. The chemical composition of the sea water was obtained by water quality analysis. The TDS of the membrane-distilled water was found to be 18 ppm, compared with the initial feed TDS of the sea water of 27,720 ppm. The experimental work gave a salt rejection of 99%, and the water analysis report confirms that the distillate obtained by this desalination process is of potable quality.
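The reported salt rejection can be checked from the two TDS values. The worked equation below assumes the usual concentration-based definition of rejection, with C_f the feed TDS and C_p the permeate TDS; these symbols are notation introduced here, and the abstract does not state the formula the authors used.

```latex
R \;=\; \left(1 - \frac{C_p}{C_f}\right) \times 100\%
  \;=\; \left(1 - \frac{18}{27\,720}\right) \times 100\%
  \;\approx\; 99.94\% .
```

Under this definition the result is consistent with the salt rejection of 99% quoted in the abstract.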
Support vector regression model based predictive control of water level of U-tube steam generators
Highlights: • Water level of U-tube steam generators was controlled in a model predictive fashion. • Models for steam generator water level were built using support vector regression. • Cost function minimization for future optimal controls was performed by using the steepest descent method. • The results indicated the feasibility of the proposed method. - Abstract: A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. The proposed algorithm was applied to a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that the steam generator level can be controlled effectively at all powers by the proposed method.
Mel'nikov, A. V.
1996-10-01
Contents Introduction Chapter I. Basic notions and results from contemporary martingale theory §1.1. General notions of the martingale theory §1.2. Convergence (a.s.) of semimartingales. The strong law of large numbers and the law of the iterated logarithm Chapter II. Stochastic differential equations driven by semimartingales §2.1. Basic notions and results of the theory of stochastic differential equations driven by semimartingales §2.2. The method of monotone approximations. Existence of strong solutions of stochastic equations with non-smooth coefficients §2.3. Linear stochastic equations. Properties of stochastic exponentials §2.4. Linear stochastic equations. Applications to models of the financial market Chapter III. Procedures of stochastic approximation as solutions of stochastic differential equations driven by semimartingales §3.1. Formulation of the problem. A general model and its relation to the classical one §3.2. A general description of the approach to the procedures of stochastic approximation. Convergence (a.s.) and asymptotic normality §3.3. The Gaussian model of stochastic approximation. Averaged procedures and their effectiveness Chapter IV. Statistical estimation in regression models with martingale noises §4.1. The formulation of the problem and classical regression models §4.2. Asymptotic properties of MLS-estimators. Strong consistency, asymptotic normality, the law of the iterated logarithm §4.3. Regression models with deterministic regressors §4.4. Sequential MLS-estimators with guaranteed accuracy and sequential statistical inferences Bibliography
Rachna Aggarwal
2014-12-01
This paper presents a Reliability Based Design Optimization (RBDO) model to deal with uncertainties involved in the concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subject to a concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR), Traditional Ridge Regression (TRR) and Generalized Ridge Regression (GRR) techniques have been explored to select the best model to explicitly represent the compressive strength of concrete. The RBDO model is solved by the Sequential Optimization and Reliability Assessment (SORA) method using the fully quadratic GRR model. Optimization results for a wide range of target compressive strengths and reliability levels of 0.90, 0.95 and 0.99 have been reported. Also, safety factor based Deterministic Design Optimization (DDO) designs for each case are obtained. It has been observed that deterministic optimal designs are cost effective but the proposed RBDO model gives improved design performance.
An adaptive contextual quantum language model
Li, Jingfei; Zhang, Peng; Song, Dawei; Hou, Yuexian
2016-08-01
User interactions in search systems represent a rich source of implicit knowledge about the user's cognitive state and information need, which continuously evolves over time. Despite massive efforts to exploit and incorporate this implicit knowledge in information retrieval, it is still a challenge to effectively capture the term dependencies and the user's dynamic information need (reflected by query modifications) in the context of user interaction. To tackle these issues, motivated by the recent Quantum Language Model (QLM), we develop a QLM based retrieval model for session search, which naturally incorporates the complex term dependencies occurring in the user's historical queries and clicked documents with density matrices. In order to capture the dynamic information within users' search sessions, we propose a density matrix transformation framework and further develop an adaptive QLM ranking model. Extensive comparative experiments show the effectiveness of our session quantum language models.
Model reference adaptive control and adaptive stability augmentation
Henningsen, Arne; Ravn, Ole
1993-01-01
A comparison of the standard concepts in MRAC design suggests that a combination of the implicit and the explicit design techniques may lead to an improvement of the overall system performance in the presence of unmodelled dynamics. Using the ideas of adaptive stability augmentation a combined...
Das, Iswar; Stein, Alfred; Kerle, Norman; Dadhwal, Vinay K.
2012-12-01
Landslide susceptibility mapping (LSM) along road corridors in the Indian Himalayas is an essential exercise that helps planners and decision makers in determining the severity of probable slope failure areas. Logistic regression is commonly applied for this purpose, as it is a robust and straightforward technique that is relatively easy to handle. Ordinary logistic regression as a data-driven technique, however, does not allow inclusion of prior information. This study presents Bayesian logistic regression (BLR) for landslide susceptibility assessment along road corridors. The methodology is tested in a landslide-prone area in the Bhagirathi river valley in the Indian Himalayas. Parameter estimates from BLR are compared with those obtained from ordinary logistic regression. By means of iterative Markov Chain Monte Carlo simulation, BLR provides a rich set of results on parameter estimation. We assessed model performance by the receiver operator characteristics curve analysis, and validated the model using 50% of the landslide cells kept apart for testing and validation. The study concludes that BLR performs better in posterior parameter estimation in general and the uncertainty estimation in particular.
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for the YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). Within the hallucination framework, many color face images are represented in YCbCr space. To reduce the time complexity of color face hallucination, the color face images are naturally described as tensors or multi-linear arrays. In addition, error regression analysis is used to find the error estimate, which can be obtained from the existing LR in tensor space. The learning process starts from the errors made when reconstructing the face images of the training dataset by MPCA, and then finds the relationship between input and error by regression analysis. The hallucinating process uses the normal method of MPCA backprojection, after which the result is corrected with the error estimate. In this contribution we show that our hallucination technique can be suitable for color face images in both RGB and YCbCr space. By using the MPCA subspace with the error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
The study on Sanmenxia annual flow forecasting in the Yellow River with mix regression model
JIANG Xiaohui; LIU Changming; WANG Yu; WANG Hongrui
2004-01-01
This paper establishes a mix regression model for simulating annual flow, in which annual runoff is the auto-regression factor and precipitation, air temperature and water consumption are regression factors; 9 hypothetical climate change schemes are adopted to forecast the change of annual flow at Sanmenxia Station. The results show: (1) When temperature is steady, the average annual runoff will increase by 8.3% if precipitation increases by 10%; when precipitation decreases by 10%, the average annual runoff will decrease by 8.2%; when precipitation is steady, the average annual runoff will decrease by 2.4% if temperature increases by 1 ℃; if temperature decreases by 1 ℃, runoff will increase by 1.2%. The mix regression model can simulate annual runoff well. (2) Among the 9 different temperature and precipitation scenarios, scenario 9 is the most adverse to the runoff at Sanmenxia Station on the Yellow River, i.e. temperature increases by 1 ℃ and precipitation decreases by 10%. Under this condition, the simulated average annual runoff decreases by 10.8%. On the contrary, scenario 1 is the most favourable for runoff, i.e. temperature decreases by 1 ℃ and precipitation increases by 10%, which makes the annual runoff at Sanmenxia increase by 10.6%.
Estimating strength of DDoS attack using various regression models
Gupta, B B; Misra, Manoj
2012-01-01
Anomaly-based DDoS detection systems construct a profile of the traffic normally seen in the network, and identify anomalies whenever traffic deviates from the normal profile beyond a threshold. This extent of deviation is normally not utilised. This paper reports the evaluation results of a proposed approach that utilises this extent of deviation from the detection threshold to estimate the strength of a DDoS attack using various regression models. A relationship is established between the number of zombies and the observed deviation in sample entropy. Various statistical performance measures, such as coefficient of determination (R2), coefficient of correlation (CC), sum of square error (SSE), mean square error (MSE), root mean square error (RMSE), normalised mean square error (NMSE), Nash-Sutcliffe efficiency index (η) and mean absolute error (MAE) are used to measure the performance of various regression models. Internet type topologies used for simulation are generated using the transit-stub model of the GT-ITM topology generator. NS...
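The performance measures listed in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name `regression_metrics` is hypothetical, and the definitions are common textbook ones that the abstract does not spell out (R2 taken as the squared correlation coefficient, NMSE as MSE divided by the population variance of the observations, so that the Nash-Sutcliffe index equals 1 - NMSE).

```python
import math

def regression_metrics(observed, predicted):
    """Common goodness-of-fit measures for a fitted regression model.

    Definitions are assumptions for illustration; papers differ in how
    they define R2 and NMSE.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)              # sum of square error
    mse = sse / n                                 # mean square error
    rmse = math.sqrt(mse)                         # root mean square error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("observed and predicted values must not be constant")

    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    cc = cov / math.sqrt(ss_obs * ss_pred)        # coefficient of correlation
    nse = 1.0 - sse / ss_obs                      # Nash-Sutcliffe efficiency
    nmse = mse / (ss_obs / n)                     # MSE normalised by variance

    return {"r2": cc ** 2, "cc": cc, "sse": sse, "mse": mse,
            "rmse": rmse, "nmse": nmse, "nse": nse, "mae": mae}

# Made-up numbers, purely to show the call.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Returning all measures from one function keeps the comparison between candidate regression models consistent, since every model is scored against the same definitions.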
Oliveira, María; Einbeck, Jochen; Higueras, Manuel; Ainsbury, Elizabeth; Puig, Pedro; Rothkamm, Kai
2016-03-01
Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which will render the variance of observed aberration counts larger than their mean, and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberrations datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for testing for zero inflation in Poisson regression models under the identity link is also developed. PMID:26461836
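A quick descriptive check for the two symptoms described above (variance larger than the mean, and more zero counts than a Poisson distribution predicts) might look like the sketch below. It is only an exploratory summary under the assumption of a single homogeneous sample; it is not the score test developed in the paper, and the function name and example counts are hypothetical.

```python
import math

def dispersion_summary(counts):
    """Compare a sample of aberration counts with Poisson expectations."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion index undefined")

    # Unbiased sample variance.
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)

    return {
        "mean": mean,
        "variance": variance,
        # Equals 1 under equidispersion; values well above 1 suggest overdispersion.
        "dispersion_index": variance / mean,
        "observed_zero_fraction": counts.count(0) / n,
        # P(Y = 0) for a Poisson variable with the same mean.
        "poisson_zero_fraction": math.exp(-mean),
    }

# Made-up counts with many zeros and a few heavily damaged cells.
summary = dispersion_summary([0, 0, 0, 0, 0, 0, 4, 4])
```

A dispersion index far above 1 together with an observed zero fraction well above the Poisson value is the pattern that motivates zero-inflated models, but a formal test is still needed before choosing one.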
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Random regression models for daily feed intake in Danish Duroc pigs
Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just;
The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 to 160 days of age. The data originated from the National test station......-year-season, permanent, and animal genetic effects. The functional form was based on Legendre polynomials. A total of 64 models for random regressions were initially ranked by BIC to identify the approximate order for the Legendre polynomials using AI-REML. The parsimonious model included Legendre polynomials of 2nd....... Eigenvalues of the genetic covariance function showed that 33% of genetic variability was explained by the individual genetic curve of the pigs. This proportion was covered by linear (27%) and quadratic (6%) coefficients. Genetic eigenfunctions revealed that altering the shape of the feed intake curve by...
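As an illustration of how Legendre polynomials typically enter a random regression model, the sketch below maps age onto [-1, 1] and evaluates the polynomial covariates by the three-term recurrence. The helper names are hypothetical, the 70 to 160 day range is taken from the abstract, and the polynomials are the standard unnormalized ones; the study may have used a normalized variant, which the truncated abstract does not state.

```python
def standardize_age(age, min_age=70.0, max_age=160.0):
    """Map age in days linearly onto the interval [-1, 1]."""
    return 2.0 * (age - min_age) / (max_age - min_age) - 1.0

def legendre_covariates(x, order):
    """Return [P_0(x), ..., P_order(x)] using the recurrence
    (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)."""
    values = [1.0]
    if order >= 1:
        values.append(x)
    for k in range(1, order):
        values.append(((2 * k + 1) * x * values[k] - k * values[k - 1]) / (k + 1))
    return values

# Covariates for a record taken at 115 days of age (the midpoint of the range).
row = legendre_covariates(standardize_age(115.0), order=2)
```

Each animal's random regression coefficients multiply these covariates, so the constant, linear and quadratic terms correspond to the level, slope and curvature of its feed intake curve.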
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.
Probing turbulence intermittency via Auto-Regressive Moving-Average models
Faranda, Davide; Dubrulle, Berengere; Daviaud, Francois
2014-01-01
We suggest a new approach to probing intermittency corrections to the Kolmogorov law in turbulent flows based on the Auto-Regressive Moving-Average modeling of turbulent time series. We introduce a new index $\Upsilon$ that measures the distance from a Kolmogorov-Obukhov model in the Auto-Regressive Moving-Average models space. Applying our analysis to Particle Image Velocimetry and Laser Doppler Velocimetry measurements in a von Kármán swirling flow, we show that $\Upsilon$ is proportional to the traditional intermittency correction computed from the structure function. Therefore it provides the same information, using much shorter time series. We conclude that $\Upsilon$ is a suitable index to reconstruct the spatial intermittency of the dissipation in both numerical and experimental turbulent fields.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas, becoming one of the main air pollution concerns. This is the case in the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7 years of historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative (log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity) and qualitative (working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation). Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training dataset (6 years) and the testing dataset (1 year). The results of each type of model show that the INT-BIC-based model (R² = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the-art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 for seasonal means) with respect to shorter periods. PMID:23247520
Adaptive Genetic Algorithm Model for Intrusion Detection
K. S. Anil Kumar
2012-09-01
Intrusion detection systems are intelligent systems designed to identify and prevent the misuse of computer networks and systems. Various approaches to intrusion detection are currently being used, but they are relatively ineffective. Thus the emerging network security systems need to be part of the life system, and this is possible only by embedding knowledge into the network. The Adaptive Genetic Algorithm Model - IDS comprises K-Means clustering, Genetic Algorithm and Neural Network techniques. The technique is tested using a multitude of background knowledge sets in DARPA network traffic datasets.
A Model for Dynamic Adaptive Coscheduling
LU Sanglu; ZHOU Xiaobo; XIE Li
1999-01-01
This paper proposes a dynamic adaptive coscheduling model DASIC to take advantage of excess available resources in a network of workstations (NOW). Besides coscheduling related subtasks dynamically, DASIC can scale up or down the process space depending upon the number of available processors on an NOW. Based on the dynamic idle processor group (IPG), DASIC employs three modules: the coscheduling module, the scalable scheduling module and the load balancing module, and uses six algorithms to achieve scalability. A simplified DASIC was also implemented, and experimental results are presented in this paper, which show that it can maximize system utilization, and achieve task parallelism as much as possible.
A class of additive-accelerated means regression models for recurrent event data
None listed
2010-01-01
In this article, we propose a class of additive-accelerated means regression models for analyzing recurrent event data. The class includes the proportional means model, the additive rates model, the accelerated failure time model, the accelerated rates model and the additive-accelerated rate model as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For the inference on the model parameters, estimating equation approaches are derived and asymptotic properties of the proposed estimators are established. In addition, a technique is provided for model checking. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is illustrated.
Adaptive model training system and method
Bickford, Randall L; Palnitkar, Rahul M; Lee, Vo
2014-04-15
An adaptive model training system and method for filtering asset operating data values acquired from a monitored asset for selectively choosing asset operating data values that meet at least one predefined criterion of good data quality while rejecting asset operating data values that fail to meet at least the one predefined criterion of good data quality; and recalibrating a previously trained or calibrated model having a learned scope of normal operation of the asset by utilizing the asset operating data values that meet at least the one predefined criterion of good data quality for adjusting the learned scope of normal operation of the asset for defining a recalibrated model having the adjusted learned scope of normal operation of the asset.
Adaptive model training system and method
Bickford, Randall L; Palnitkar, Rahul M
2014-11-18
An adaptive model training system and method for filtering asset operating data values acquired from a monitored asset for selectively choosing asset operating data values that meet at least one predefined criterion of good data quality while rejecting asset operating data values that fail to meet at least the one predefined criterion of good data quality; and recalibrating a previously trained or calibrated model having a learned scope of normal operation of the asset by utilizing the asset operating data values that meet at least the one predefined criterion of good data quality for adjusting the learned scope of normal operation of the asset for defining a recalibrated model having the adjusted learned scope of normal operation of the asset.
Highlights: ► Original software for composite dynamic envelope’s thermal performance forecasting. ► Construction of two hypothetical composite dynamic wall’s prototypes. ► Different simulation scenarios based on fractional factorial simulation design. ► Development of polynomial regression models. ► Validation and evaluation of polynomial regression models. - Abstract: The building envelope’s insulating efficiency is always a key element regarding the energy consumption control of the whole building. This article aims to propose a simple method based on classic and fractional factorial simulation plans to obtain regression models in the form of polynomial functions that link the angle, the thermal conductivity and the thickness of each envelope’s component to the overall wall’s thermal resistance. Original software that combines classic and novel modeling techniques has been used in order to have a precise and validated numerical investigation that focuses on a variety of possible composite dynamic wall’s configurations. For the purposes of this study, the combined radiation/conduction heat transfer finite volume numerical model was updated to be complex enough to predict the temperature distribution and heat transfer in composite envelopes for a variety of inclination angles. The model takes into account the coupling between the solid conduction of both solid and fibrous systems and the gaseous conduction and radiation. The radiation heat transfer through each insulating layer has been modeled via the two flux approximation in order to take into account both optically thick and optically thin materials, as well as potential reflective surfaces currently used in composite wall’s applications. Different simulation scenarios have been conceived according to basic fractional factorial simulation plans in order to obtain valid empirical polynomial functions. To validate this statistical forecast system, many simulation scenarios were carried out and
Adaptive dynamics for physiologically structured population models.
Durinx, Michel; Metz, J A J Hans; Meszéna, Géza
2008-05-01
We develop a systematic toolbox for analyzing the adaptive dynamics of multidimensional traits in physiologically structured population models with point equilibria (sensu Dieckmann et al. in Theor. Popul. Biol. 63:309-338, 2003). Firstly, we show how the canonical equation of adaptive dynamics (Dieckmann and Law in J. Math. Biol. 34:579-612, 1996), an approximation for the rate of evolutionary change in characters under directional selection, can be extended so as to apply to general physiologically structured population models with multiple birth states. Secondly, we show that the invasion fitness function (up to and including second order terms, in the distances of the trait vectors to the singularity) for a community of N coexisting types near an evolutionarily singular point has a rational form, which is model-independent in the following sense: the form depends on the strategies of the residents and the invader, and on the second order partial derivatives of the one-resident fitness function at the singular point. This normal form holds for Lotka-Volterra models as well as for physiologically structured population models with multiple birth states, in discrete as well as continuous time and can thus be considered universal for the evolutionary dynamics in the neighbourhood of singular points. Only in the case of one-dimensional trait spaces or when N = 1 can the normal form be reduced to a Taylor polynomial. Lastly we show, in the form of a stylized recipe, how these results can be combined into a systematic approach for the analysis of the (large) class of evolutionary models that satisfy the above restrictions. PMID:17943289
Cluster regression model and level fluctuation features of Van Lake, Turkey
Z. Şen
Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are climate change, greenhouse effects and ozone layer depletion, which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value and apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity, especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities, which are indistinguishable from those of the original data.
Key words. Hydrology (hydrologic budget; stochastic processes) · Meteorology and atmospheric dynamics (ocean-atmosphere interactions)
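The transition probability half of such a model can be illustrated with a short sketch. It assumes the lake levels have already been grouped into clusters by fixed level boundaries; the boundary value, the level readings and both function names are hypothetical, and the regression step fitted within each cluster is not shown.

```python
import bisect

def assign_cluster(level, boundaries):
    """Index of the cluster whose range contains the given level.

    `boundaries` is a sorted list of thresholds separating the clusters.
    """
    return bisect.bisect_right(boundaries, level)

def transition_matrix(labels, n_clusters):
    """Estimate P[i][j] = probability of moving from cluster i to cluster j
    between consecutive time steps, from observed frequencies."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        # A cluster that is never left (no outgoing transitions) gets a zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Made-up monthly levels (m) and a single made-up boundary between two clusters.
levels = [1647.2, 1647.6, 1648.3, 1648.4, 1647.9]
labels = [assign_cluster(v, [1648.0]) for v in levels]
P = transition_matrix(labels, n_clusters=2)
```

Large diagonal entries in the estimated matrix indicate persistent regimes, which is how a shift in the mean level shows up in this representation.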
Renata Pires Gonçalves
2012-02-01
Experiments of the dosage × response type are very common in the determination of optimal nutrient levels in a balanced diet, and include the use of regression models to achieve this objective. Nevertheless, the regression analysis routine, generally, uses a priori information about a possible relationship between the response variable. Isotonic regression is a least squares estimation method that generates estimates which preserve the ordering of the data. In the theory of isotonic regression this information is essential and it is expected to increase fitting efficiency. The objective of this work was to use an isotonic regression methodology as an alternative way of analyzing data on Zn deposition in the tibia of male birds of the Hubbard lineage. We considered plateau response models of polynomial quadratic and linear exponential forms. In addition to these models, we also proposed fitting a logarithmic model to the data, and the efficiency of the methodology was evaluated by Monte Carlo simulations, considering different scenarios for the parametric values. The isotonization of the data yielded an improvement in all the fitting quality parameters evaluated. Among the models used, the logarithmic model presented parameter estimates more consistent with the values reported in the literature.
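The isotonization step can be illustrated with the pool-adjacent-violators algorithm, the standard way of computing a least squares fit under a non-decreasing order constraint. This is a generic sketch, not the authors' implementation: the function name and the response values are hypothetical, and the responses are assumed to be already sorted by increasing dose.

```python
def isotonic_fit(y, weights=None):
    """Non-decreasing least squares fit by pooling adjacent violators.

    `y` holds responses ordered by increasing dose; `weights` defaults to 1.
    """
    if weights is None:
        weights = [1.0] * len(y)

    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, weight in zip(y, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])

    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Made-up mean responses at four increasing doses; the dip at the third
# dose is pooled with its neighbour.
smoothed = isotonic_fit([1.0, 3.0, 2.0, 4.0])
```

The isotonized values can then replace the raw means when fitting a plateau or logarithmic response curve, which is the role isotonic regression plays in the analysis described above.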
R B Magar; V Jothiprakash
2011-12-01
In this study, a multi-linear regression (MLR) approach is used to construct an intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in the MLR approach, the Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables and a dependent variable by fitting a linear regression equation. The main aim of the present study is to see the consequences of the development and applicability of simple models when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data are used for building the model and 14 years of data are used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria, and it was found that, as in the present case of well correlated input data, both lumped and distributed MLR models perform equally well. For the present case study, both MLR and ARIMA models performed equally well due to the availability of a large dataset.
Giuliano de Oliveira Freitas
2013-10-01
PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one assigned to receive the AcrySof® Toric intraocular lens (IOL) in both eyes and another assigned to have the AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation between the Thibos APVratio and the Alpins percentage of success (%Success) was found (Spearman's ρ = -0.93); the linear regression is given by the following equation: %Success = (-APVratio + 1.00) × 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
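The reported regression can be applied directly once the pre- and postoperative APV magnitudes are known. The snippet below is only a worked instance of the equation quoted in the results, read as %Success = (1.00 - APVratio) × 100; the function names and the example magnitudes are hypothetical, and computing the APV itself from refraction data is not shown.

```python
def apv_ratio(postoperative_apv, preoperative_apv):
    """Ratio of postoperative to preoperative astigmatic power vector."""
    if preoperative_apv <= 0:
        raise ValueError("preoperative APV must be positive")
    return postoperative_apv / preoperative_apv

def percent_success(ratio):
    """Alpins percentage of success predicted from the APV ratio,
    using the linear relation reported in the abstract."""
    return (1.0 - ratio) * 100.0

# Made-up example: APV reduced from 2.0 to 0.5 gives a ratio of 0.25.
predicted = percent_success(apv_ratio(0.5, 2.0))
```

A ratio of 0 (no residual astigmatic power) maps to 100% and a ratio of 1 (no change) maps to 0%, which matches the negative correlation described in the results.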
Nowadays, due to the power crisis, electricity demand forecasting is deemed an important area for socioeconomic development, and proper load forecasting is considered an essential step towards efficient power system operation, scheduling and planning. In this paper, we present STLF (Short Term Load Forecasting) using multiple regression techniques (i.e. linear, multiple linear, quadratic and exponential) by considering an hour-by-hour load model based on a specific targeted day approach with a temperature variant parameter. The proposed work forecasts the correlation of future load demand with linear and non-linear parameters (i.e. temperature in our case) through different regression approaches. The overall load forecasting error is 2.98%, which is acceptable. Of the proposed regression techniques, the quadratic regression technique performs better than the other techniques because it can optimally fit a broad range of functions and data sets. The work proposed in this paper will pave a path to effectively forecast the specific day load with multiple variance factors in a way that optimal accuracy can be maintained. (author)
Plant adaptive behaviour in hydrological models (Invited)
van der Ploeg, M. J.; Teuling, R.
2013-12-01
Models that will be able to cope with future precipitation and evaporation regimes need a solid base that describes the essence of the processes involved [1]. Micro-behaviour in the soil-vegetation-atmosphere system may have a large impact on patterns emerging at larger scales. A complicating factor in the micro-behaviour is the constant interaction between vegetation and geology in which water plays a key role. The resilience of the coupled vegetation-soil system critically depends on its sensitivity to environmental changes. As a result of environmental changes vegetation may wither and die, but such environmental changes may also trigger gene adaptation. Constant exposure to environmental stresses, biotic or abiotic, influences plant physiology, gene adaptations, and flexibility in gene adaptation [2-6]. Gene expression as a result of different environmental conditions may profoundly impact drought responses across the same plant species. Differences in response to an environmental stress have consequences for the way species are currently being treated in models (single plant to global scale). In particular, model parameters that control root water uptake and plant transpiration are generally assumed to be a property of the plant functional type. Assigning plant functional types does not allow for local plant adaptation to be reflected in the model parameters, nor does it allow for correlations that might exist between root parameters and soil type. Models potentially provide a means to link root water uptake and transport to large scale processes (e.g. Rosnay and Polcher 1998, Feddes et al. 2001, Jung 2010), especially when powered with an integrated hydrological, ecological and physiological base. We explore the experimental evidence from natural vegetation to formulate possible alternative modeling concepts. [1] Seibert, J. 2000. Multi-criteria calibration of a conceptual runoff model using a genetic algorithm. Hydrology and Earth System Sciences 4(2): 215
Lihua Yang
2015-04-01
In order to improve the accuracy of grain production forecasting, this study proposes a new combination forecasting model that combines the stepwise regression method with an RBF neural network, assigning weights by the inverse variance method. A comparison across different criteria indicates that the combination forecasting model is superior to the other models. The performance of the models is measured using three error measures: Mean Absolute Percentage Error (MAPE), Theil Inequality Coefficient (Theil IC) and Root Mean Squared Error (RMSE). The model with the smallest values of MAPE, Theil IC and RMSE is taken to be the best model for predicting grain production. Based on the MAPE, Theil IC and RMSE evaluation criteria, the combination model can reduce the forecasting error and has high prediction accuracy in grain production forecasting, making the decision more scientific and rational.
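The weighting and evaluation steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the inverse variance method is taken here to mean weights proportional to the reciprocal of each model's sum of squared errors, the Theil IC is taken in its bounded (0 to 1) form, and all numbers and names (`actual`, `pred_a`, `pred_b`) are hypothetical rather than taken from the study.

```python
import math

def inverse_variance_weights(actual, pred_a, pred_b):
    """Weights proportional to 1 / (sum of squared errors) of each model (assumed form)."""
    sse_a = sum((y - p) ** 2 for y, p in zip(actual, pred_a))
    sse_b = sum((y - p) ** 2 for y, p in zip(actual, pred_b))
    inv_a, inv_b = 1.0 / sse_a, 1.0 / sse_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total

def mape(actual, pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, pred)) / len(actual))

def theil_ic(actual, pred):
    """Theil Inequality Coefficient, bounded form: 0 is a perfect fit."""
    n = len(actual)
    root_mean_sq_actual = math.sqrt(sum(y * y for y in actual) / n)
    root_mean_sq_pred = math.sqrt(sum(p * p for p in pred) / n)
    return rmse(actual, pred) / (root_mean_sq_actual + root_mean_sq_pred)

# Hypothetical observations and two hypothetical component forecasts
# (standing in for a stepwise regression model and an RBF network).
actual = [100.0, 110.0, 120.0, 130.0]
pred_a = [98.0, 112.0, 118.0, 133.0]
pred_b = [105.0, 104.0, 126.0, 124.0]

w_a, w_b = inverse_variance_weights(actual, pred_a, pred_b)
combined = [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]

for name, pred in (("A", pred_a), ("B", pred_b), ("combined", combined)):
    print(name, round(mape(actual, pred), 3), round(theil_ic(actual, pred), 4),
          round(rmse(actual, pred), 3))
```

In this toy data the two component forecasts err in opposite directions, so the weighted combination has a lower RMSE than either one; that is a property of the example, not a general guarantee.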
A coarsened multinomial regression model for perinatal mother to child transmission of HIV
Brown Elizabeth R
2008-07-01
Background: In trials designed to estimate rates of perinatal mother to child transmission of HIV, HIV assays are scheduled at multiple points in time. Still, infection status for some infants at some time points may be unknown, particularly when interim analyses are conducted. Methods: Logistic regression models are commonly used to estimate covariate-adjusted transmission rates, but their methods for handling missing data may be inadequate. Here we propose using coarsened multinomial regression models to estimate cumulative and conditional rates of HIV transmission. Through simulation, we compare the proposed models to standard logistic models in terms of bias, mean squared error, coverage probability, and power. We consider a range of treatment effect and visit process scenarios, while including imperfect sensitivity of the assay and contamination of the endpoint due to early breastfeeding transmission. We illustrate the approach through analysis of data from a clinical trial designed to prevent perinatal transmission. Results: The proposed cumulative and conditional models performed well when compared to their logistic counterparts. Performance of the proposed cumulative model was particularly strong under scenarios where treatment was assumed to increase the risk of in utero transmission but decrease the risk of intrapartum and overall perinatal transmission and under scenarios designed to represent interim analyses. Power to estimate intrapartum and perinatal transmission was consistently higher for the proposed models. Conclusion: Coarsened multinomial regression models are preferred to standard logistic models for estimation of perinatal mother to child transmission of HIV, particularly when assays are missing or occur off-schedule for some infants.
Lee, Myung Hee; Liu, Yufeng
2013-12-01
The continuum regression technique provides an appealing regression framework connecting ordinary least squares, partial least squares and principal component regression in one family. It offers some insight on the underlying regression model for a given application. Moreover, it helps to provide deep understanding of various regression techniques. Despite the useful framework, however, the current development on continuum regression is only for linear regression. In many applications, nonlinear regression is necessary. The extension of continuum regression from linear models to nonlinear models using kernel learning is considered. The proposed kernel continuum regression technique is quite general and can handle very flexible regression model estimation. An efficient algorithm is developed for fast implementation. Numerical examples have demonstrated the usefulness of the proposed technique. PMID:24058224
Innovation Model of the Concept of Professional Adaptation of Personnel
Kurina Nataliya S.; Darchenko Nataliya D.
2013-01-01
The article considers the essence and types of adaptation as an important element of the modern theory of personnel management. It analyses problems of practical adaptation of personnel at domestic and Russian enterprises. It proves urgency and offers a concept of professional adaptation – adaptation management. It describes main moments of the model-concept of professional adaptation of young specialists, possibilities and prospects of its introduction, risks and weaknesses. It shows innovat...
Observer-based and Regression Model-based Detection of Emerging Faults in Coal Mills
Odgaard, Peter Fogh; Lin, Bao; Jørgensen, Sten Bay
2006-01-01
In order to improve the reliability of power plants it is important to detect faults as fast as possible. It is therefore of interest to find the most efficient detection method. Since modeling of large scale systems is time consuming, it is interesting to compare a model-based method with data-driven ones....... In this paper three different fault detection approaches are compared using an example of a coal mill in which a fault emerges. The compared methods are based on an optimal unknown input observer and on static and dynamic regression model-based detection. The conclusion on the comparison is that observer...
The limiting behavior of the estimated parameters in a misspecified random field regression model
Dahl, Christian Møller; Qin, Yu
nonlinear functions and it has the added advantage that there is no "curse of dimensionality." Contrary to existing literature on the asymptotic properties of the estimated parameters in random field models, our results do not require that the explanatory variables are sampled on a grid. However, as a...... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, work very well for random field regression models in finite samples and that these models successfully...
Bayesian estimation of a shift point in a two-phase regression model
Jadamus-Hacura, Maria
1997-01-01
The purpose of this paper is to carry out the Bayesian analysis of a two-phase regression model with an unknown break point. Essentially, there are two problems associated with a changing linear model. Firstly, one will want to be able to detect a break point, and secondly, assuming that a change has occurred, to be able to estimate it as well as the other parameters of the model. Much of the classical testing procedure for parameter constancy (such as the Chow test, CUSUM, CUSUMSQ,...
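To make the second problem (estimating the break point once a change is assumed) concrete, here is a minimal sketch. It does not implement the paper's Bayesian analysis; it swaps in a plain least-squares search over candidate break indices, fitting a separate line to each phase. The data and the names `fit_line`, `find_break` and `min_seg` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse

def find_break(xs, ys, min_seg=3):
    """Return (k, total_sse): phase one is xs[:k], phase two is xs[k:]."""
    best_k, best_sse = None, None
    for k in range(min_seg, len(xs) - min_seg + 1):
        _, _, sse_first = fit_line(xs[:k], ys[:k])
        _, _, sse_second = fit_line(xs[k:], ys[k:])
        total = sse_first + sse_second
        if best_sse is None or total < best_sse:
            best_k, best_sse = k, total
    return best_k, best_sse

# Hypothetical noise-free data: y = x for x < 5, then y = 20 - 2x.
xs = [float(i) for i in range(10)]
ys = [x if x < 5 else 20.0 - 2.0 * x for x in xs]

k, total_sse = find_break(xs, ys)
first_phase = fit_line(xs[:k], ys[:k])
second_phase = fit_line(xs[k:], ys[k:])
print(k, first_phase[:2], second_phase[:2])
```

A Bayesian treatment would instead place priors on the break point and the regression parameters and report a posterior distribution over the break location; the exhaustive search above only returns a single point estimate.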
A coarsened multinomial regression model for perinatal mother to child transmission of HIV
Brown Elizabeth R; Gard Charlotte C
2008-01-01
Background: In trials designed to estimate rates of perinatal mother to child transmission of HIV, HIV assays are scheduled at multiple points in time. Still, infection status for some infants at some time points may be unknown, particularly when interim analyses are conducted. Methods: Logistic regression models are commonly used to estimate covariate-adjusted transmission rates, but their methods for handling missing data may be inadequate. Here we propose using coarsened multinomial...
Bhattacharjee, Arnab; Bhattacharjee, Madhuchhanda
2007-01-01
We propose Bayesian inference in hazard regression models where the baseline hazard is unknown, covariate effects are possibly age-varying (non-proportional), and there is multiplicative frailty with arbitrary distribution. Our framework incorporates a wide variety of order restrictions on covariate dependence and duration dependence (ageing). We propose estimation and evaluation of age-varying covariate effects when covariate dependence is monotone rather than proportional. In particular, we...
Random regression models for milk, fat and protein in Colombian Buffaloes
Naudin Hurtado-Lugo; Humberto Tonhati; Raul Aspilcuelta-Borquis; Cruz Enríquez-Valencia; Mario Cerón-Muñoz
2015-01-01
Objective. Covariance functions for additive genetic and permanent environmental effects and, subsequently, genetic parameters for test-day milk (MY), fat (FY) and protein (PY) yields and mozzarella cheese (MP) in buffaloes from Colombia were estimated using random regression models (RRM) with Legendre polynomials (LP). Materials and Methods. Test-day records of MY, FY, PY and MP from 1884 first lactations of buffalo cows from 228 sires were analyzed. The animals belonged to 14 herds in Colombi...
A simple artificial regression based Lagrange multiplier test of normality in the probit model
Murphy, Anthony
1994-01-01
A convenient artificial regression based LM test of non-normality in the probit model is derived using a Gram Charlier type A alternative. The test is simply derived and may be extended to the bivariate probit case. The outer product gradient form of LM test is not used so the proposed test is likely to perform reasonably well in small samples. The test is compared with two other existing tests. non-peer-reviewed
Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis
Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae
The telecom firms in Korea are taking a new step to prepare for the next generation of convergence services, IPTV. In this paper we describe our analysis of an effective method for forecasting demand for IPTV broadcasting. We tested three types of scenarios based on some aspects of the potential IPTV market and compared the results. The forecasting method used in this paper is the multi-generation substitution model with bootstrap ridge regression analysis.
Irfan Ahmed Halepoto; Muhammad Aslam Uqaili; Bhawani Shanker Chowdhry
2014-01-01
Nowadays, due to the power crisis, electricity demand forecasting is deemed an important area for socioeconomic development, and proper anticipation of the load is considered an essential step towards efficient power system operation, scheduling and planning. In this paper, we present STLF (Short Term Load Forecasting) using multiple regression techniques (i.e. linear, multiple linear, quadratic and exponential) by considering hour by hour load model based on specific targeted day approac...
An Analysis of Transit Bus Driver Distraction Using Multinomial Logistic Regression Models
D'Souza, Kelwyn
2012-01-01
This paper explores the problem of distracted driving at a regional bus transit agency to identify the sources of distraction and provide an understanding of factors responsible for driver distraction. A risk range system was developed to classify the distracting activities into four risk zones. The high risk zone distracting activities were analyzed using multinomial logistic regression models to determine the impact of various factors on the multiple categorical levels of driver distraction...
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Constantin ANGHELACHE
2011-01-01
The article presents the fundamental aspects of linear regression as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, and homoscedasticity and heteroskedasticity. The use of econometric instruments in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretations that can be drawn at this level.
P. Arockia Jansi Rani; V. Sadasivam
2010-01-01
Image compression is very important in reducing the costs of data storage and transmission in relatively slow channels. In this paper, a still image compression scheme driven by Self-Organizing Map with polynomial regression modeling and entropy coding, employed within the wavelet framework is presented. The image compressibility and interpretability are improved by incorporating noise reduction into the compression scheme. The implementation begins with the classical wavelet decomposition, q...
Predicting Number of Zombies in DDoS Attacks Using Pace Regression Model
Gupta, B. B.
2012-01-01
A DDoS attacker attempts to disrupt a target by flooding it with illegitimate packets generated from a large number of zombies, usurping its bandwidth and overtaxing it to prevent legitimate inquiries from getting through. This paper reports the evaluation results of a proposed approach that predicts the number of zombies using a Pace Regression Model. A relationship is established between the number of zombies and the observed deviation in sample entropy. Various statistical performance...
Impact of trade factors on economic growth: seemingly unrelated regression model
Nawaz, Samar; Aziz, Arshad; Khalid ZAMAN
2014-01-01
This study explores the impact of trade on economic growth, using a seemingly unrelated regression model for the SAARC countries Bangladesh, India, Pakistan and Sri Lanka for the period 1980-2012. Trade factors include total exports, total imports, terms of trade, trade openness and investment. The results indicate a strong correlation between trade factors and economic growth; however, the magnitude of the influence on economic growth varies from one trade factor to another.
Asymptotic Properties in Semiparametric Partially Linear Regression Models for Functional Data
Tao ZHANG
2013-01-01
We consider the semiparametric partially linear regression models with mean function x^T β + g(z), where X and z are functional data. The new estimators of β and g(z) are presented and some asymptotic results are given. The strong convergence rates of the proposed estimators are obtained. In our estimation, the observation number of each subject will be completely flexible. Some simulation study is conducted to investigate the finite sample performance of the proposed estimators.
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Constantin ANGHELACHE
2011-10-01
The article presents the fundamental aspects of linear regression as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, and homoscedasticity and heteroskedasticity. The use of econometric instruments in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretations that can be drawn at this level.
Spackman, K. A.
1991-01-01
This paper presents maximum likelihood back-propagation (ML-BP), an approach to training neural networks. The widely reported original approach uses least squares back-propagation (LS-BP), minimizing the sum of squared errors (SSE). Unfortunately, least squares estimation does not give a maximum likelihood (ML) estimate of the weights in the network. Logistic regression, on the other hand, gives ML estimates for single layer linear models only. This report describes how to obtain ML estimates...
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit; Haglind, Fredrik
2014-01-01
Much attention is focused on increasing the energy efficiency to decrease fuel costs and CO2 emissions throughout industrial sectors. The ORC (organic Rankine cycle) is a relatively simple but efficient process that can be used for this purpose by converting low and medium temperature waste heat to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to deter...
Javali Shivalingappa; Pandit Parameshwar
2010-01-01
Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and a 'questionnaire with interview' method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodon...
Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources
O’Brien, Liam M.; FITZMAURICE, GARRETT M.
2005-01-01
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which...
MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES
Parameshwar V Pandit; Javali, Shivalingappa B.
2012-01-01
Purpose: To analyze the dependence of oral health diseases, i.e. dental caries and periodontal disease, on a number of risk factors through the application of a logistic regression model. Method: The cross-sectional study involved a systematic random sample of 1760 individuals with permanent dentition aged 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established ...
Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models
Chuanglin Fang; Haimeng Liu; Guangdong Li; Dongqi Sun; Zhuang Miao
2015-01-01
Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autocorrelation of Air Quality Index (AQI) values at the city level, and employed Ordinary Least Squares (OLS), Spatial Lag Model (SAR), and Geographically Weighted Regression (GWR) to quantitatively estimate the comprehensive impact and spatial variati...
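For readers unfamiliar with the Global Moran's I statistic mentioned in this abstract, a minimal sketch of its standard formula, I = (n / S0) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2), follows, where z_i are deviations from the mean and S0 is the sum of all spatial weights. The weight matrix and values are hypothetical toy inputs (units arranged on a line with binary contiguity), not the study's city-level AQI data, and the helper name `chain_weights` is an assumption made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                      # deviations from the mean
    s0 = sum(sum(row) for row in weights)               # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))    # spatially weighted cross-products
    return (n / s0) * (cross / sum(d * d for d in z))

def chain_weights(n):
    """Binary contiguity weights for n units on a line: only immediate neighbours are linked."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

w = chain_weights(4)
i_trend = morans_i([1.0, 2.0, 3.0, 4.0], w)          # similar values sit next to each other
i_alternating = morans_i([1.0, -1.0, 1.0, -1.0], w)  # neighbours always differ
print(i_trend, i_alternating)
```

Positive values indicate that neighbouring units tend to have similar values, negative values that they tend to differ. Inference (for example a permutation test) and the Local Moran's I variant are outside this sketch.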
Flexible regression models for ROC and risk analysis, with or without a gold standard
Branscum, AJ; Johnson, WO; Hanson, TE; Baron, AT
2015-01-01
A novel semiparametric regression model is developed for evaluating the covariate-specific accuracy of a continuous medical test or biomarker. Ideally, studies designed to estimate or compare medical test accuracy will use a separate, flawless gold-standard procedure to determine the true disease status of sampled individuals. We treat this as a special case of the more complicated and increasingly common scenario in which disease status is unknown because a gold-standard procedure does not e...
Sohair F. Higazi; Dina H. Abdel-Hady; Samir Ahmed Al-Oulfi
2013-01-01
Regression analysis depends on several assumptions that have to be satisfied. A major assumption that is never satisfied when variables are from contiguous observations is the independence of error terms. Spatial analysis treated the violation of that assumption by two derived models that put contiguity of observations into consideration. Data used are from Egypt's 2006 latest census, for 93 counties in middle delta seven adjacent Governorates. The dependent variable used is the percent of in...
Generation of Natural Runoff Monthly Series at Ungauged Sites Using a Regional Regressive Model
Dario Pumo
2016-05-01
Many hydrologic applications require reliable estimates of runoff in river basins to face the widespread lack of data, both in time and in space. A regional method for the reconstruction of monthly runoff series is here developed and applied to Sicily (Italy). A simple modeling structure is adopted, consisting of a regression-based rainfall–runoff model with four model parameters, calibrated through a two-step procedure. Monthly runoff estimates are based on precipitation and temperature, and exploit the autocorrelation with runoff in the previous month. Model parameters are assessed by specific regional equations as a function of easily measurable physical and climate basin descriptors. The first calibration step is aimed at identifying a set of parameters that optimizes model performance at the level of the single basin. Such “optimal” sets are used in the second step, part of a regional regression analysis, to establish the regional equations for model parameter assessment as a function of basin attributes. All the gauged watersheds across the region have been analyzed, selecting 53 basins for model calibration and using the other six basins exclusively for validation. Performances, quantitatively evaluated by different statistical indexes, demonstrate the model's ability to reproduce the observed hydrological time series at both the monthly and coarser time resolutions. The methodology, which is easily transferable to other arid and semi-arid areas, provides a reliable tool for filling/reconstructing runoff time series at any gauged or ungauged basin of a region.