Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
DEFF Research Database (Denmark)
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employ MARS models which have the advantage of being a powerful method...
Adaptive Metric Kernel Regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
1998-01-01
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression by...... minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
Adaptive metric kernel regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
2000-01-01
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the...
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power...
Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza
2014-10-01
The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
Institute of Scientific and Technical Information of China (English)
Wengang Zhang; Anthony T.C. Goh
2016-01-01
Piles are long, slender structural elements used to transfer the loads from the superstructure through weak strata onto stiffer soils or rocks. For driven piles, the impact of the piling hammer induces compression and tension stresses in the piles. Hence, an important design consideration is to check that the strength of the pile is sufficient to resist the stresses caused by the impact of the pile hammer. Due to its complexity, pile drivability lacks a precise analytical solution with regard to the phenomena involved. In situations where measured data or numerical hypothetical results are available, neural networks stand out in mapping the nonlinear interactions and relationships between the system’s predictors and dependent responses. In addition, unlike most computational tools, no mathematical relationship assumption between the dependent and independent variables has to be made. Nevertheless, neural networks have been criticized for their long trial-and-error training process since the optimal configu-ration is not known a priori. This paper investigates the use of a fairly simple nonparametric regression algorithm known as multivariate adaptive regression splines (MARS), as an alternative to neural net-works, to approximate the relationship between the inputs and dependent response, and to mathe-matically interpret the relationship between the various parameters. In this paper, the Back propagation neural network (BPNN) and MARS models are developed for assessing pile drivability in relation to the prediction of the Maximum compressive stresses (MCS), Maximum tensile stresses (MTS), and Blow per foot (BPF). A database of more than four thousand piles is utilized for model development and comparative performance between BPNN and MARS predictions.
DEFF Research Database (Denmark)
Xu, Man; Pinson, Pierre; Lu, Zongxiang;
2016-01-01
method, which is a combination of pilot estimation based on blockwise least-squares parabolic fitting and the probability integral transform. The regression model is then extended to a more robust one, in which a new dynamic forgetting factor is defined to make the estimator forget the out-of-date data...
Adaptive Local Linear Quantile Regression
Institute of Scientific and Technical Information of China (English)
Yu-nan Su; Mao-zai Tian
2011-01-01
In this paper we propose a new method of local linear adaptive smoothing for nonparametric conditional quantile regression. Some theoretical properties of the procedure are investigated. Then we demonstrate the performance of the method on a simulated example and compare it with other methods. The simulation results demonstrate a reasonable performance of our method proposed especially in situations when the underlying image is piecewise linear or can be approximated by such images. Generally speaking, our method outperforms most other existing methods in the sense of the mean square estimation (MSE) and mean absolute estimation (MAE) criteria. The procedure is very stable with respect to increasing noise level and the algorithm can be easily applied to higher dimensional situations.
Directory of Open Access Journals (Sweden)
Lujin Hu
2016-08-01
Full Text Available Heavy air pollution, especially fine particulate matter (PM2.5, poses serious challenges to environmental sustainability in Beijing. Epidemiological studies and the identification of measures for preventing serious air pollution both require accurate PM2.5 spatial distribution data. Land use regression (LUR models are promising for estimating the spatial distribution of PM2.5 at a high spatial resolution. However, typical LUR models have a limited sampling point explanation rate (SPER, i.e., the rate of the sampling points with reasonable predicted concentrations to the total number of sampling points and accuracy. Hence, self-adaptive revised LUR models are proposed in this paper for improving the SPER and accuracy of typical LUR models. The self-adaptive revised LUR model combines a typical LUR model with self-adaptive LUR model groups. The typical LUR model was used to estimate the PM2.5 concentrations, and the self-adaptive LUR model groups were constructed for all of the sampling points removed from the typical LUR model because they were beyond the prediction data range, which was from 60% of the minimum observation to 120% of the maximum observation. The final results were analyzed using three methods, including an accuracy analysis, and were compared with typical LUR model results and the spatial variations in Beijing. The accuracy satisfied the demands of the analysis, and the accuracies at the different monitoring sites indicated spatial variations in the accuracy of the self-adaptive revised LUR model. The accuracy was high in the central area and low in suburban areas. The comparison analysis showed that the self-adaptive LUR model increased the SPER from 75% to 90% and increased the accuracy (based on the root-mean-square error from 20.643 μg/m3 to 17.443 μg/m3 for the PM2.5 concentrations during the winter of 2014 in Beijing. The spatial variation analysis for Beijing showed that the PM2.5 concentrations were low in the north
Survival Analysis with Multivariate adaptive Regression Splines
Kriner, Monika
2007-01-01
Multivariate adaptive regression splines (MARS) are a useful tool to identify linear and nonlinear eﬀects and interactions between two covariates. In this dissertation a new proposal to model survival type data with MARS is introduced. Martingale and deviance residuals of a Cox PH model are used as response in a common MARS approach to model functional forms of covariate eﬀects as well as possible interactions in a data-driven way. Simulation studies prove that the new method yields a bett...
Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time......-varying effects. We start by considering classical and also more recent goodness-of-fit procedures for the Cox model that will reveal when the Cox model does not capture important aspects of the data, such as time-varying effects. We present recent regression models that are able to deal with and describe...... such time-varying effects. The introduced models are all applied to data on breast cancer from the Norwegian cancer registry, and these analyses clearly reveal the shortcomings of Cox's regression model and the need for other supplementary analyses with models such as those we present here....
Directory of Open Access Journals (Sweden)
Mendes Paul E
2010-03-01
Full Text Available Abstract Background This article describes the data mining analysis of a clinical exposure study of 3585 adult smokers and 1077 nonsmokers. The analysis focused on developing models for four biomarkers of potential harm (BOPH: white blood cell count (WBC, 24 h urine 8-epi-prostaglandin F2α (EPI8, 24 h urine 11-dehydro-thromboxane B2 (DEH11, and high-density lipoprotein cholesterol (HDL. Methods Random Forest was used for initial variable selection and Multivariate Adaptive Regression Spline was used for developing the final statistical models Results The analysis resulted in the generation of models that predict each of the BOPH as function of selected variables from the smokers and nonsmokers. The statistically significant variables in the models were: platelet count, hemoglobin, C-reactive protein, triglycerides, race and biomarkers of exposure to cigarette smoke for WBC (R-squared = 0.29; creatinine clearance, liver enzymes, weight, vitamin use and biomarkers of exposure for EPI8 (R-squared = 0.41; creatinine clearance, urine creatinine excretion, liver enzymes, use of Non-steroidal antiinflammatory drugs, vitamins and biomarkers of exposure for DEH11 (R-squared = 0.29; and triglycerides, weight, age, sex, alcohol consumption and biomarkers of exposure for HDL (R-squared = 0.39. Conclusions Levels of WBC, EPI8, DEH11 and HDL were statistically associated with biomarkers of exposure to cigarette smoking and demographics and life style factors. All of the predictors togather explain 29%-41% of the variability in the BOPH.
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2016-04-01
Scenarios of surface weather required for the impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriated. Outputs of these models have to be post-processed, which is often carried out thanks to Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected thanks to an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably allows increasing the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the
TWO REGRESSION CREDIBILITY MODELS
Directory of Open Access Journals (Sweden)
Constanţa-Nicoleta BODEA
2010-03-01
Full Text Available In this communication we will discuss two regression credibility models from Non – Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state and the collective estimate (based on aggregate USA data. To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.
Butte, Nancy F.; Wong, William W.; Adolph, Anne L.; Puyau, Maurice R; Vohra, Firoz A.; Zakeri, Issa F.
2010-01-01
Accurate, nonintrusive, and inexpensive techniques are needed to measure energy expenditure (EE) in free-living populations. Our primary aim in this study was to validate cross-sectional time series (CSTS) and multivariate adaptive regression splines (MARS) models based on observable participant characteristics, heart rate (HR), and accelerometer counts (AC) for prediction of minute-by-minute EE, and hence 24-h total EE (TEE), against a 7-d doubly labeled water (DLW) method in children and ad...
Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad
2015-11-01
One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.
Adaptive support vector regression for UAV flight control.
Shin, Jongho; Jin Kim, H; Kim, Youdan
2011-01-01
This paper explores an application of support vector regression for adaptive control of an unmanned aerial vehicle (UAV). Unlike neural networks, support vector regression (SVR) generates global solutions, because SVR basically solves quadratic programming (QP) problems. With this advantage, the input-output feedback-linearized inverse dynamic model and the compensation term for the inversion error are identified off-line, which we call I-SVR (inversion SVR) and C-SVR (compensation SVR), respectively. In order to compensate for the inversion error and the unexpected uncertainty, an online adaptation algorithm for the C-SVR is proposed. Then, the stability of the overall error dynamics is analyzed by the uniformly ultimately bounded property in the nonlinear system theory. In order to validate the effectiveness of the proposed adaptive controller, numerical simulations are performed on the UAV model.
Directory of Open Access Journals (Sweden)
Paulino José García Nieto
2016-05-01
Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.
Adaptive Rank Penalized Estimators in Multivariate Regression
Bunea, Florentina; Wegkamp, Marten
2010-01-01
We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stays bounded, and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…
Survival Data and Regression Models
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
Prediction of longitudinal dispersion coefficient using multivariate adaptive regression splines
Indian Academy of Sciences (India)
Amir Hamzeh Haghiabi
2016-07-01
In this paper, multivariate adaptive regression splines (MARS) was developed as a novel soft-computingtechnique for predicting longitudinal dispersion coefficient (DL) in rivers. As mentioned in the literature,experimental dataset related to DL was collected and used for preparing MARS model. Results of MARSmodel were compared with multi-layer neural network model and empirical formulas. To define the mosteffective parameters on DL, the Gamma test was used. Performance of MARS model was assessed bycalculation of standard error indices. Error indices showed that MARS model has suitable performanceand is more accurate compared to multi-layer neural network model and empirical formulas. Results ofthe Gamma test and MARS model showed that flow depth (H) and ratio of the mean velocity to shearvelocity (u/u^∗) were the most effective parameters on the DL.
Fuzzy linear regression forecasting models
Institute of Scientific and Technical Information of China (English)
吴冲; 惠晓峰; 朱洪文
2002-01-01
The fuzzy linear regression forecasting model is deduced from the symmetric triangular fuzzy number.With the help of the degree of fitting and the measure of fuzziness, the determination of symmetric triangularfuzzy numbers is changed into a problem of solving linear programming.
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
无
2001-01-01
For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.
Heteroscedasticity checks for regression models
Institute of Scientific and Technical Information of China (English)
ZHU; Lixing
2001-01-01
［1］Carroll, R. J., Ruppert, D., Transformation and Weighting in Regression, New York: Chapman and Hall, 1988.［2］Cook, R. D., Weisberg, S., Diagnostics for heteroscedasticity in regression, Biometrika, 1988, 70: 1—10.［3］Davidian, M., Carroll, R. J., Variance function estimation, J. Amer. Statist. Assoc., 1987, 82: 1079—1091.［4］Bickel, P., Using residuals robustly I: Tests for heteroscedasticity, Ann. Statist., 1978, 6: 266—291.［5］Carroll, R. J., Ruppert, D., On robust tests for heteroscedasticity, Ann. Statist., 1981, 9: 205—209.［6］Eubank, R. L., Thomas, W., Detecting heteroscedasticity in nonparametric regression, J. Roy. Statist. Soc., Ser. B, 1993, 55: 145—155.［7］Diblasi, A., Bowman, A., Testing for constant variance in a linear model, Statist. and Probab. Letters, 1997, 33: 95—103.［8］Dette, H., Munk, A., Testing heteoscedasticity in nonparametric regression, J. R. Statist. Soc. B, 1998, 60: 693—708.［9］Müller, H. G., Zhao, P. L., On a semi-parametric variance function model and a test for heteroscedasticity, Ann. Statist., 1995, 23: 946—967.［10］Stute, W., Manteiga, G., Quindimil, M. P., Bootstrap approximations in model checks for regression, J. Amer. Statist. Asso., 1998, 93: 141—149.［11］Stute, W., Thies, G., Zhu, L. X., Model checks for regression: An innovation approach, Ann. Statist., 1998, 26: 1916—1939.［12］Shorack, G. R., Wellner, J. A., Empirical Processes with Applications to Statistics, New York: Wiley, 1986.［13］Efron, B., Bootstrap methods: Another look at the jackknife, Ann. Statist., 1979, 7: 1—26.［14］Wu, C. F. J., Jackknife, bootstrap and other re-sampling methods in regression analysis, Ann. Statist., 1986, 14: 1261—1295.［15］H rdle, W., Mammen, E., Comparing non-parametric versus parametric regression fits, Ann. Statist., 1993, 21: 1926—1947.［16］Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 1988, 16: 1696—1708.［17
RANDOM WEIGHTING METHOD FOR CENSORED REGRESSION MODEL
Institute of Scientific and Technical Information of China (English)
ZHAO Lincheng; FANG Yixin
2004-01-01
Rao and Zhao (1992) used random weighting method to derive the approximate distribution of the M-estimator in linear regression model. In this paper we extend the result to the censored regression model (or censored "Tobit" model).
Durmaz, Murat; Karslioglu, Mahmut Onur
2015-04-01
There are various global and regional methods that have been proposed for the modeling of ionospheric vertical total electron content (VTEC). Global distribution of VTEC is usually modeled by spherical harmonic expansions, while tensor products of compactly supported univariate B-splines can be used for regional modeling. In these empirical parametric models, the coefficients of the basis functions as well as differential code biases (DCBs) of satellites and receivers can be treated as unknown parameters which can be estimated from geometry-free linear combinations of global positioning system observables. In this work we propose a new semi-parametric multivariate adaptive regression B-splines (SP-BMARS) method for the regional modeling of VTEC together with satellite and receiver DCBs, where the parametric part of the model is related to the DCBs as fixed parameters and the non-parametric part adaptively models the spatio-temporal distribution of VTEC. The latter is based on multivariate adaptive regression B-splines which is a non-parametric modeling technique making use of compactly supported B-spline basis functions that are generated from the observations automatically. This algorithm takes advantage of an adaptive scale-by-scale model building strategy that searches for best-fitting B-splines to the data at each scale. The VTEC maps generated from the proposed method are compared numerically and visually with the global ionosphere maps (GIMs) which are provided by the Center for Orbit Determination in Europe (CODE). The VTEC values from SP-BMARS and CODE GIMs are also compared with VTEC values obtained through calibration using local ionospheric model. The estimated satellite and receiver DCBs from the SP-BMARS model are compared with the CODE distributed DCBs. The results show that the SP-BMARS algorithm can be used to estimate satellite and receiver DCBs while adaptively and flexibly modeling the daily regional VTEC.
Adaptive sparse polynomial chaos expansion based on least angle regression
Blatman, Géraud; Sudret, Bruno
2011-03-01
Polynomial chaos (PC) expansions are used in stochastic finite element analysis to represent the random model response by a set of coefficients in a suitable (so-called polynomial chaos) basis. The number of terms to be computed grows dramatically with the size of the input random vector, which makes the computational cost of classical solution schemes (may it be intrusive (i.e. of Galerkin type) or non intrusive) unaffordable when the deterministic finite element model is expensive to evaluate. To address such problems, the paper describes a non intrusive method that builds a sparse PC expansion. First, an original strategy for truncating the PC expansions, based on hyperbolic index sets, is proposed. Then an adaptive algorithm based on least angle regression (LAR) is devised for automatically detecting the significant coefficients of the PC expansion. Beside the sparsity of the basis, the experimental design used at each step of the algorithm is systematically complemented in order to avoid the overfitting phenomenon. The accuracy of the PC metamodel is checked using an estimate inspired by statistical learning theory, namely the corrected leave-one-out error. As a consequence, a rather small number of PC terms are eventually retained ( sparse representation), which may be obtained at a reduced computational cost compared to the classical "full" PC approximation. The convergence of the algorithm is shown on an analytical function. Then the method is illustrated on three stochastic finite element problems. The first model features 10 input random variables, whereas the two others involve an input random field, which is discretized into 38 and 30 - 500 random variables, respectively.
Regression Models for Market-Shares
DEFF Research Database (Denmark)
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
2005-01-01
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the...... interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....
Semiparametric Regression and Model Refining
Institute of Scientific and Technical Information of China (English)
无
2002-01-01
This paper presents a semiparametric adjustment method suitable for general cases.Assuming that the regularizer matrix is positive definite,the calculation method is discussed and the corresponding formulae are presented.Finally,a simulated adjustment problem is constructed to explain the method given in this paper.The results from the semiparametric model and G-M model are compared.The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimate method.
The Infinite Hierarchical Factor Regression Model
Rai, Piyush
2009-01-01
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
Darnah
2016-04-01
Poisson regression has been used if the response variable is count data that based on the Poisson distribution. The Poisson distribution assumed equal dispersion. In fact, a situation where count data are over dispersion or under dispersion so that Poisson regression inappropriate because it may underestimate the standard errors and overstate the significance of the regression parameters, and consequently, giving misleading inference about the regression parameters. This paper suggests the generalized Poisson regression model to handling over dispersion and under dispersion on the Poisson regression model. The Poisson regression model and generalized Poisson regression model will be applied the number of filariasis cases in East Java. Based regression Poisson model the factors influence of filariasis are the percentage of families who don't behave clean and healthy living and the percentage of families who don't have a healthy house. The Poisson regression model occurs over dispersion so that we using generalized Poisson regression. The best generalized Poisson regression model showing the factor influence of filariasis is percentage of families who don't have healthy house. Interpretation of result the model is each additional 1 percentage of families who don't have healthy house will add 1 people filariasis patient.
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Nonparametric and semiparametric dynamic additive regression models
DEFF Research Database (Denmark)
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
A new bivariate negative binomial regression model
Faroughi, Pouya; Ismail, Noriszura
2014-12-01
This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on Malaysian motor insurance dataset. The results indicated that BNB-1 regression has better fit than bivariate Poisson and BNB-2 models with regards to Akaike information criterion.
An Application on Multinomial Logistic Regression Model
Abdalla M El-Habil
2012-01-01
This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children...
Constrained regression models for optimization and forecasting
Directory of Open Access Journals (Sweden)
P.J.S. Bruwer
2003-12-01
Full Text Available Linear regression models and the interpretation of such models are investigated. In practice problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.
Modeling confounding by half-sibling regression
DEFF Research Database (Denmark)
Schölkopf, Bernhard; Hogg, David W; Wang, Dun;
2016-01-01
We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both indep...
Empirical Bayes Estimation in Regression Model
Institute of Scientific and Technical Information of China (English)
Li-chun Wang
2005-01-01
This paper considers the empirical Bayes (EB) estimation problem for the parameterβ of the linear regression model y = Xβ + ε with ε～ N(0, σ2I) givenβ. Based on Pitman closeness (PC) criterion and mean square error matrix (MSEM) criterion, we prove the superiority of the EB estimator over the ordinary least square estimator (OLSE).
Bootstrap inference longitudinal semiparametric regression model
Pane, Rahmawati; Otok, Bambang Widjanarko; Zain, Ismaini; Budiantara, I. Nyoman
2016-02-01
Semiparametric regression contains two components, i.e. parametric and nonparametric component. Semiparametric regression model is represented by yt i=μ (x˜'ti,zt i)+εt i where μ (x˜'ti,zt i)=x˜'tiβ ˜+g (zt i) and yti is response variable. It is assumed to have a linear relationship with the predictor variables x˜'ti=(x1 i 1,x2 i 2,…,xT i r) . Random error εti, i = 1, …, n, t = 1, …, T is normally distributed with zero mean and variance σ2 and g(zti) is a nonparametric component. The results of this study showed that the PLS approach on longitudinal semiparametric regression models obtain estimators β˜^t=[X'H(λ)X]-1X'H(λ )y ˜ and g˜^λ(z )=M (λ )y ˜ . The result also show that bootstrap was valid on longitudinal semiparametric regression model with g^λ(b )(z ) as nonparametric component estimator.
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. It by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial. PMID:11878222
Modeling oil production based on symbolic regression
International Nuclear Information System (INIS)
Numerous models have been proposed to forecast the future trends of oil production and almost all of them are based on some predefined assumptions with various uncertainties. In this study, we propose a novel data-driven approach that uses symbolic regression to model oil production. We validate our approach on both synthetic and real data, and the results prove that symbolic regression could effectively identify the true models beneath the oil production data and also make reliable predictions. Symbolic regression indicates that world oil production will peak in 2021, which broadly agrees with other techniques used by researchers. Our results also show that the rate of decline after the peak is almost half the rate of increase before the peak, and it takes nearly 12 years to drop 4% from the peak. These predictions are more optimistic than those in several other reports, and the smoother decline will provide the world, especially the developing countries, with more time to orchestrate mitigation plans. -- Highlights: •A data-driven approach has been shown to be effective at modeling the oil production. •The Hubbert model could be discovered automatically from data. •The peak of world oil production is predicted to appear in 2021. •The decline rate after peak is half of the increase rate before peak. •Oil production projected to decline 4% post-peak
Efficient robust nonparametric estimation in a semimartingale regression model
Konev, Victor
2010-01-01
The paper considers the problem of robust estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is non-gaussian Ornstein-Uhlenbeck process with the L\\'evy process subordinator, which is used to model the financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.
An Application on Multinomial Logistic Regression Model
Directory of Open Access Journals (Sweden)
Abdalla M El-Habil
2012-03-01
Full Text Available Normal 0 false false false EN-US X-NONE X-NONE This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by Palestinian Central Bureau of Statistics (PCBS. Segment of the population of children in the age group (10-14 years for residents in Gaza governorate, size of 66,935 had been selected, and the response variable consisted of four categories. Eighteen of explanatory variables were used for building the primary multinomial logistic regression model. Model had been tested through a set of statistical tests to ensure its appropriateness for the data. Also the model had been tested by selecting randomly of two observations of the data used to predict the position of each observation in any classified group it can be, by knowing the values of the explanatory variables used. We concluded by using the multinomial logistic regression model that we can able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and we can predict the classification of any individual case.
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
Regression Models for Count Data in R
Directory of Open Access Journals (Sweden)
Christian Kleiber
2008-06-01
Full Text Available The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inﬂated regression models in the functions hurdle( and zeroinfl( from the package pscl is introduced. It re-uses design and functionality of the basic R functions just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inﬂated model, are able to incorporate over-dispersion and excess zeros-two problems that typically occur in count data sets in economics and the social sciences—better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be ﬁtted, inspected and tested in practice.
Bayesian model selection in Gaussian regression
Abramovich, Felix
2009-01-01
We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.
Regression models for expected length of stay.
Grand, Mia Klinten; Putter, Hein
2016-03-30
In multi-state models, the expected length of stay (ELOS) in a state is not a straightforward object to relate to covariates, and the traditional approach has instead been to construct regression models for the transition intensities and calculate ELOS from these. The disadvantage of this approach is that the effect of covariates on the intensities is not easily translated into the effect on ELOS, and it typically relies on the Markov assumption. We propose to use pseudo-observations to construct regression models for ELOS, thereby allowing a direct interpretation of covariate effects while at the same time avoiding the Markov assumption. For this approach, all we need is a non-parametric consistent estimator for ELOS. For every subject (and for every state of interest), a pseudo-observation is constructed, and they are then used as outcome variables in the regression model. We furthermore show how to construct longitudinal (pseudo-) data when combining the concept of pseudo-observations with landmarking. In doing so, covariates are allowed to be time-varying, and we can investigate potential time-varying effects of the covariates. The models can be fitted using generalized estimating equations, and dependence between observations on the same subject is handled by applying the sandwich estimator. The method is illustrated using data from the US Health and Retirement Study where the impact of socio-economic factors on ELOS in health and disability is explored. Finally, we investigate the performance of our approach under different degrees of left-truncation, non-Markovianity, and right-censoring by means of simulation. PMID:26497637
Hierarchical linear regression models for conditional quantiles
Institute of Scientific and Technical Information of China (English)
TIAN Maozai; CHEN Gemai
2006-01-01
The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models,but it cannot deal effectively with the data with a hierarchical structure.In practice,the existence of such data hierarchies is neither accidental nor ignorable,it is a common phenomenon.To ignore this hierarchical data structure risks overlooking the importance of group effects,and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid.On the other hand,the hierarchical models take a hierarchical data structure into account and have also many applications in statistics,ranging from overdispersion to constructing min-max estimators.However,the hierarchical models are virtually the mean regression,therefore,they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates.Furthermore,the estimated coefficient vector (marginal effects)is sensitive to an outlier observation on the dependent variable.In this article,a new approach,which is based on the Gauss-Seidel iteration and taking a full advantage of the quantile regression and hierarchical models,is developed.On the theoretical front,we also consider the asymptotic properties of the new method,obtaining the simple conditions for an n1/2-convergence and an asymptotic normality.We also illustrate the use of the technique with the real educational data which is hierarchical and how the results can be explained.
Regression Models For Saffron Yields in Iran
S. H, Sanaeinejad; S. N, Hosseini
Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
Quantile regression modeling for Malaysian automobile insurance premium data
Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz
2015-09-01
Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...
Inferring gene regression networks with model trees
Directory of Open Access Journals (Sweden)
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Multiple Linear Regression Models in Outlier Detection
Directory of Open Access Journals (Sweden)
S.M.A.Khaleelur Rahman
2012-02-01
Full Text Available Identifying anomalous values in the real-world database is important both for improving the quality of original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outliers and influential points is an important step of the regression diagnostics. In this paper, our aim is to detect the points which are very different from the others points. They do not seem to belong to a particular population and behave differently. If these influential points are to be removed it will lead to a different model. Distinction between these points is not always obvious and clear. Hence several indicators are used for identifying and analyzing outliers. Existing methods of outlier detection are based on manual inspection of graphically represented data. In this paper, we present a new approach in automating the process of detecting and isolating outliers. Impact of anomalous values on the dataset has been established by using two indicators DFFITS and Cook’sD. The process is based on modeling the human perception of exceptional values by using multiple linear regression analysis.
Predictive densities for day-ahead electricity prices using time-adaptive quantile regression
DEFF Research Database (Denmark)
Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik;
2014-01-01
is compared to that of four benchmark approaches and the well-known the generalist autoregressive conditional heteroskedasticity (GARCH) model over a three-year evaluation period. While all benchmarks are outperformed in terms of forecasting skill overall, the superiority of the semi-parametric model over......A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model...
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
Pax6 in Collembola: Adaptive Evolution of Eye Regression.
Hou, Ya-Nan; Li, Sheng; Luan, Yun-Xia
2016-01-01
Unlike the compound eyes in insects, collembolan eyes are comparatively simple: some species have eyes with different numbers of ocelli (1 + 1 to 8 + 8), and some species have no apparent eye structures. Pax6 is a universal master control gene for eye morphogenesis. In this study, full-length Pax6 cDNAs, Fc-Pax6 and Cd-Pax6, were cloned from an eyeless collembolan (Folsomia candida, soil-dwelling) and an eyed one (Ceratophysella denticulata, surface-dwelling), respectively. Their phylogenetic positions are between the two Pax6 paralogs in insects, eyeless (ey) and twin of eyeless (toy), and their protein sequences are more similar to Ey than to Toy. Both Fc-Pax6 and Cd-Pax6 could induce ectopic eyes in Drosophila, while Fc-Pax6 exhibited much weaker transactivation ability than Cd-Pax6. The C-terminus of collembolan Pax6 is indispensable for its transactivation ability, and determines the differences of transactivation ability between Fc-Pax6 and Cd-Pax6. One of the possible reasons is that Fc-Pax6 accumulated more mutations at some key functional sites of C-terminus under a lower selection pressure on eye development due to the dark habitats of F. candida. The composite data provide a first molecular evidence for the monophyletic origin of collembolan eyes, and indicate the eye degeneration of collembolans is caused by adaptive evolution. PMID:26856893
Model selection in kernel ridge regression
DEFF Research Database (Denmark)
Exterkate, Peter
2013-01-01
Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different context...
An adaptive online learning approach for Support Vector Regression: Online-SVR-FID
Liu, Jie; Zio, Enrico
2016-08-01
Support Vector Regression (SVR) is a popular supervised data-driven approach for building empirical models from available data. Like all data-driven methods, under non-stationary environmental and operational conditions it needs to be provided with adaptive learning capabilities, which might become computationally burdensome with large datasets cumulating dynamically. In this paper, a cost-efficient online adaptive learning approach is proposed for SVR by combining Feature Vector Selection (FVS) and Incremental and Decremental Learning. The proposed approach adaptively modifies the model only when different pattern drifts are detected according to proposed criteria. Two tolerance parameters are introduced in the approach to control the computational complexity, reduce the influence of the intrinsic noise in the data and avoid the overfitting problem of SVR. Comparisons of the prediction results is made with other online learning approaches e.g. NORMA, SOGA, KRLS, Incremental Learning, on several artificial datasets and a real case study concerning time series prediction based on data recorded on a component of a nuclear power generation system. The performance indicators MSE and MARE computed on the test dataset demonstrate the efficiency of the proposed online learning method.
Irwansyah, Edy
2015-01-01
Cox Proportional Hazard (Cox PH) model is a survival analysis method to perform model of relationship between independent variable and dependent variable which shown by time until an event occurs. This method compute residuals, martingale or deviance, which can used to diagnostic the lack of fit of a model and PH assumption. The alternative method if these not satisfied is Multivariate Adaptive Regression Splines (MARS) approach. This method use to perform the analysis of product selling time...
Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.
2015-12-01
Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.
Directory of Open Access Journals (Sweden)
M. Ahmadlou
2015-12-01
Full Text Available Land use change (LUC models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS, and a global parametric model called artificial neural network (ANN to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM and 2010 (ETM+ were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Bayesian Model Averaging in the Instrumental Variable Regression Model
Gary Koop; Robert Leon Gonzalez; Rodney Strachan
2011-01-01
This paper considers the instrumental variable regression model when there is uncertainly about the set of instruments, exogeneity restrictions, the validity of identifying restrictions and the set of exogenous regressors. This uncertainly can result in a huge number of models. To avoid statistical problems associated with standard model selection procedures, we develop a reversible jump Markov chain Monte Carlo algorithm that allows us to do Bayesian model averaging. The algorithm is very fl...
Electricity prices forecasting by automatic dynamic harmonic regression models
International Nuclear Information System (INIS)
The changes experienced by electricity markets in recent years have created the necessity for more accurate forecast tools of electricity prices, both for producers and consumers. Many methodologies have been applied to this aim, but in the view of the authors, state space models are not yet fully exploited. The present paper proposes a univariate dynamic harmonic regression model set up in a state space framework for forecasting prices in these markets. The advantages of the approach are threefold. Firstly, a fast automatic identification and estimation procedure is proposed based on the frequency domain. Secondly, the recursive algorithms applied offer adaptive predictions that compare favourably with respect to other techniques. Finally, since the method is based on unobserved components models, explicit information about trend, seasonal and irregular behaviours of the series can be extracted. This information is of great value to the electricity companies' managers in order to improve their strategies, i.e. it provides management innovations. The good forecast performance and the rapid adaptability of the model to changes in the data are illustrated with actual prices taken from the PJM interconnection in the US and for the Spanish market for the year 2002
Model Selection in Kernel Ridge Regression
DEFF Research Database (Denmark)
Exterkate, Peter
Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels...
Improved Estimation of Earth Rotation Parameters Using the Adaptive Ridge Regression
Huang, Chengli; Jin, Wenjing
1998-05-01
The multicollinearity among regression variables is a common phenomenon in the reduction of astronomical data. The phenomenon of multicollinearity and the diagnostic factors are introduced first. As a remedy, a new method, called adaptive ridge regression (ARR), which is an improved method of choosing the departure constant θ in ridge regression, is suggested and applied in a case that the Earth orientation parameters (EOP) are determined by lunar laser ranging (LLR). It is pointed out, via a diagnosis, the variance inflation factors (VIFs), that there exists serious multicollinearity among the regression variables. It is shown that the ARR method is effective in reducing the multicollinearity and makes the regression coefficients more stable than that of using ordinary least squares estimation (LS), especially when there is serious multicollinearity.
Algamal, Zakariya Yahya; Lee, Muhammad Hisyam
2015-12-01
Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.
A Dirty Model for Multiple Sparse Regression
Jalali, Ali; Sanghavi, Sujay
2011-01-01
Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \\ell_1/\\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis a vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...
Institute of Scientific and Technical Information of China (English)
NURWAHA Deogratias; WANG Xin-hou
2008-01-01
This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy system inference (ANFIS) and Multiple Linear Regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models and the count-strength-product (CSP) was the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has a better predictive power in comparison with linear multipleregression model. The impact of each fiber property is also illustrated.
Zheng, Zhihui; Gao, Lei; Xiao, Liping; Zhou, Bin; Gao, Shibo
2015-12-01
Our purpose is to develop a detection algorithm capable of searching for generic interest objects in real time without large training sets and long-time training stages. Instead of the classical sliding window object detection paradigm, we employ an objectness measure to produce a small set of candidate windows efficiently using Binarized Normed Gradients and a Laplacian of Gaussian-like filter. We then extract Locally Adaptive Regression Kernels (LARKs) as descriptors both from a model image and the candidate windows which measure the likeness of a pixel to its surroundings. Using a matrix cosine similarity measure, the algorithm yields a scalar resemblance map, indicating the likelihood of similarity between the model and the candidate windows. By employing nonparametric significance tests and non-maxima suppression, we detect the presence of objects similar to the given model. Experiments show that the proposed detection paradigm can automatically detect the presence, the number, as well as location of similar objects to the given model. The high quality and efficiency of our method make it suitable for real time multi-category object detection applications.
Directory of Open Access Journals (Sweden)
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Multiple Retrieval Models and Regression Models for Prior Art Search
Lopez, Patrice
2009-01-01
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.
Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.
1999-01-01
Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEl). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th
Relative risk regression models with inverse polynomials.
Ning, Yang; Woodward, Mark
2013-08-30
The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.
Impact of multicollinearity on small sample hydrologic regression models
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
Multiattribute shopping models and ridge regression analysis
Timmermans, HJP Harry
1981-01-01
Policy decisions regarding retailing facilities essentially involve multiple attributes of shopping centres. If mathematical shopping models are to contribute to these decision processes, their structure should reflect the multiattribute character of retailing planning. Examination of existing models shows that most operational shopping models include only two policy variables. A serious problem in the calibration of the existing multiattribute shopping models is that of multicollinearity ari...
Drought Patterns Forecasting using an Auto-Regressive Logistic Model
del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.
2014-12-01
Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, many times of catastrophic dimensions. A quantifiable definition of drought is elusive because depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate.In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows to include covariates, continuous or categorical, to improve the performance of the auto-regressive component.Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-Means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representation of complex climatic phenomena, as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.
Support vector regression model for complex target RCS predicting
Institute of Scientific and Technical Information of China (English)
Wang Gu; Chen Weishi; Miao Jungang
2009-01-01
The electromagnetic scattering computation has developed rapidly for many years; some computing problems for complex and coated targets cannot be solved by using the existing theory and computing models. A computing model based on data is established for making up the insufficiency of theoretic models. Based on the "support vector regression method", which is formulated on the principle of minimizing a structural risk, a data model to predicate the unknown radar cross section of some appointed targets is given. Comparison between the actual data and the results of this predicting model based on support vector regression method proved that the support vector regression method is workable and with a comparative precision.
Directory of Open Access Journals (Sweden)
Yunfeng Wu
2014-01-01
Full Text Available This paper presents a novel adaptive linear and normalized combination (ALNC method that can be used to combine the component radial basis function networks (RBFNs to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple leaners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error and the better fidelity (characterized by normalized correlation coefficient of approximation, in relation to the popular simple average, weighted average, and the Bagging methods.
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
REPRESENTATIVE VARIABLES IN A MULTIPLE REGRESSION MODEL
Directory of Open Access Journals (Sweden)
Barbu Bogdan POPESCU
2013-02-01
Full Text Available There are presented econometric models developed for analysis of banking exclusion of the economic crisis. Access to public goods and services is a condition „sine qua non” for open and efficient society. Availability of banking and payment of the entire population without discrimination in our opinion should be the primary objective of public service policy.
Augmented mixed beta regression models for periodontal proportion data.
Galvis, Diana M; Bandyopadhyay, Dipankar; Lachos, Victor H
2014-09-20
Continuous (clustered) proportion data often arise in various domains of medicine and public health where the response variable of interest is a proportion (or percentage) quantifying disease status for the cluster units, ranging between zero and one. However, because of the presence of relatively disease-free as well as heavily diseased subjects in any study, the proportion values can lie in the interval [0,1]. While beta regression can be adapted to assess covariate effects in these situations, its versatility is often challenged because of the presence/excess of zeros and ones because the beta support lies in the interval (0,1). To circumvent this, we augment the probabilities of zero and one with the beta density, controlling for the clustering effect. Our approach is Bayesian with the ability to borrow information across various stages of the complex model hierarchy and produces a computationally convenient framework amenable to available freeware. The marginal likelihood is tractable and can be used to develop Bayesian case-deletion influence diagnostics based on q-divergence measures. Both simulation studies and application to a real dataset from a clinical periodontology study quantify the gain in model fit and parameter estimation over other ad hoc alternatives and provide quantitative insight into assessing the true covariate effects on the proportion responses.
Residual diagnostics for cross-section time series regression models
Baum, Christopher F
2001-01-01
These routines support the diagnosis of groupwise heteroskedasticity and cross-sectional correlation in the context of a regression model fit to pooled cross-section time series (xt) data. Copyright 2001 by Stata Corporation.
Directory of Open Access Journals (Sweden)
Ahmad A. Saifan
2016-04-01
Full Text Available Regression testing is a safeguarding procedure to validate and verify adapted software, and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in regression test selection domain. The approach is based on meta-models (test models and structured models to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results using the re-test to all approaches. The results have shown that our approach reduces the size of test suite without negative impact on the effectiveness of the fault detection.
Matrix variate logistic regression model with application to EEG data.
Hung, Hung; Wang, Chen-Chien
2013-01-01
Logistic regression has been widely applied in the field of biomedical research for a long time. In some applications, the covariates of interest have a natural structure, such as that of a matrix, at the time of collection. The rows and columns of the covariate matrix then have certain physical meanings, and they must contain useful information regarding the response. If we simply stack the covariate matrix as a vector and fit a conventional logistic regression model, relevant information can be lost, and the problem of inefficiency will arise. Motivated from these reasons, we propose in this paper the matrix variate logistic (MV-logistic) regression model. The advantages of the MV-logistic regression model include the preservation of the inherent matrix structure of covariates and the parsimony of parameters needed. In the EEG Database Data Set, we successfully extract the structural effects of covariate matrix, and a high classification accuracy is achieved.
Robust Depth-Weighted Wavelet for Nonparametric Regression Models
Institute of Scientific and Technical Information of China (English)
Lu LIN
2005-01-01
In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.
Alternative regression models to assess increase in childhood BMI
Directory of Open Access Journals (Sweden)
Mansmann Ulrich
2008-09-01
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Correlation between Production and Labor based on Regression Model
Constantin Anghelache
2015-01-01
In the theoretical analysis, dependency of variables is stochastic. Consideration of the residual variable within such a model is needed. Other factors that influence the score variable are grouped in the residual. Uni-factorial nonlinear models are linearized transformations that are applied to the variables, the regression model. So, for example, a model of the form turns into a linear model by logarithm the two terms of the above equality, resulting in linear function. This model is recomm...
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Regression model for Quality of Web Services dataset with WEKA
Shalini Gambhir; Puneet Arora; Jatin Gambhir
2013-01-01
The Waikato Environment for Knowledge Analysis (WEKA) came about through the perceived need for a uniﬁed workbench that would allow researchers easy access to state-of the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classiﬁcation, regression, clustering, and feature selection etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses regression model for some o...
Directory of Open Access Journals (Sweden)
Roseane Cavalcanti dos Santos
2012-08-01
Full Text Available The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G, environments (E, and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Default Bayes Factors for Model Selection in Regression
Rouder, J.N.; Morey, R.D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence
Directory of Open Access Journals (Sweden)
Xiaoguang Cui
2014-01-01
Full Text Available This paper proposes a novel top-down visual saliency detection method for optical satellite images using local adaptive regression kernels. This method provides a saliency map by measuring the likeness of image patches to a given single template image. The local adaptive regression kernel (LARK is used as a descriptor to extract feature and compare against analogous feature from the target image. A multi-scale pyramid of the target image is constructed to cope with large-scale variations. In addition, accounting for rotation variations, the histogram of kernel orientation is employed to estimate the rotation angle of image patch, and then comparison is performed after rotating the patch by the estimated angle. Moreover, we use the bounded partial correlation (BPC to compare features between image patches and the template so as to rapidly generate the saliency map. Experiments were performed in optical satellite images to find airplanes, and experimental results demonstrate that the proposed method is effective and robust in complex scenes.
Locally adaptive regression filter-based infrared focal plane array non-uniformity correction
Li, Jia; Qin, Hanlin; Yan, Xiang; Huang, He; Zhao, Yingjuan; Zhou, Huixin
2015-10-01
Due to the limitations of the manufacturing technology, the response rates to the same infrared radiation intensity in each infrared detector unit are not identical. As a result, the non-uniformity of infrared focal plane array, also known as fixed pattern noise (FPN), is generated. To solve this problem, correcting the non-uniformity in infrared image is a promising approach, and many non-uniformity correction (NUC) methods have been proposed. However, they have some defects such as slow convergence, ghosting and scene degradation. To overcome these defects, a novel non-uniformity correction method based on locally adaptive regression filter is proposed. First, locally adaptive regression method is used to separate the infrared image into base layer containing main scene information and the detail layer containing detailed scene with FPN. Then, the detail layer sequence is filtered by non-linear temporal filter to obtain the non-uniformity. Finally, the high quality infrared image is obtained by subtracting non-uniformity component from original image. The experimental results show that the proposed method can significantly eliminate the ghosting and the scene degradation. The results of correction are superior to the THPF-NUC and NN-NUC in the aspects of subjective visual and objective evaluation index.
Directory of Open Access Journals (Sweden)
Hongjian Wang
2014-01-01
Full Text Available We present a support vector regression-based adaptive divided difference filter (SVRADDF algorithm for improving the low state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm’s robustness. The performance of the proposed algorithm is evaluated by estimating states for (i an underwater nonmaneuvering target bearing-only tracking system and (ii maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.
An Implementation of Bayesian Adaptive Regression Splines (BARS in C with S and R Wrappers
Directory of Open Access Journals (Sweden)
Garrick Wallstrom
2007-02-01
Full Text Available BARS (DiMatteo, Genovese, and Kass 2001 uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors, as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise while adapting to sudden changes (retaining high-frequency signal. However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables, and finding starting values for Poisson regression (using code for density estimation created by Kooperberg. The code utilizes only freely-available external libraries (LAPACK and BLAS and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.
Flexible competing risks regression modeling and goodness-of-fit
DEFF Research Database (Denmark)
Scheike, Thomas; Zhang, Mei-Jie
2008-01-01
-specific hazards. Another recent approach is to directly model the cumulative incidence by a proportional model (Fine and Gray, J Am Stat Assoc 94:496-509, 1999), and then obtain direct estimates of how covariates influences the cumulative incidence curve. We consider a simple and flexible class of regression......In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause...... models that is easy to fit and contains the Fine-Gray model as a special case. One advantage of this approach is that our regression modeling allows for non-proportional hazards. This leads to a new simple goodness-of-fit procedure for the proportional subdistribution hazards assumption that is very easy...
A fitter use of Monte Carlo simulations in regression models
Directory of Open Access Journals (Sweden)
Alessandro Ferrarini
2011-12-01
Full Text Available In this article, I focus on the use of Monte Carlo simulations (MCS within regression models, being this application very frequent in biology, ecology and economy as well. I'm interested in enhancing a typical fault in this application of MCS, i.e. the inner correlations among independent variables are not used when generating random numbers that fit their distributions. By means of an illustrative example, I provide proof that the misuse of MCS in regression models produces misleading results. Furthermore, I also provide a solution for this topic.
Testing heteroscedasticity by wavelets in a nonparametric regression model
Institute of Scientific and Technical Information of China (English)
LI; Yuan; WONG; Heung; IP; Waicheung
2006-01-01
In the nonparametric regression models, a homoscedastic structure is usually assumed. However, the homoscedasticity cannot be guaranteed a priori. Hence, testing the heteroscedasticity is needed. In this paper we propose a consistent nonparametric test for heteroscedasticity, based on wavelets. The empirical wavelet coefficients of the conditional variance in a regression model are defined first. Then they are shown to be asymptotically normal, based on which a test statistic for the heteroscedasticity is constructed by using Fan's wavelet thresholding idea. Simulations show that our test is superior to the traditional nonparametric test.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed. PMID:26609741
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Modelling multimodal photometric redshift regression with noisy observations
Kügler, S D
2016-01-01
In this work, we are trying to extent the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach allows therefore to approach the multimodal regression problem in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally and lastly, the proposed algorithm allows in principle to make predictions outside the training values which makes it a fascinating opportunity for the detection of high-redshifted quasars.
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Spatial stochastic regression modelling of urban land use
International Nuclear Information System (INIS)
Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable
Steganalysis of LSB Image Steganography using Multiple Regression and Auto Regressive (AR Model
Directory of Open Access Journals (Sweden)
Souvik Bhattacharyya
2011-07-01
Full Text Available The staggering growth in communication technologyand usage of public domain channels (i.e. Internet has greatly facilitated transfer of data. However, such open communication channelshave greater vulnerability to security threats causing unauthorizedin- formation access. Traditionally, encryption is used to realizethen communication security. However, important information is notprotected once decoded. Steganography is the art and science of communicating in a way which hides the existence of the communication.Important information is ﬁrstly hidden in a host data, such as digitalimage, text, video or audio, etc, and then transmitted secretly tothe receiver. Steganalysis is another important topic in informationhiding which is the art of detecting the presence of steganography. Inthis paper a novel technique for the steganalysis of Image has beenpresented. The proposed technique uses an auto-regressive model todetect the presence of the hidden messages, as well as to estimatethe relative length of the embedded messages.Various auto regressiveparameters are used to classify cover image as well as stego imagewith the help of a SVM classiﬁer. Multiple Regression analysis ofthe cover carrier along with the stego carrier has been carried outin order to ﬁnd out the existence of the negligible amount of thesecret message. Experimental results demonstrate the effectivenessand accuracy of the proposed technique.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. PMID:26188633
Semiparametric Robust Estimation of Truncated and Censored Regression Models
Cizek, P.
2008-01-01
Many estimation methods of truncated and censored regression models such as the maximum likelihood and symmetrically censored least squares (SCLS) are sensitive to outliers and data contamination as we document. Therefore, we propose a semipara- metric general trimmed estimator (GTE) of truncated an
WAVELET ESTIMATION FOR JUMPS IN A HETEROSCEDASTIC REGRESSION MODEL
Institute of Scientific and Technical Information of China (English)
任浩波; 赵延孟; 李元; 谢衷洁
2002-01-01
Wavelets are applied to detect the jumps in a heteroscedastic regression model.It is shown that the wavelet coefficients of the data have significantly large absolute values across fine scale levels near the jump points. Then a procedure is developed to estimate the jumps and jump heights. All estimators are proved to be consistent.
Change-point estimation for censored regression model
Institute of Scientific and Technical Information of China (English)
2007-01-01
In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.
PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA
Institute of Scientific and Technical Information of China (English)
QianWeimin; LiYumei
2005-01-01
The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.
Regression Models and Experimental Designs : A Tutorial for Simulation Analaysts
Kleijnen, J.P.C.
2006-01-01
This tutorial explains the basics of linear regression models. especially low-order polynomials. and the corresponding statistical designs. namely, designs of resolution III, IV, V, and Central Composite Designs (CCDs).This tutorial assumes 'white noise', which means that the residuals of the fitted
Multivariate Student -t Regression Models : Pitfalls and Inference
Fernández, C.; Steel, M.F.J.
1997-01-01
We consider likelihood-based inference from multivariate regression models with independent Student-t errors. Some very intruiging pitfalls of both Bayesian and classical methods on the basis of point observations are uncovered. Bayesian inference may be precluded as a consequence of the coarse natu
Linearity and Misspecification Tests for Vector Smooth Transition Regression Models
DEFF Research Database (Denmark)
Teräsvirta, Timo; Yang, Yukai
The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...
Institute of Scientific and Technical Information of China (English)
Ge-mai Chen; Jin-hong You
2005-01-01
Consider a repeated measurement partially linear regression model with an unknown vector pasemiparametric generalized least squares estimator (SGLSE) ofβ, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE can not improve upon the SGLSE.These results are generalizations of those in [2] to the case of semiparametric regressions.
A regression model to estimate regional ground water recharge.
Lorenz, David L; Delin, Geoffrey N
2007-01-01
A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.
Regression model for Quality of Web Services dataset with WEKA
Directory of Open Access Journals (Sweden)
Shalini Gambhir
2013-06-01
Full Text Available The Waikato Environment for Knowledge Analysis (WEKA came about through the perceived need for a uniﬁed workbench that would allow researchers easy access to state-of the-art techniques in machine learning algorithms for data mining tasks. It provides a general-purpose environment for automatic classiﬁcation, regression, clustering, and feature selection etc. in various research areas. This paper provides an introduction to the WEKA workbench and briefly discusses regression model for some of the quality of web service parameters.
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds. PMID:23796400
Using regression models to determine the poroelastic properties of cartilage.
Chung, Chen-Yuan; Mansour, Joseph M
2013-07-26
The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.
Emamgolizadeh, S.; Bateni, S. M.; Shahsavani, D.; Ashrafi, T.; Ghorbani, H.
2015-10-01
The soil cation exchange capacity (CEC) is one of the main soil chemical properties, which is required in various fields such as environmental and agricultural engineering as well as soil science. In situ measurement of CEC is time consuming and costly. Hence, numerous studies have used traditional regression-based techniques to estimate CEC from more easily measurable soil parameters (e.g., soil texture, organic matter (OM), and pH). However, these models may not be able to adequately capture the complex and highly nonlinear relationship between CEC and its influential soil variables. In this study, Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS) were employed to estimate CEC from more readily measurable soil physical and chemical variables (e.g., OM, clay, and pH) by developing functional relations. The GEP- and MARS-based functional relations were tested at two field sites in Iran. Results showed that GEP and MARS can provide reliable estimates of CEC. Also, it was found that the MARS model (with root-mean-square-error (RMSE) of 0.318 Cmol+ kg-1 and correlation coefficient (R2) of 0.864) generated slightly better results than the GEP model (with RMSE of 0.270 Cmol+ kg-1 and R2 of 0.807). The performance of GEP and MARS models was compared with two existing approaches, namely artificial neural network (ANN) and multiple linear regression (MLR). The comparison indicated that MARS and GEP outperformed the MLP model, but they did not perform as good as ANN. Finally, a sensitivity analysis was conducted to determine the most and the least influential variables affecting CEC. It was found that OM and pH have the most and least significant effect on CEC, respectively.
Support vector regression-based internal model control
Institute of Scientific and Technical Information of China (English)
HUANG Yan-wei; PENG Tie-gen
2007-01-01
This paper proposes a design of internal model control systems for process with delay by using support vector regression (SVR). The proposed system fully uses the excellent nonlinear estimation performance of SVR with the structural risk minimization principle. Closed-system stability and steady error are analyzed for the existence of modeling errors. The simulations show that the proposed control systems have the better control performance than that by neural networks in the cases of the training samples with small size and noises.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
DEFF Research Database (Denmark)
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least square...... estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording....
CONSERVATIVE ESTIMATING FUNCTIONIN THE NONLINEAR REGRESSION MODEL WITHAGGREGATED DATA
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
The purpose of this paper is to study the theory of conservative estimating functions in nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is projection of the true score function onto a class of estimation functions. By constructing, the potential function for the projected score with aggregated data is obtained, which have some properties of log-likelihood function.
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares estimation. We demonstrate the method on synthetic data and finally separate speech and music in a real room recording.
A Regression Analysis Model Based on Wavelet Networks
Institute of Scientific and Technical Information of China (English)
XIONG Zheng-feng
2002-01-01
In this paper, an approach is proposed to combine wavelet networks and techniques of regression analysis. The resulting wavelet regression estimator is well suited for regression estimation of moderately large dimension, in particular for regressions with localized irregularities.
REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL
Directory of Open Access Journals (Sweden)
Siana Halim
2007-01-01
Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Hierarchical Neural Regression Models for Customer Churn Prediction
Directory of Open Access Journals (Sweden)
Golshan Mohammadi
2013-01-01
Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.
Institute of Scientific and Technical Information of China (English)
Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie
2008-01-01
Predicting heartbeat message arrival time is crucial for the quality of failure detection service over internet. However, internet dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as auto-regressive process, heartbeat sending time is modeled as exogenous variable, the model's coefficients are estimated based on the sliding window of observations and this result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.
Regression Model to Predict Global Solar Irradiance in Malaysia
Directory of Open Access Journals (Sweden)
Hairuniza Ahmed Kutty
2015-01-01
Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.
Phone Duration Modeling of Affective Speech Using Support Vector Regression
Directory of Open Access Journals (Sweden)
Alexandros Lazaridis
2012-07-01
Full Text Available In speech synthesis accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing emotional speech with natural sounding. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as the decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all the four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE throughout all emotional categories.
Predicting and Modelling of Survival Data when Cox's Regression Model does not hold
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...
Fuzzy and Regression Modelling of Hard Milling Process
Directory of Open Access Journals (Sweden)
A. Tamilarasan
2014-04-01
Full Text Available The present study highlights the application of box-behnken design coupled with fuzzy and regression modeling approach for making expert system in hard milling process to improve the process performance with systematic reduction of production cost. The important input fields of work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key index of machining outputs. The results indicate that the fuzzy logic and regression modeling technique can be effectively used for the prediction of desired responses with less average error variation. Predicted results were verified by experiments and shown the good potential characteristics of the developed system for automated machining environment.
Central limit theorem of linear regression model under right censorship
Institute of Scientific and Technical Information of China (English)
何书元; 黄香
2003-01-01
In this paper, the estimation of joint distribution F(y,z) of (Y, Z) and the estimation in thelinear regression model Y = b′Z + ε for complete data are extended to that of the right censored data. Theregression parameter estimates of b and the variance of ε are weighted least square estimates with randomweights. The central limit theorems of the estimators are obtained under very weak conditions and the derivedasymptotic variance has a very simple form.
Bootstrapping heteroskedastic regression models: wild bootstrap vs. pairs bootstrap
Flachaire, Emmanuel
2005-01-01
International audience In regression models, appropriate bootstrap methods for inference robust to heteroskedasticity of unknown form are the wild bootstrap and the pairs bootstrap. The finite sample performance of a heteroskedastic-robust test is investigated with Monte Carlo experiments. The simulation results suggest that one specific version of the wild bootstrap outperforms the other versions of the wild bootstrap and of the pairs bootstrap. It is the only one for which the bootstrap ...
Extending Regression Discontinuity Models Beyond the Jump Point
Ai, C.; Norton, E; Yang, Z.
2011-01-01
This paper proposes a new estimation method for regression discontinuity models, allowing for estimation of a treatment effect beyond the jump point (with additional assumptions). The proposed procedure consistently estimates the treatment effect function, as well as the average outcome in the absence of treatment. The treatment effect estimator is root-N consistent. We apply the method to an important question in health economicsâ€“what is the effect of having Medicare insurance on admission...
ASYMPTOTIC NORMALITY OF WAVELET ESTIMATOR IN HETEROSCEDASTIC REGRESSION MODEL
Institute of Scientific and Technical Information of China (English)
无
2007-01-01
The following heteroscedastic regression model Yi = g(xi) + σiei (1 ≤ i ≤ n) is considered, where it is assumed that σ2i = f(ui), the design points (xi, ui) are known and nonrandom, g and f are unknown functions. Under the unobservable disturbance ei form martingale differences, the asymptotic normality of wavelet estimators of g with f being known or unknown function is studied.
Directory of Open Access Journals (Sweden)
Zhang Xiaohua
2003-11-01
Full Text Available Abstract In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed, testing single loci or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS. For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (ie it achieves the specified size of the test and it is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.
Adaptive visual attention model
Hügli, Heinz; Bur, Alexandre
2009-01-01
Visual attention, defined as the ability of a biological or artificial vision system to rapidly detect potentially relevant parts of a visual scene, provides a general purpose solution for low level feature detection in a vision architecture. Well considered for its universal detection behaviour, the general model of visual attention is suited for any environment but inferior to dedicated feature detectors in more specific environments. The goal of the development presented in this paper is t...
K factor estimation in distribution transformers using linear regression models
Directory of Open Access Journals (Sweden)
Juan Miguel Astorga Gómez
2016-06-01
Full Text Available Background: Due to massive incorporation of electronic equipment to distribution systems, distribution transformers are subject to operation conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor, from total current harmonic content obtained with low-cost equipment.Method: Two distribution transformers that feed different loads are studied variables, current total harmonic distortion factor K are recorded, and the regression model that best fits the data field is determined. To select the regression model the coefficient of determination R2 and the Akaike Information Criterion (AIC are used. With the selected model, the K-factor is estimated to actual operating conditions.Results: Once determined the model it was found that for both agricultural cargo and industrial mining, present harmonic content (THDi exceeds the values that these transformers can drive (average of 12.54% and minimum 8,90% in the case of agriculture and average value of 18.53% and a minimum of 6.80%, for industrial mining case.Conclusions: When estimating the K factor using polynomial models it was determined that studied transformers can not withstand the current total harmonic distortion of their current loads. The appropriate K factor for studied transformer should be 4; this allows transformers support the current total harmonic distortion of their respective loads.
Reconstruction of missing daily streamflow data using dynamic regression models
Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault
2015-12-01
River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
Utilization of geographically weighted regression (GWR) in forestry modeling
Quirós-Segovia, María
2015-01-01
The diploma thesis is focused on the application of the Geographically Weighted Regression (GWR) in forestry models. This is a prospective method for coping with spatially heterogeneous data. In forestry, this method has been used previously in small areas with good results, but in this diploma thesis it is applied to a bigger area in the Region of Murcia, Spain. Main goal of the thesis is to evaluate GWR for developing of large scale height-diameter model based on data of National Forest Inv...
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben;
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
Genetic evaluation of European quails by random regression models
Directory of Open Access Journals (Sweden)
Flaviana Miranda Gonçalves
2012-09-01
Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
Dynamic Regression Intervention Modeling for the Malaysian Daily Load
Directory of Open Access Journals (Sweden)
Fadhilah Abdrazak
2014-05-01
Full Text Available Malaysia is a unique country due to having both fixed and moving holidays. These moving holidays may overlap with other fixed holidays and therefore, increase the complexity of the load forecasting activities. The errors due to holidays’ effects in the load forecasting are known to be higher than other factors. If these effects can be estimated and removed, the behavior of the series could be better viewed. Thus, the aim of this paper is to improve the forecasting errors by using a dynamic regression model with intervention analysis. Based on the linear transfer function method, a daily load model consists of either peak or average is developed. The developed model outperformed the seasonal ARIMA model in estimating the fixed and moving holidays’ effects and achieved a smaller Mean Absolute Percentage Error (MAPE in load forecast.
Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa
2015-11-01
A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2015-12-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM (R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Fuzzy regression modeling for tool performance prediction and degradation detection.
Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L
2010-10-01
In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Empirical likelihood ratio tests for multivariate regression models
Institute of Scientific and Technical Information of China (English)
WU Jianhong; ZHU Lixing
2007-01-01
This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method has been well known to be a powerful tool for constructing test and confidence region. For model checking, however, the naive empirical likelihood (EL) based tests are not of Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by the advantages of both the EL and score test method, the EL-based score tests share many desirable features as follows: They are self-scale invariant and can detect the alternatives that converge to the null at rate n-1/2, the possibly fastest rate for lack-of-fit testing; they involve weight functions, which provides us with the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application for a real dataset is analyzed.
Approximation by randomly weighting method in censored regression model
Institute of Scientific and Technical Information of China (English)
无
2009-01-01
Censored regression ("Tobit") models have been in common use, and their linear hypothesis testings have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model. Simulation studies illustrate that the per-formance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.
Approximation by randomly weighting method in censored regression model
Institute of Scientific and Technical Information of China (English)
WANG ZhanFeng; WU YaoHua; ZHAO LinCheng
2009-01-01
Censored regression ("Tobit") models have been in common use,and their linear hypothesis testings have been widely studied.However,the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters.In this paper,we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic.It is shown that,under both the null and local alternative hypotheses,conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic.Therefore,the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters.At the same time,we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model.Simulation studies illustrate that the performance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval arithm
Information Criteria for Deciding between Normal Regression Models
Maier, Robert S
2013-01-01
Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through being used by astrophysicists to compare cosmological models; e.g., predictions of the distance-redshift relation. The AICc is shown to have been misapplied, being applicable only if error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of `Akaike weights' for deciding between such models. Additionally, the...
Genomic breeding value estimation using nonparametric additive regression models
Directory of Open Access Journals (Sweden)
Solberg Trygve
2009-01-01
Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.
A Gompertz regression model for fern spores germination
Directory of Open Access Journals (Sweden)
Gabriel y Galán, Jose María
2015-06-01
Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados
Statistical Inference for Partially Linear Regression Models with Measurement Errors
Institute of Scientific and Technical Information of China (English)
Jinhong YOU; Qinfeng XU; Bin ZHOU
2008-01-01
In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly,a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed,which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized like-lihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regu-larization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.
The R Package threg to Implement Threshold Regression Models
Directory of Open Access Journals (Sweden)
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Epistasis analysis for quantitative traits by functional regression model.
Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao
2014-06-01
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.
Directory of Open Access Journals (Sweden)
Kuo-Hsin Tseng
2015-04-01
Full Text Available Accurate estimation of lithium-ion battery life is essential to assure the reliable operation of the energy supply system. This study develops regression models for battery prognostics using statistical methods. The resultant regression models can not only monitor a battery’s degradation trend but also accurately predict its remaining useful life (RUL at an early stage. Three sets of test data are employed in the training stage for regression models. Another set of data is then applied to the regression models for validation. The fully discharged voltage (Vdis and internal resistance (R are adopted as aging parameters in two different mathematical models, with polynomial and exponential functions. A particle swarm optimization (PSO process is applied to search for optimal coefficients of the regression models. Simulations indicate that the regression models using Vdis and R as aging parameters can build a real state of health profile more accurately than those using cycle number, N. The Monte Carlo method is further employed to make the models adaptive. The subsequent results, however, show that this results in an insignificant improvement of the battery life prediction. A reasonable speculation is that the PSO process already yields the major model coefficients.
International Nuclear Information System (INIS)
In this paper, we present the use of different mathematical models to forecast electricity price under deregulated power. A successful prediction tool of electricity price can help both power producers and consumers plan their bidding strategies. Inspired by that the support vector regression (SVR) model, with the ε-insensitive loss function, admits of the residual within the boundary values of ε-tube, we propose a hybrid model that combines both SVR and Auto-regressive integrated moving average (ARIMA) models to take advantage of the unique strength of SVR and ARIMA models in nonlinear and linear modeling, which is called SVRARIMA. A nonlinear analysis of the time-series indicates the convenience of nonlinear modeling, the SVR is applied to capture the nonlinear patterns. ARIMA models have been successfully applied in solving the residuals regression estimation problems. The experimental results demonstrate that the model proposed outperforms the existing neural-network approaches, the traditional ARIMA models and other hybrid models based on the root mean square error and mean absolute percentage error.
Modeling Pan Evaporation for Kuwait by Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Jaber Almedeij
2012-01-01
Full Text Available Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values.
Symbolic regression modeling of noise generation at porous airfoils
Sarradj, Ennes; Geyer, Thomas
2014-07-01
Based on data sets from previous experimental studies, the tool of symbolic regression is applied to find empirical models that describe the noise generation at porous airfoils. Both the self noise from the interaction of a turbulent boundary layer with the trailing edge of an porous airfoil and the noise generated at the leading edge due to turbulent inflow are considered. Following a dimensional analysis, models are built for trailing edge noise and leading edge noise in terms of four and six dimensionless quantities, respectively. Models of different accuracy and complexity are proposed and discussed. For the trailing edge noise case, a general dependency of the sound power on the fifth power of the flow velocity was found and the frequency spectrum is controlled by the flow resistivity of the porous material. Leading edge noise power is proportional to the square of the turbulence intensity and shows a dependency on the fifth to sixth power of the flow velocity, while the spectrum is governed by the flow resistivity and the integral length scale of the incoming turbulence.
Kernel Averaged Predictors for Spatio-Temporal Regression Models.
Heaton, Matthew J; Gelfand, Alan E
2012-12-01
In applications where covariates and responses are observed across space and time, a common goal is to quantify the effect of a change in the covariates on the response while adequately accounting for the spatio-temporal structure of the observations. The most common approach for building such a model is to confine the relationship between a covariate and response variable to a single spatio-temporal location. However, oftentimes the relationship between the response and predictors may extend across space and time. In other words, the response may be affected by levels of predictors in spatio-temporal proximity to the response location. Here, a flexible modeling framework is proposed to capture such spatial and temporal lagged effects between a predictor and a response. Specifically, kernel functions are used to weight a spatio-temporal covariate surface in a regression model for the response. The kernels are assumed to be parametric and non-stationary with the data informing the parameter values of the kernel. The methodology is illustrated on simulated data as well as a physical data set of ozone concentrations to be explained by temperature. PMID:24010051
Knol, Mirjam J.; van der Tweel, Ingeborg; Grobbee, Diederick E.; Numans, Mattijs E.; Geerlings, Mirjam I.
2007-01-01
Background To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers
Robust repeated median regression in moving windows with data-adaptive width selection
Borowski, Matthias; Fried, Roland
2011-01-01
Online (also 'real-time' or 'sequential') signal extraction from noisy and outlier- interfered data streams is a basic but challenging goal. Fitting a robust Repeated Median (Siegel, 1982) regression line in a moving time window has turned out to be a promising approach (Davies et al., 2004; Gather et al., 2006; Schettlinger et al., 2006). The level of the regression line at the rightmost window position, which equates to the current time point in an online application, is then...
Quadtree-adaptive tsunami modelling
Popinet, Stéphane
2011-09-01
The well-balanced, positivity-preserving scheme of Audusse et al. (SIAM J Sci Comput 25(6):2050-2065, 2004), for the solution of the Saint-Venant equations with wetting and drying, is generalised to an adaptive quadtree spatial discretisation. The scheme is validated using an analytical solution for the oscillation of a fluid in a parabolic container, as well as the classic Monai tsunami laboratory benchmark. An efficient database system able to dynamically reconstruct a multiscale bathymetry based on extremely large datasets is also described. This combination of methods is successfully applied to the adaptive modelling of the 2004 Indian ocean tsunami. Adaptivity is shown to significantly decrease the exponent of the power law describing computational cost as a function of spatial resolution. The new exponent is directly related to the fractal dimension of the geometrical structures characterising tsunami propagation. The implementation of the method as well as the data and scripts necessary to reproduce the results presented are freely available as part of the open-source Gerris Flow Solver framework.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. PMID:26774211
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique.
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Gurudeo Anand Tularam; Siti Amri
2012-01-01
House price prediction continues to be important for government agencies insurance companies and real estate industry. This study investigates the performance of house sales price models based on linear and non-linear approaches to study the effects of selected variables. Linear stepwise Multivariate Regression (MR) and nonlinear models of Neural Network (NN) and Adaptive Neuro-Fuzzy (ANFIS) are developed and compared. The GIS methods are used to integrate the data for the study area (Bathurs...
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Boosting the partial least square algorithm for regression modelling
Institute of Scientific and Technical Information of China (English)
Ling YU; Tiejun WU
2006-01-01
Boosting algorithms are a class of general methods used to improve the general performance of regression analysis. The main idea is to maintain a distribution over the train set. In order to use the given distribution directly,a modified PLS algorithm is proposed and used as the base learner to deal with the nonlinear multivariate regression problems. Experiments on gasoline octane number prediction demonstrate that boosting the modified PLS algorithm has better general performance over the PLS algorithm.
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...
Proteomics Improves the Prediction of Burns Mortality: Results from Regression Spline Modeling
Finnerty, Celeste C.; Ju, Hyunsu; Spratt, Heidi; Victor, Sundar; Jeschke, Marc G.; Hegde, Sachin; Bhavnani, Suresh K.; Luxon, Bruce A.; Brasier, Allan R.; Herndon, David N.
2012-01-01
Prediction of mortality in severely burned patients remains unreliable. Although clinical covariates and plasma protein abundance have been used with varying degrees of success, the triad of burn size, inhalation injury, and age remains the most reliable predictor. We investigated the effect of combining proteomics variables with these three clinical covariates on prediction of mortality in burned children. Serum samples were collected from 330 burned children (burns covering >25% of the total body surface area) between admission and the time of the first operation for clinical chemistry analyses and proteomic assays of cytokines. Principal component analysis revealed that serum protein abundance and the clinical covariates each provided independent information regarding patient survival. To determine whether combining proteomics with clinical variables improves prediction of patient mortality, we used multivariate adaptive regression splines, since the relationships between analytes and mortality were not linear. Combining these factors increased overall outcome prediction accuracy from 52% to 81% and area under the receiver operating characteristic curve from 0.82 to 0.95. Thus, the predictive accuracy of burns mortality is substantially improved by combining protein abundance information with clinical covariates in a multivariate adaptive regression splines classifier, a model currently being validated in a prospective study. PMID:22686201
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Regression of retinopathy by squalamine in a mouse model.
Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I
2004-07-01
The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina. PMID:15128931
M. Ahmadlou; M. R. Delavar; Tayyebi, A.; H. Shafizadeh-Moghadam
2015-01-01
Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the m...
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
The Linear Regression Model for setting up the Futures Price
Mario G.R. PAGLIACC; Janusz GRABARA; Madalina Gabriela ANGHEL; Cristina SACALA; Vasile Lucian ANTON
2015-01-01
To realize a linear regression, we have considered the computation method for futures prices that, according to economic culture, is based on the rate of the supporting asset and internal/external interest ratios, and also on the time period until maturity. The market price of a futures instrument is influenced by the demand and supply, that is the number of units traded within a certain period.
RCR: Robust Compound Regression for Robust Estimation of Errors-in-Variables Model
Han, Hao; Wei ZHU
2015-01-01
The errors-in-variables (EIV) regression model, being more realistic by accounting for measurement errors in both the dependent and the independent variables, is widely adopted in applied sciences. The traditional EIV model estimators, however, can be highly biased by outliers and other departures from the underlying assumptions. In this paper, we develop a novel nonparametric regression approach - the robust compound regression (RCR) analysis method for the robust estimation of EIV models. W...
Energy Technology Data Exchange (ETDEWEB)
Khorami, M. Tayebi [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of); Chelgani, S. Chehreh [Surface Science Western, Research Park, University of Western Ontario, London (Canada); Hower, James C. [Center for Applied Energy Research, University of Kentucky, Kexington (United States); Jorjani, E. [Department of Mining Engineering, Science and Research Branch, Islamic Azad University, Poonak, Hesarak Tehran (Iran, Islamic Republic of)
2011-01-01
The results of proximate, ultimate, and petrographic analysis for a wide range of Kentucky coal samples were used to predict Free Swelling Index (FSI) using multivariable regression and Adaptive Neuro Fuzzy Inference System (ANFIS). Three different input sets: (a) moisture, ash, and volatile matter; (b) carbon, hydrogen, nitrogen, oxygen, sulfur, and mineral matter; and (c) group-maceral analysis, mineral matter, moisture, sulfur, and R{sub max} were applied for both methods. Non-linear regression achieved the correlation coefficients (R{sup 2}) of 0.38, 0.49, and 0.70 for input sets (a), (b), and (c), respectively. By using the same input sets, ANFIS predicted FSI with higher R{sup 2} of 0.46, 0.82 and 0.95, respectively. Results show that input set (c) is the best predictor of FSI in both prediction methods, and ANFIS significantly can be used to predict FSI when regression results do not have appropriate accuracy. (author)
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L;
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...
Jokar Arsanjani, J.; Helbich, M.; Kainz, W.; Boloorani, A.
2013-01-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-eco
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…
Hybrid hotspot detection using regression model and lithography simulation
Kimura, Taiki; Matsunawa, Tetsuaki; Nojima, Shigeki; Pan, David Z.
2016-03-01
As minimum feature sizes shrink, unexpected hotspots appear on wafers. Therefore, it is important to detect and fix these hotspots at design stage to reduce development time and manufacturing cost. Currently, as the most accurate approach, lithography simulation is widely used to detect such hotspots. However, it is known to be time-consuming. This paper proposes a novel aerial image synthesizing method using regression and minimum lithography simulation for only hotspot detection. Experimental results show hotspot detection on the proposed method is equivalent compared with the results on the conventional hotspot detection method which uses only lithography simulation with much less computational cost.
A nonparametric dynamic additive regression model for longitudinal data
DEFF Research Database (Denmark)
Martinussen, Torben; Scheike, Thomas H.
2000-01-01
dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...
Qualitative Analysis of Integration Adapter Modeling
Ritter, Daniel; Holzleitner, Manuel
2015-01-01
Integration Adapters are a fundamental part of an integration system, since they provide (business) applications access to its messaging channel. However, their modeling and configuration remain under-represented. In previous work, the integration control and data flow syntax and semantics have been expressed in the Business Process Model and Notation (BPMN) as a semantic model for message-based integration, while adapter and the related quality of service modeling were left for further studi...
Completing and adapting models of biological processes
Margaria, Tiziana; Hinchey, Michael G.; Raffelt, Harald; Rash, James L.; Rouff, Christopher A.; Steffen, Bernhard
2006-01-01
We present a learning-based method for model completion and adaptation, which is based on the combination of two approaches: 1) R2D2C, a technique for mechanically transforming system requirements via provably equivalent models to running code, and 2) automata learning-based model extrapolation. The intended impact of this new combination is to make model completion and adaptation accessible to experts of the field, like biologists or engineers. The principle is briefly illustrated by gene...
Hao, Lingxin
2007-01-01
Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao
Beta Regression Finite Mixture Models of Polarization and Priming
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Technology diffusion in hospitals: A log odds random effects regression model
J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)
2015-01-01
textabstractThis study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describ
Wheeler, David C.; Calder, Catherine A.
2007-06-01
The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.
Moment-bases estimation of smooth transition regression models with endogenous variables
W.D. Areosa (Waldyr Dutra); M.J. McAleer (Michael); M.C. Medeiros (Marcelo)
2008-01-01
textabstractNonlinear regression models have been widely used in practice for a variety of time series and cross-section datasets. For purposes of analyzing univariate and multivariate time series data, in particular, Smooth Transition Regression (STR) models have been shown to be very useful for re
Using regression models in design-based estimation of spatial means of soil properties
Brus, D.J.
2000-01-01
The precision of design-based sampling strategies can be increased by using regression models at the estimation stage. A general regression estimator is given that can be used for a wide variety of models and any well-defined sampling design. It equals the estimator plus an adjustment term that acco
Institute of Scientific and Technical Information of China (English)
QIN GuoYou; ZHU ZhongYi
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Institute of Scientific and Technical Information of China (English)
2009-01-01
In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.
Unobtrusive user modeling for adaptive hypermedia
H.J. Holz; K. Hofmann; C. Reed
2008-01-01
We propose a technique for user modeling in Adaptive Hypermedia (AH) that is unobtrusive at both the level of observable behavior and that of cognition. Unobtrusive user modeling is complementary to transparent user modeling. Unobtrusive user modeling induces user models appropriate for Educational
Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.
A Formal Model for Dynamically Adaptable Services
Fox, Jorge
2010-01-01
The growing complexity of software systems as well as changing conditions in their operating environment demand systems that are more flexible, adaptive and dependable. The service-oriented computing paradigm is in widespread use to support such adaptive systems, and, in many domains, adaptations may occur dynamically and in real time. In addition, services from heterogeneous, possibly unknown sources may be used. This motivates a need to ensure the correct behaviour of the adapted systems, and its continuing compliance to time bounds and other QoS properties. The complexity of dynamic adaptation (DA) is significant, but currently not well understood or formally specified. This paper elaborates a well-founded model and theory of DA, introducing formalisms written using COWS. The model is evaluated for reliability and responsiveness properties with the model checker CMC.
A generalized exponential time series regression model for electricity prices
DEFF Research Database (Denmark)
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort...... on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...
Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model
Institute of Scientific and Technical Information of China (English)
Jixue LIU
2006-01-01
Though EV model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use the ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of LS estimate in EV model.In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.
Teacher training through the Regression Model in foreign language education
Directory of Open Access Journals (Sweden)
Jesús García Laborda
2011-01-01
Full Text Available In the last few years, Spain has seen dramatic changes in its educational system. Many of them have been rejected by most teachers after their implementation (LOGSE while others have found potential drawbacks even before starting operating (LOCE, LOE. To face these changes, schools need well qualified instructors. Given this need, and also considering that, although all the schools want the best teachers but, as teachers’ salaries are regulated by the state, few schools can actually offer incentives to their teachers and consequently schools never have the instructors they wish. Apart from this, state schools have a fixed salary for their teachers and private institutions offer no additional bonuses for things like additional training or diplomas (for example, masters or post-degree courses and, therefore, teachers are rarely interested in pursuing any further studies in methodology or any other related fields such as education or applied linguistics. Although many teachers acknowledge their love to teaching, the current situation in schools (school violence, bad salaries, depression, social desprestige, legal changes and so has made the teaching job one of the most complicated and undevoted in Spain. It is not unusual to have a couple of instructors ill due to depression and other psychological sicknesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one’s experience both as a teacher as well as a learner.
Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City
Directory of Open Access Journals (Sweden)
Amiruddin Ismail
2011-01-01
Full Text Available Transport modes are very important to Libyan’s Tripoli residents for their daily trips. However, the total number of own car and private transport namely taxi and micro buses on the road increases and causes many problems such as traffic congestion, accidents, air and noise pollution. These problems then causes other related phenomena to the travel activities such as delay in trips, stress and frustration to motorists which may affect their productivity and efficiency to both workers and students. Delay may also increase travel cost as well inefficiency in trips making if compare to other public transport users in some Arabs cities. Switching to public transport (PT modes alternatives such as buses, light rail transit and underground train could improve travel time and travel costs. A transport study has been carried out at Tripoli City Authority areas among own car users who live in areas with inadequate of private transport and poor public transportation services. Analyses about relation between factors such as travel time, travel cost, trip purpose and parking cost have been made to answer research questions. Logistic regression technique has been used to analyse these factors that influence users to switch their trips mode to public transport alternatives.
Teacher training through the Regression Model in foreign language education
Directory of Open Access Journals (Sweden)
Jesús García Laborda
2011-01-01
Full Text Available In the last few years, Spain has seen dramatic changes in its educational system. Many of these changes have been rejected by most teachers after their implementation (LOGSE while others have found potential drawbacks even before starting operating (LOCE, LOE. To face these changes, schools need well qualified instructors. Given this need, and also considering that, although all the schools want the best teachers but, as teachers’ salaries are regulated by the state, few schools can actually offer incentives to their teachers and consequently schools never have the instructors they wish. Apart from this, state schools have a fixed salary for their teachers and private institutions offer no additional bonuses for things like additional training or diplomas (for example, masters or post-degree courses and, therefore, teachers are rarely interested in pursuing any further studies in methodology or any other related fields such as education or applied linguistics. Although many teachers acknowledge their love to teaching, the current situation in schools (school violence, bad salaries, depression, social desprestige, legal changes and so has made the teaching job one of the most complicated and undevoted in Spain. It is not unusual to have a couple of instructors ill due to depression and other psychological sicknesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one’s experience both as a teacher as well as a learner.
Multiple models adaptive feedforward decoupling controller
Institute of Scientific and Technical Information of China (English)
Wang Xin; Li Shaoyuan; Wang Zhongjie
2005-01-01
When the parameters of the system change abruptly, a new multivariable adaptive feedforward decoupling controller using multiple models is presented to improve the transient response. The system models are composed of multiple fixed models, one free-running adaptive model and one re-initialized adaptive model. The fixed models are used to provide initial control to the process. The re-initialized adaptive model can be reinitialized as the selected model to improve the adaptation speed. The free-running adaptive controller is added to guarantee the overall system stability. At each instant, the best system model is selected according to the switching index and the corresponding controller is designed. During the controller design, the interaction is viewed as the measurable disturbance and eliminated by the choice of the weighting polynomial matrix. It not only eliminates the steady-state error but also decouples the system dynamically. The global convergence is obtained and several simulation examples are presented to illustrate the effectiveness of the proposed controller.
Reduction of the curvature of a class of nonlinear regression models
Institute of Scientific and Technical Information of China (English)
吴翊; 易东云
2000-01-01
It is proved that the curvature of nonlinear model can be reduced to zero by increasing measured data for a class of nonlinear regression models. The result is important to actual problem and has obtained satisfying effect on data fusing.
Process Design and Optimization (MLS-S03): a Journey in Modeling, Optimization and Regression
Billeter, Julien
2015-01-01
This lecture describes the following topics: • Preamble on Linear Algebra • Dynamic and Static Models • Solving Dynamic and Static Models • Solving Optimization Problems • Solving Regression Problems
On relationship between coefficients of the different dimensions linear regression models
Panov, V. G.
2011-01-01
Considered two linear regression models of a given response variable with some predictor set and its subset. It is shown that there is a linear relationship between coefficients of these models. Some corollaries of the proved theorem is considered.
Modeling by regression for laser cutting of quartz crystal
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
Presents the theoretical models built by analysis of the mechanism of laser cutting of quartz crystal and re gression of test results for the laser cutting of quartz crystal, and comparative analysis of calculation errors for these models, and concludes with test results that these models comprehensively reflect the physical features of laser cutting of quartz crystal and satisfy the industrial production requirements, and they can be used to select right parameters for improvement of productivity and quality and saving of energy.
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo;
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety...... of model checking techniques to identify misspecifications. In our final model, we find evidence of time-variation in the effects of distance-to-default and short-to-long term debt. Also we identify interactions between distance-to-default and other covariates, and the quick ratio covariate is significant...
CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
Liu Jixue; Chen Xiru
2005-01-01
Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.
A Negative Binomial Regression Model for Accuracy Tests
Hung, Lai-Fa
2012-01-01
Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
Misspecified poisson regression models for large-scale registry data
DEFF Research Database (Denmark)
Grøn, Randi; Gerds, Thomas A; Andersen, Per K
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
A Noncentral "t" Regression Model for Meta-Analysis
Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi
2010-01-01
In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…
Preobrazhenskii, M. P.; Rudakov, O. B.
2016-01-01
A regression model for calculating the boiling point isobars of tetrachloromethane-organic solvent binary homogeneous systems is proposed. The parameters of the model proposed were calculated for a series of solutions. The correlation between the nonadditivity parameter of the regression model and the hydrophobicity criterion of the organic solvent is established. The parameter value of the proposed model is shown to allow prediction of the potential formation of azeotropic mixtures of solvents with tetrachloromethane.
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Daniel Rece; Ion Stancu
2009-01-01
This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering) is “explained” by the linear regression model. In our opinion, this model will provide critical...
Directory of Open Access Journals (Sweden)
N. Diodato
2010-12-01
Full Text Available To reconstruct sub-regional European climate over the past centuries, several efforts have been made using historical datasets. However, only scattered information at low spatial and temporal resolution have been produced to date for the Mediterranean area. This paper has exploited, for Southern and Central Italy (Mediterranean Sub-Regional Area, an unprecedented historical dataset as an attempt to model seasonal (winter and summer air temperatures in pre-instrumental time (back to 1500. Combining information derived from proxy documentary data and large-scale simulation, a statistical methodology in the form of multiscale-temperature regression (MTR-model was developed to adapt larger-scale estimations to the sub-regional temperature pattern. The modelled response lacks essentially of autocorrelations among the residuals (marginal or any significance in the Durbin-Watson statistic, and agrees well with the independent data from the validation sample (Nash-Sutcliffe efficiency coefficient >0.60. The advantage of the approach is not merely increased accuracy in estimation. Rather, it relies on the ability to extract (and exploit the right information to replicate coherent temperature series in historical times.
STATISTICAL INFERENCES FOR VARYING-COEFFICINT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE
Institute of Scientific and Technical Information of China (English)
梅长林; 张文修; 梁怡
2001-01-01
Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fired by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed and a formal testing approach which is essential to exploring spatial non-stationarity in geography science is suggested.
Prediction of the result in race walking using regularized regression models
Directory of Open Access Journals (Sweden)
Krzysztof Przednowek
2013-04-01
Full Text Available The following paper presents the use of regularized linear models as tools to optimize training process. The models were calculated by using data collected from race-walkers' training events. The models used predict the outcomes over a 3 km race and following a prescribed training plan. The material included a total of 122 training patterns made by 21 players. The methods of analysis include: classical model of OLS regression, ridge regression, LASSO regression and elastic net regression. In order to compare and choose the best method a cross-validation of the extit{leave-one-out} was used. All models were calculated using R language with additional packages. The best model was determined by the LASSO method which generates an error of about 26 seconds. The method has simplified the structure of the model by eliminating 5 out of 18 predictors.
Evaporation modeling with multiple linear regression techniques– a review
Parameshwar Sidramappa Shirgure
2013-01-01
Evaporation is influenced by number of agro-meteorological parameters and one of the integral components of the hydrological cycle and. Usually, estimates of evaporation are needed in a wide array of problems in agriculture, hydrology, agronomy, forestry and land resources planning, such as water balance computation, irrigation management, crop yield forecasting model, river flow forecasting, ecosystem modeling. Irrigation can substantially increase crop yields, but again the scheduling of th...
Directory of Open Access Journals (Sweden)
J. Martínez-Fernández
2013-02-01
Full Text Available Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence.
The number of human-caused fires occurring within a 25-yr period (1983–2007 was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire to develop logistic models, and a continuous variable (fire density to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS regression model explained 53% of the variation of the fire density patterns (adjusted R^{2} = 0.53. Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence.
For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AIC_{c}, from 3451.19 to 3321.19. The results from
MODELING THE ADAPTION RULE IN CONTEXTAWARE SYSTEMS
Directory of Open Access Journals (Sweden)
Mao Zheng
2016-08-01
Full Text Available Context awareness is increasingly gaining applicability in interactive ubiquitous mobile computing systems. Each context-aware application has its own set of behaviors to react to context modifications. This paper is concerned with the context modeling and the development methodology for context-aware systems. We proposed a rule-based approach and use the adaption tree to model the adaption rule of context-aware systems. We illustrate this idea in an arithmetic game application.
Evaporation modeling with multiple linear regression techniques– a review
Directory of Open Access Journals (Sweden)
Parameshwar Sidramappa Shirgure
2013-01-01
Full Text Available Evaporation is influenced by number of agro-meteorological parameters and one of the integral components of the hydrological cycle and. Usually, estimates of evaporation are needed in a wide array of problems in agriculture, hydrology, agronomy, forestry and land resources planning, such as water balance computation, irrigation management, crop yield forecasting model, river flow forecasting, ecosystem modeling. Irrigation can substantially increase crop yields, but again the scheduling of the water application is usually based on evaporation estimates. Numerous investigators developed models for estimation of evaporation. The interrelated meteorological factors having a major influence on evaporation have been incorporated into various formulae for estimating evaporation. Unfortunately, reliable estimates of evaporation are extremely difficult to obtain because of complex interactions between the components of the land-plant-atmosphere system. In hot climate, the loss of water by evaporation from rivers, canals and open-water bodies is a vital factor as evaporation takes a significant portion of all water supplies. Even in humid areas, evaporation loss is significant, although the cumulative precipitation tends to mask it due to which it is ordinarily not recognized except during rainless period. Therefore, the need for reliable models for quantifying evaporation losses from increasingly scarce water resources is greater than ever before. Accurate estimation of evaporation is fundamental for effective management of water resources. The evaporation models using MLR techniques is discussed her in details.
Application of Search Algorithms for Model Based Regression Testing
Directory of Open Access Journals (Sweden)
Sidra Noureen
2014-04-01
Full Text Available UML models have gained their significance as reported in the literature. The use of a model to describe the behavior of a system is a proven and major advantage to test. With the help of Model Based Testing (MBT, it is possible to automatically generate test cases. When MBT is applied on large industrial systems, there is problem to sampling the test cases from the suit of entire test because it is difficult to execute the huge number of test cases being generated. The motivation of this study is to design a multi objective genetic algorithm based test case selection technique which can select the most appropriate subset of test cases. NSGA (Non-dominated Sorting Genetic Algorithm is used as an optimization algorithm and its fitness function is improved for selecting test cases from the dataset. It is concluded that there is a room to improve the performance of NSGA algorithm by means of tailoring its respective fitness function.
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
BAHADUR ASYMPTOTIC EFFICIENCY IN A SEMIPARAMETRIC REGRESSION MODEL
Institute of Scientific and Technical Information of China (English)
LIANGHUA; CHENGPING
1994-01-01
The authors give MLE θ1ML of θ1 in the model Y=θ1+g(T)-σ,then consider Bahadur asymptotic efficiency of θ1ML,where T and ε are independent,g is unknown,ε～φ(-) is known with mean 0 and variance σ2.
CONFIDENCE REGIONS IN TERMS OF STATISTICAL CURVATURE FOR AR(q) NONLINEAR REGRESSION MODELS
Institute of Scientific and Technical Information of China (English)
刘应安; 韦博成
2004-01-01
This paper constructs a set of confidence regions of parameters in terms of statistical curvatures for AR(q) nonlinear regression models. The geometric frameworks are proposed for the model. Then several confidence regions for parameters and parameter subsets in terms of statistical curvatures are given based on the likelihood ratio statistics and score statistics. Several previous results, such as [1] and [2] are extended to AR(q)nonlinear regression models.
International Nuclear Information System (INIS)
Highlights: • Developed hourly-indexed ARX models for robust cooling-load forecasting. • Proposed a two-stage weighted least-squares regression approach. • Considered the effect of outliers as well as trend of cooling load and weather patterns. • Included higher order terms and day type patterns in the forecasting models. • Demonstrated better accuracy compared with some ARX and ANN models. - Abstract: This paper presents a robust hourly cooling-load forecasting method based on time-indexed autoregressive with exogenous inputs (ARX) models, in which the coefficients are estimated through a two-stage weighted least squares regression. The prediction method includes a combination of two separate time-indexed ARX models to improve prediction accuracy of the cooling load over different forecasting periods. The two-stage weighted least-squares regression approach in this study is robust to outliers and suitable for fast and adaptive coefficient estimation. The proposed method is tested on a large-scale central cooling system in an academic institution. The numerical case studies show the proposed prediction method performs better than some ANN and ARX forecasting models for the given test data set
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules…
Bayesian regression model for seasonal forecast of precipitation over Korea
Jo, Seongil; Lim, Yaeji; Lee, Jaeyong; Kang, Hyun-Suk; Oh, Hee-Seok
2012-08-01
In this paper, we apply three different Bayesian methods to the seasonal forecasting of the precipitation in a region around Korea (32.5°N-42.5°N, 122.5°E-132.5°E). We focus on the precipitation of summer season (June-July-August; JJA) for the period of 1979-2007 using the precipitation produced by the Global Data Assimilation and Prediction System (GDAPS) as predictors. Through cross-validation, we demonstrate improvement for seasonal forecast of precipitation in terms of root mean squared error (RMSE) and linear error in probability space score (LEPS). The proposed methods yield RMSE of 1.09 and LEPS of 0.31 between the predicted and observed precipitations, while the prediction using GDAPS output only produces RMSE of 1.20 and LEPS of 0.33 for CPC Merged Analyzed Precipitation (CMAP) data. For station-measured precipitation data, the RMSE and LEPS of the proposed Bayesian methods are 0.53 and 0.29, while GDAPS output is 0.66 and 0.33, respectively. The methods seem to capture the spatial pattern of the observed precipitation. The Bayesian paradigm incorporates the model uncertainty as an integral part of modeling in a natural way. We provide a probabilistic forecast integrating model uncertainty.
Maximum likelihood polynomial regression for robust speech recognition
Institute of Scientific and Technical Information of China (English)
LU Yong; WU Zhenyang
2011-01-01
The linear hypothesis is the main disadvantage of maximum likelihood linear re- gression （MLLR）. This paper applies the polynomial regression method to model adaptation and establishes a nonlinear model adaptation algorithm using maximum likelihood polyno
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Directory of Open Access Journals (Sweden)
Deoclécio Domingos Garbuglio
2007-02-01
Full Text Available O objetivo deste trabalho foi verificar possíveis divergências entre os resultados obtidos nas avaliações da adaptabilidade de 27 genótipos de milho (Zea mays L., e na estratificação de 22 ambientes no Estado do Paraná, por meio de técnicas baseadas na análise de fatores e regressão bissegmentada. As estratificações ambientais foram feitas por meio do método tradicional e por análise de fatores, aliada ao porcentual da porção simples da interação GxA (PS%. As análises de adaptabilidade foram realizadas por meio de regressão bissegmentada e análise de fatores. Pela análise de regressão bissegmentada, os genótipos estudados apresentaram alta performance produtiva; no entanto, não foi constatado o genótipo considerado como ideal. A adaptabilidade dos genótipos, analisada por meio de plotagens gráficas, apresentou respostas diferenciadas quando comparada à regressão bissegmentada. A análise de fatores mostrou-se eficiente nos processos de estratificação ambiental e adaptabilidade dos genótipos de milho.The objective of this work was to verify possible divergences among results obtained on adaptability evaluations of 27 maize genotypes (Zea mays L., and on stratification of 22 environments on Paraná State, Brazil, through techniques of factor analysis and bissegmented regression. The environmental stratifications were made through the traditional methodology and by factor analysis, allied to the percentage of the simple portion of GxE interaction (PS%. Adaptability analyses were carried out through bissegmented regression and factor analysis. By the analysis of bissegmented regression, studied genotypes had presented high productive performance; however, it was not evidenced the genotype considered as ideal. The adaptability of the genotypes, analyzed through graphs, presented different answers when compared to bissegmented regression. Factor analysis was efficient in the processes of environment stratification and
Aboveground biomass and carbon stocks modelling using non-linear regression model
Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd
2016-06-01
Aboveground biomass (AGB) is an important source of uncertainty in the carbon estimation for the tropical forest due to the variation biodiversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world with the vast diversity of tree with layered canopies. With the usage of optical sensor integrate with empirical models is a common way to assess the AGB. Using the regression, the linkage between remote sensing and a biophysical parameter of the forest may be made. Therefore, this paper exemplifies the accuracy of non-linear regression equation of quadratic function to estimate the AGB and carbon stocks for the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameter field plots with the remotely-sensed data using nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p forest.
Correlated Component Regression: Application On Model To Determination Of Dna Damage
Directory of Open Access Journals (Sweden)
Sadi ELASAN
2016-04-01
Full Text Available Objective: The number of explanatory variables, where the sample size is approaching or passing the sample size, the high-dimensional data sets in other words, the estimated regression model is one of the important questions of how to increase the reliability. One of the new methods that can be used, Correlated Component Regression Centerproduct. In this study, providing information about the associated Component Regression is intended to be introduced along with an application. Material and Methods: Regression analysis in the scientific work to be done, the number of explanatory variables; sample width when approached or when the sample width late (in high-dimensional data sets, an estimate will be made with standard regression analysis method, the estimated coefficients, multiple connections (to be singular of covariance matrix varies due. As an alternative to this, "Associated Component Regression" (CCR can help solving the problem. If there are continuous response variables, Iber-linear regression and binary if there is, CCR is used in logistic regression, CCR is available on-Cox regression in survival data. The method uses K number associated components. These related components, as determined by the researchers can be determined by the program. Results: In this study sample size is small, the correlation coefficients are moderate and numbers of variables are high. Therefore, we performed m-fold cross-validation test due to extreme overfit of the saturated regression model. Conclusion: Encountered in regression analysis; multiple connections, CCR can be used for solving problems such as lack of or excessive integration and said can capture higher power.
U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...
Institute of Scientific and Technical Information of China (English)
Tao Hu; Heng-jian Cui; Xing-wei Tong
2009-01-01
This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The semiparametric varying-coefficient partially linear regression model which is a gen-eralization of the partially linear regression model and varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A Sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estima-tor for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.
Observer-based and Regression Model-based Detection of Emerging Faults in Coal Mills
DEFF Research Database (Denmark)
Odgaard, Peter Fogh; Lin, Bao; Jørgensen, Sten Bay
2006-01-01
. In this paper three different fault detection approaches are compared using a example of a coal mill, where a fault emerges. The compared methods are based on: an optimal unknown input observer, static and dynamic regression model-based detections. The conclusion on the comparison is that observer-based scheme...... detects the fault 13 samples earlier than the dynamic regression model-based method, and that the static regression based method is not usable due to generation of far too many false detections....
ON CONFIDENCE REGIONS OF SEMIPARAMETRIC NONLINEAR REGRESSION MODELS(A GEOMETRIC APPROACH)
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
A geometric framework is proposed for semiparametric nonlinear regression models based on the concept of least favorable curve,introduced by Severini and Wong (1992).The authors use this framework to drive three kinds of improved approximate confidence regions for the parameter and parameter subset in terms of curvatures.The results obtained by Hamilton et al.(1982),Hamilton (1986) and Wei (1994) are extended to semiparametric nonlinear regression models.
On asymptotics of t-type regression estimation in multiple linear model
Institute of Scientific and Technical Information of China (English)
无
2004-01-01
We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution.The marginal likelihood can also be applied to the de-correlated response when the withinsubject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.
A note on the estimation of asset pricing models using simple regression betas
Kan, Raymond; Robotti, Cesare
2009-01-01
Since Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973), the two-pass cross-sectional regression (CSR) methodology has become the most popular tool for estimating and testing beta asset pricing models. In this paper, we focus on the case in which simple regression betas are used as regressors in the second-pass CSR. Under general distributional assumptions, we derive asymptotic standard errors of the risk premia estimates that are robust to model misspecification. When testing whe...
The empirical likelihood goodness-of-fit test for regression model
Institute of Scientific and Technical Information of China (English)
Li-xing ZHU; Yong-song QIN; Wang-li XU
2007-01-01
Goodness-of-fit test for regression modes has received much attention in literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models including classical parametric and autoregressive (AR) time series models are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.
A Financial Distress Pre-Warning Study by Fuzzy Regression Model of TSE-Listed Companies
Wen-Ying Cheng; Ender Su; Sheng-Jung Li
2006-01-01
The purpose of this paper is to construct a financial distress pre-warning model for investors and risk supervisors. Through the Securities and Futures Institute Network, we collect the financial data of the electronic companies listing on the Taiwan Security Exchange (TSE) from 1998 to 2005. By binary logistic regression test, we found that financial statement ratios show significant difference in different financial stages. On the other hand, using fuzzy regression model, we construct a rat...
Comparison of land-use regression models between Great Britain and the Netherlands.
Vienneau, D.; de Hoogh, K.; Beelen, R.M.J.; Fischer, P.; Hoek, G.; Briggs, D.
2010-01-01
Land-use regression models have increasingly been applied for air pollution mapping at typically the city level. Though models generally predict spatial variability well, the structure of models differs widely between studies. The observed differences in the models may be due to artefacts of data an
Directory of Open Access Journals (Sweden)
Shahab Karimi
2014-01-01
Full Text Available In this study, the effects of ratios of dolomite, base/acid, silica, SiO2/Al2O3, and Fe2O3/CaO, base and acid oxides, and 11 oxides (SiO2, Al2O3, CaO, MgO, MnO, Na2O, K2O, Fe2O3, TiO2, P2O5, and SO3 on ash fusion temperatures for 1040 US coal samples from 12 states were evaluated using regression and adaptive neurofuzzy inference system (ANFIS methods. Different combinations of independent variables were examined to predict ash fusion temperatures in the multivariable procedure. The combination of the “11 oxides + (Base/Acid + Silica ratio” was the best predictor. Correlation coefficients (R2 of 0.891, 0.917, and 0.94 were achieved using nonlinear equations for the prediction of initial deformation temperature (IDT, softening temperature (ST, and fluid temperature (FT, respectively. The mentioned “best predictor” was used as input to the ANFIS system as well, and the correlation coefficients (R2 of the prediction were enhanced to 0.97, 0.98, and 0.99 for IDT, ST, and FT, respectively. The prediction precision that was achieved in this work exceeded that reported in previously published works.
Adaptive Partially Hidden Markov Models
DEFF Research Database (Denmark)
Forchhammer, Søren Otto; Rasmussen, Tage
1996-01-01
Partially Hidden Markov Models (PHMM) have recently been introduced. The transition and emission probabilities are conditioned on the past. In this report, the PHMM is extended with a multiple token version. The different versions of the PHMM are applied to bi-level image coding.......Partially Hidden Markov Models (PHMM) have recently been introduced. The transition and emission probabilities are conditioned on the past. In this report, the PHMM is extended with a multiple token version. The different versions of the PHMM are applied to bi-level image coding....
An adaptive stochastic model for financial markets
International Nuclear Information System (INIS)
An adaptive stochastic model is introduced to simulate the behavior of real asset markets. The model adapts itself by changing its parameters automatically on the basis of the recent historical data. The basic idea underlying the model is that a random variable uniformly distributed within an interval with variable extremes can replicate the histograms of asset returns. These extremes are calculated according to the arrival of new market information. This adaptive model is applied to the daily returns of three well-known indices: Ibex35, Dow Jones and Nikkei, for three complete years. The model reproduces the histograms of the studied indices as well as their autocorrelation structures. It produces the same fat tails and the same power laws, with exactly the same exponents, as in the real indices. In addition, the model shows a great adaptation capability, anticipating the volatility evolution and showing the same volatility clusters observed in the assets. This approach provides a novel way to model asset markets with internal dynamics which changes quickly with time, making it impossible to define a fixed model to fit the empirical observations.
Graphical Models and Computerized Adaptive Testing.
Mislevy, Robert J.; Almond, Russell G.
This paper synthesizes ideas from the fields of graphical modeling and education testing, particularly item response theory (IRT) applied to computerized adaptive testing (CAT). Graphical modeling can offer IRT a language for describing multifaceted skills and knowledge, and disentangling evidence from complex performances. IRT-CAT can offer…
Modeling personalized head-related impulse response using support vector regression
Institute of Scientific and Technical Information of China (English)
HUANG Qing-hua; FANG Yong
2009-01-01
A new customization approach based on support vector regression (SVR) is proposed to obtain individual headrelated impulse response (HRIR) without complex measurement and special equipment. Principal component analysis (PCA) is first applied to obtain a few principal components and corresponding weight vectors correlated with individual anthropometric parameters. Then the weight vectors act as output of the nonlinear regression model. Some measured anthropometric parameters are selected as input of the model according to the correlation coefficients between the parameters and the weight vectors. After the regression model is learned from the training data, the individual HRIR can be predicted based on the measured anthropometric parameters. Compared with a back-propagation neural network (BPNN) for nonlinear regression,better generalization and prediction performance for small training samples can be obtained using the proposed PCA-SVR algorithm.
Hybrid Surface Mesh Adaptation for Climate Modeling
Institute of Scientific and Technical Information of China (English)
Ahmed Khamayseh; Valmor de Almeida; Glen Hansen
2008-01-01
Solution-driven mesh adaptation is becoming quite popular for spatial error control in the numerical simulation of complex computational physics applications, such as climate modeling. Typically, spatial adaptation is achieved by element subdivision (h adaptation) with a primary goal of resolving the local length scales of interest. A second, lesspopular method of spatial adaptivity is called "mesh motion" (r adaptation); the smooth repositioning of mesh node points aimed at resizing existing elements to capture the local length scales. This paper proposes an adaptation method based on a combination of both element subdivision and node point repositioning (rh adaptation). By combining these two methods using the notion of a mobility function, the proposed approach seeks to increase the flexibility and extensibility of mesh motion algorithms while providing a somewhat smoother transition between refined regions than is pro-duced by element subdivision alone. Further, in an attempt to support the requirements of a very general class of climate simulation applications, the proposed method is de-signed to accommodate unstructured, polygonal mesh topologies in addition to the most popular mesh types.
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Directory of Open Access Journals (Sweden)
Cristinel CONSTANTIN
2015-12-01
Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations
Demenais, F M; Laing, A E; Bonney, G E
1992-01-01
Segregation analysis of discrete traits can be conducted by the classical mixed model and the recently introduced regressive models. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affected persons have a liability exceeding a threshold. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype effects, the phenotypes of older relatives, and other covariates. A formulation of the regressive models, based on an underlying liability model, has been recently proposed. The regression coefficients on antecedents are expressed in terms of the relevant familial correlations and a one-to-one correspondence with the parameters of the mixed model can thus be established. Computer simulations are conducted to evaluate the fit of the two formulations of the regressive models to the mixed model on nuclear families. The two forms of the class D regressive model provide a good fit to a generated mixed model, in terms of both hypothesis testing and parameter estimation. The simpler class A regressive model, which assumes that the outcomes of children depend solely on the outcomes of parents, is not robust against a sib-sib correlation exceeding that specified by the model, emphasizing testing class A against class D. The studies reported here show that if the true state of nature is that described by the mixed model, then a regressive model will do just as well. Moreover, the regressive models, allowing for more patterns of family dependence, provide a flexible framework to understand gene-environment interactions in complex diseases. PMID:1487139
Regression models tolerant to massively missing data: a case study in solar radiation nowcasting
Directory of Open Access Journals (Sweden)
I. Žliobaitė
2014-07-01
Full Text Available Statistical models for environmental monitoring strongly rely on automatic data acquisition systems, using various physical sensors. Often, sensor readings are missing for extended periods of time while model outputs need to be continuously available in real time. With a case study in solar radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable in such situations. Our goal is to analyze the characteristics of missing data and recommend a strategy for deploying regression models, which would be robust to missing data in situations, where data are massively missing. We are after one model that performs well at all times, with and without data gaps. Due to the need to provide instantaneous outputs with minimum energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods, and resort to a simple mean replacement. We use an established strategy for comparing different regression models, with the possibility of determining how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze accuracies and robustness to missing data of seven linear regression models and recommend using regularized PCA regression. We recommend using our established guideline in training regression models, which themselves are robust to missing data.
Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion
Ulbrich, Norbert; Volden, Thomas R.
2012-01-01
An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
Intelligent CAD Methodology Research of Adaptive Modeling
Institute of Scientific and Technical Information of China (English)
ZHANG Weibo; LI Jun; YAN Jianrong
2006-01-01
The key to carry out ICAD technology is to establish the knowledge-based and wide rang of domains-covered product model. This paper put out a knowledge-based methodology of adaptive modeling. It is under the Ontology mind, using the Object-Oriented technology and being a knowledge-based model framework. It involves the diverse domains in product design and realizes the multi-domain modeling, embedding the relative information including standards, regulars and expert experience. To test the feasibility of the methodology, the research bonds of the automotive diaphragm spring clutch design and an adaptive clutch design model is established, using the knowledge-based modeling language-AML.
A Stochastic Restricted Principal Components Regression Estimator in the Linear Model
Directory of Open Access Journals (Sweden)
Daojiang He
2014-01-01
Full Text Available We propose a new estimator to combat the multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator is constructed by combining the ordinary mixed estimator (OME and the principal components regression (PCR estimator, which is called the stochastic restricted principal components (SRPC regression estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.
Regression analysis understanding and building business and economic models using Excel
Wilson, J Holton
2012-01-01
The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe
DEFF Research Database (Denmark)
Carstensen, Bendix
1996-01-01
This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....
Directory of Open Access Journals (Sweden)
S. Goyal
2012-03-01
Full Text Available This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 g.C. Linear Layer and Generalized Regression models were developed with input parameters: Soluble nitrogen, pH, Standard plate count, Yeast & mould count, Spores, and sensory score as output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash - Sutcliffo Coefficient were used in order to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 g.C.
The Relationship between Economic Growth and Money Laundering – a Linear Regression Model
Directory of Open Access Journals (Sweden)
Daniel Rece
2009-09-01
Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.
Reference model decomposition in direct adaptive control
Butler, H.; Honderd, G.; Amerongen, van, W.E.
1991-01-01
This paper introduces the method of reference model decomposition as a way to improve the robustness of model reference adaptive control systems (MRACs) with respect to unmodelled dynamics with a known structure. Such unmodelled dynamics occur when some of the nominal plant dynamics are purposely neglected in the controller design with the aim of keeping the controller order low. One of the effects of such undermodelling of the controller is a violation of the perfect model-matching condition...
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Regression model for daily passenger volume of high-speed railway line under capacity constraint
Institute of Scientific and Technical Information of China (English)
骆泳吉; 刘军; 孙迅; 赖晴鹰
2015-01-01
A non-linear regression model is proposed to forecast the aggregated passenger volume of Beijing−Shanghai high-speed railway (HSR) line in China. Train services and temporal features of passenger volume are studied to have a prior knowledge about this high-speed railway line. Then, based on a theoretical curve that depicts the relationship among passenger demand, transportation capacity and passenger volume, a non-linear regression model is established with consideration of the effect of capacity constraint. Through experiments, it is found that the proposed model can perform better in both forecasting accuracy and stability compared with linear regression models and back-propagation neural networks. In addition to the forecasting ability, with a definite formation, the proposed model can be further used to forecast the effects of train planning policies.
Regression models based on new local strategies for near infrared spectroscopic data.
Allegrini, F; Fernández Pierna, J A; Fragoso, W D; Olivieri, A C; Baeten, V; Dardenne, P
2016-08-24
In this work, a comparative study of two novel algorithms to perform sample selection in local regression based on Partial Least Squares Regression (PLS) is presented. These methodologies were applied for Near Infrared Spectroscopy (NIRS) quantification of five major constituents in corn seeds and are compared and contrasted with global PLS calibrations. Validation results show a significant improvement in the prediction quality when local models implemented by the proposed algorithms are applied to large data bases. PMID:27496996
Sample- and segment-size specific Model Selection in Mixture Regression Analysis
Sarstedt, Marko
2006-01-01
As mixture regression models increasingly receive attention from both theory and practice, the question of selecting the correct number of segments gains urgency. A misspecification can lead to an under- or oversegmentation, thus resulting in flawed management decisions on customer targeting or product positioning. This paper presents the results of an extensive simulation study that examines the performance of commonly used information criteria in a mixture regression context with normal ...
Testing normality in bivariate probit models : a simple artificial regression based LM test
Murphy, Anthony
1994-01-01
A simple and convenient LM test of normality in the bivariate probit model is derived. The alternative hypothesis is based on a form of truncated Gram Charlier Type series. The LM test may be calculated as an artificial regression. However, the proposed artificial regression does not use the outer product gradient form. Thus it is likely to perform reasonably well in small samples. non-peer-reviewed
Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner
Luciano Fanton; Christian Paravan; Luigi T. De Luca
2012-01-01
Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models ...
A brief introduction to regression designs and mixed-effects modelling by a recent convert
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...
Hybrid adaptive control of a dragonfly model
Couceiro, Micael S.; Ferreira, Nuno M. F.; Machado, J. A. Tenreiro
2012-02-01
Dragonflies show unique and superior flight performances than most of other insect species and birds. They are equipped with two pairs of independently controlled wings granting an unmatchable flying performance and robustness. In this paper, it is presented an adaptive scheme controlling a nonlinear model inspired in a dragonfly-like robot. It is proposed a hybrid adaptive ( HA) law for adjusting the parameters analyzing the tracking error. At the current stage of the project it is considered essential the development of computational simulation models based in the dynamics to test whether strategies or algorithms of control, parts of the system (such as different wing configurations, tail) as well as the complete system. The performance analysis proves the superiority of the HA law over the direct adaptive ( DA) method in terms of faster and improved tracking and parameter convergence.
Kahane, Leo H
2007-01-01
Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Meuwissen Theo HE; Ødegård Jørgen; Lillehammer Marie
2009-01-01
Abstract The combination of a sire model and a random regression term describing genotype by environment interactions may lead to biased estimates of genetic variance components because of heterogeneous residual variance. In order to test different models, simulated data with genotype by environment interactions, and dairy cattle data assumed to contain such interactions, were analyzed. Two animal models were compared to four sire models. Models differed in their ability to handle heterogeneo...
Use of posterior predictive assessments to evaluate model fit in multilevel logistic regression
Green, Martin J.; Medley, Graham F; Browne, William J.
2009-01-01
Assessing the fit of a model is an important final step in any statistical analysis, but this is not straightforward when complex discrete response models are used. Cross validation and posterior predictions have been suggested as methods to aid model criticism. In this paper a comparison is made between four methods of model predictive assessment in the context of a three level logistic regression model for clinical mastitis in dairy cattle; cross validation, a prediction using the full post...
Garcia Nieto, P J; Sánchez Lasheras, F; de Cos Juez, F J; Alonso Fernández, J R
2011-11-15
There is an increasing need to describe cyanobacteria blooms since some cyanobacteria produce toxins, termed cyanotoxins. These latter can be toxic and dangerous to humans as well as other animals and life in general. It must be remarked that the cyanobacteria are reproduced explosively under certain conditions. This results in algae blooms, which can become harmful to other species if the cyanobacteria involved produce cyanotoxins. In this research work, the evolution of cyanotoxins in Trasona reservoir (Principality of Asturias, Northern Spain) was studied with success using the data mining methodology based on multivariate adaptive regression splines (MARS) technique. The results of the present study are two-fold. On one hand, the importance of the different kind of cyanobacteria over the presence of cyanotoxins in the reservoir is presented through the MARS model and on the other hand a predictive model able to forecast the possible presence of cyanotoxins in a short term was obtained. The agreement of the MARS model with experimental data confirmed the good performance of the same one. Finally, conclusions of this innovative research are exposed. PMID:21920665
Pradhan, B.; Buchroithner, M. F.; Mansor, S.
2009-04-01
This paper presents the assessment results of spatially based probabilistic three models using Geoinformation Techniques (GIT) for landslide susceptibility analysis at Penang Island in Malaysia. Landslide locations within the study areas were identified by interpreting aerial photographs, satellite images and supported with field surveys. Maps of the topography, soil type, lineaments and land cover were constructed from the spatial data sets. There are nine landslide related factors were extracted from the spatial database and the neural network, frequency ratio and logistic regression coefficients of each factor was computed. Landslide susceptibility maps were drawn for study area using neural network, frequency ratios and logistic regression models. For verification, the results of the analyses were compared with actual landslide locations in study area. The verification results show that frequency ratio model provides higher prediction accuracy than the ANN and regression models.
Semantic models for adaptive interactive systems
Hussein, Tim; Lukosch, Stephan; Ziegler, Jürgen; Calvary, Gaëlle
2013-01-01
Providing insights into methodologies for designing adaptive systems based on semantic data, and introducing semantic models that can be used for building interactive systems, this book showcases many of the applications made possible by the use of semantic models.Ontologies may enhance the functional coverage of an interactive system as well as its visualization and interaction capabilities in various ways. Semantic models can also contribute to bridging gaps; for example, between user models, context-aware interfaces, and model-driven UI generation. There is considerable potential for using
Modelling and (adaptive) control of greenhouse climates
Udink ten Cate, A.J.
1983-01-01
The material presented in this thesis can be grouped around four themes, system concepts, modeling, control and adaptive control. In this summary these themes will be treated separately.System conceptsIn Chapters 1 and 2 an overview of the problem formulation is presented. It is suggested that there
Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models
Institute of Scientific and Technical Information of China (English)
Zhangong ZHOU; Rong JIANG; Weimin QIAN
2011-01-01
The quantile estimation methods are proposed for functional-coefficient partially linear regression (FCPLR) model by combining nonparametric and functional-coefficient regression (FCR) model.The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model.These resulting estimators are asymptotically normal,but each of them has big variance.To reduce variances of these quantile estimators,the one-step backfitting technique is used to obtain the efficient quantile estimators of all unknown functions,and their asymptotic normalities are derived.Two simulated examples are carried out to illustrate the proposed estimation methodology.
Energy Technology Data Exchange (ETDEWEB)
Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)
2009-02-15
In this paper, multiple nonlinear regression models for estimation of higher heating value of coals are developed using proximate analysis data obtained generally from the low rank coal samples as-received basis. In this modeling study, three main model structures depended on the number of proximate analysis parameters, which are named the independent variables, such as moisture, ash, volatile matter and fixed carbon, are firstly categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using multiple nonlinear regression method. Based on the results of nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving highest correlation for three main structures are selected. Although the selected all three models predicts HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that that the developed model in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is effective tool for HHV estimation of low rank coals. (author)
Adaptive Modeling for Security Infrastructure Fault Response
Institute of Scientific and Technical Information of China (English)
CUI Zhong-jie; YAO Shu-ping; HU Chang-zhen
2008-01-01
Based on the analysis of inherent limitations in existing security response decision-making systems, a dynamic adaptive model of fault response is presented. Several security fault levels were founded, which comprise the basic level, equipment level and mechanism level. Fault damage cost is calculated using the analytic hierarchy process. Meanwhile, the model evaluates the impact of different responses upon fault repair and normal operation. Response operation cost and response negative cost are introduced through quantitative calculation. This model adopts a comprehensive response decision of security fault in three principles-the maximum and minimum principle, timeliness principle, acquiescence principle, which assure optimal response countermeasure is selected for different situations. Experimental results show that the proposed model has good self-adaptation ability, timeliness and cost-sensitiveness.
Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner
Directory of Open Access Journals (Sweden)
Luciano Fanton
2012-01-01
Full Text Available Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.
A Robbins-Monro procedure for estimation in semiparametric regression models
Bercu, Bernard
2011-01-01
This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a Robbins-Monro procedure very efficient and easy to handle. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro in order to estimate the shift parameter. A preliminary evaluation of the regression function is not necessary for estimating the shift parameter. On the other hand, we make use of a recursive Nadaraya-Watson estimator for the estimation of the regression function. This kernel estimator takes in account the previous estimation of the shift parameter. We establish the almost sure convergence for both Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided.
Kleijnen, J.P.C.
1995-01-01
This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for
Application of Fuzzy Regression Model to the Prediction of Field Mouse Occurrence Rate
Institute of Scientific and Technical Information of China (English)
XU Fei
2009-01-01
Expressions were given to describe the closeness between the estimated value and observed value for two asymmetric exponential fuzzy numbers. Based on that, the model was given to solve the question of fuzzy multivariable regression with fuzzy input, fuzzy output and crisp coefficients. Finally, with this model, the prediction of field mouse occurrence rate had been done and the satisfied result was obtained.
de Vries, SO; Fidler, [No Value; Kuipers, WD; Hunink, MGM
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six
de Vries, S.O.; Fidler, V.; Kuipers, W.D.; Hunink, M.G.
1998-01-01
The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six
On pseudo-values for regression analysis in competing risks models
DEFF Research Database (Denmark)
Gerds, Thomas Alexander; Graw, F; Schumacher, M
2009-01-01
For regression on state and transition probabilities in multi-state models Andersen et al. (Biometrika 90:15-27, 2003) propose a technique based on jackknife pseudo-values. In this article we analyze the pseudo-values suggested for competing risks models and prove some conjectures regarding...
Random regression models in the evaluation of the growth curve of Simbrasil beef cattle
Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.
2013-01-01
Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the f
Adaptive Cruise Control and Driver Modeling
Bengtsson, Johan
2001-01-01
Many vehicle manufacturers have lately introduced advance driver support in some of their automobiles. One of those new features is Adaptive Cruise Control DACCE, which extends the conventional cruise control system to control of relative speed and distance to other vehicles. In order to design an ACC controller it is suitable to have a model of driver behavior. The approach in the thesis is to use system identification methodology to obtain dynamic models of driver behavior useful for ACC ap...
Directory of Open Access Journals (Sweden)
Makoto Suzuki
Full Text Available Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2 = 0.676, P<0.0001; linear regression modeling, R(2 = 0.598, P<0.0001. Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
An Auditory Model of Improved Adaptive ZCPA
Directory of Open Access Journals (Sweden)
Jinping Zhang
2013-07-01
Full Text Available An improved ZCAP auditory model with adaptability is proposed in this paper, and the adaptive method designed for ZCPA model is suitable for other auditory model with inner-hair-cell sub-model. The first step in the implement process of the proposed ZCPA model is to carry out the calculation of inner product between signal and complex Gammatone filters to obtain important frequency components of signal. And then, according to the result of the first step, the parameters of the basilar membrane sub-model and frequency box are automatically adjusted, such as the number of the basilar membrane filters, center frequency and bandwith of each basilar membrane filter, position of each frequency box, and so on. Lastly an auditory model is built, and the final output is auditory spectrum.The results of numerical simulation and experiments have showed that the proposed model could realize accurate frequency selection, and the auditory spectrum is more distinctly than that of conventional ZCPA model. Moreover, the proposed model can completely avoided the influence of the number of filter on the shape of auditory spectrum existing in conventional ZCPA model so that the shape of auditory spectrum is steady, and the data quantity is small.
Longitudinal beta regression models for analyzing health-related quality of life scores over time
Directory of Open Access Journals (Sweden)
Hunger Matthias
2012-09-01
Full Text Available Abstract Background Health-related quality of life (HRQL has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Conclusions Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Time-varying parameter auto-regressive models for autocovariance nonstationary time series
Institute of Scientific and Technical Information of China (English)
FEI WanChun; BAI Lun
2009-01-01
In this paper,autocovariance nonstationary time series is clearly defined on a family of time series.We propose three types of TVPAR (time-varying parameter auto-regressive) models:the full order TVPAR model,the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series.Related minimum AIC (Akaike information criterion) estimations are carried out.
Time-varying parameter auto-regressive models for autocovariance nonstationary time series
Institute of Scientific and Technical Information of China (English)
无
2009-01-01
In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TV-PAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.
Effects of model sensitivity and nonlinearity on nonlinear regression of ground water flow
Yager, R.M.
2004-01-01
Nonlinear regression is increasingly applied to the calibration of hydrologic models through the use of perturbation methods to compute the Jacobian or sensitivity matrix required by the Gauss-Newton optimization method. Sensitivities obtained by perturbation methods can be less accurate than those obtained by direct differentiation, however, and concern has arisen that the optimal parameter values and the associated parameter covariance matrix computed by perturbation could also be less accurate. Sensitivities computed by both perturbation and direct differentiation were applied in nonlinear regression calibration of seven ground water flow models. The two methods gave virtually identical optimum parameter values and covariances for the three models that were relatively linear and two of the models that were relatively nonlinear, but gave widely differing results for two other nonlinear models. The perturbation method performed better than direct differentiation in some regressions with the nonlinear models, apparently because approximate sensitivities computed for an interval yielded better search directions than did more accurately computed sensitivities for a point. The method selected to avoid overshooting minima on the error surface when updating parameter values with the Gauss-Newton procedure appears for nonlinear models to be more important than the method of sensitivity calculation in controlling regression convergence.
Grégoire, G.
2014-12-01
The logistic regression originally is intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via the odds and odds ratio, which are presented in introduction of the chapter. The observations are possibly got individually, then we speak of binary logistic regression. When they are grouped, the logistic regression is said binomial. In our presentation we mainly focus on the binary case. For statistical inference the main tool is the maximum likelihood methodology: we present the Wald, Rao and likelihoods ratio results and their use to compare nested models. The problems we intend to deal with are essentially the same as in multiple linear regression: testing global effect, individual effect, selection of variables to build a model, measure of the fitness of the model, prediction of new values… . The methods are demonstrated on data sets using R. Finally we briefly consider the binomial case and the situation where we are interested in several events, that is the polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.
2013-01-01
Background Integrase inhibitors (INI) form a new drug class in the treatment of HIV-1 patients. We developed a linear regression modeling approach to make a quantitative raltegravir (RAL) resistance phenotype prediction, as Fold Change in IC50 against a wild type virus, from mutations in the integrase genotype. Methods We developed a clonal genotype-phenotype database with 991 clones from 153 clinical isolates of INI naïve and RAL treated patients, and 28 site-directed mutants. We did the development of the RAL linear regression model in two stages, employing a genetic algorithm (GA) to select integrase mutations by consensus. First, we ran multiple GAs to generate first order linear regression models (GA models) that were stochastically optimized to reach a goal R2 accuracy, and consisted of a fixed-length subset of integrase mutations to estimate INI resistance. Secondly, we derived a consensus linear regression model in a forward stepwise regression procedure, considering integrase mutations or mutation pairs by descending prevalence in the GA models. Results The most frequently occurring mutations in the GA models were 92Q, 97A, 143R and 155H (all 100%), 143G (90%), 148H/R (89%), 148K (88%), 151I (81%), 121Y (75%), 143C (72%), and 74M (69%). The RAL second order model contained 30 single mutations and five mutation pairs (p INI naïve patients. Conclusions We describe a systematic approach to derive a model for predicting INI resistance from a limited amount of clonal samples. Our RAL second order model is made available as an Additional file for calculating a resistance phenotype as the sum of integrase mutations and mutation pairs. PMID:23282253
Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.
Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko
2010-01-01
For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R(2) values. For practical purposes, the surface parameter R(a) alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.
Adapting virtual camera behaviour through player modelling
DEFF Research Database (Denmark)
Burelli, Paolo; Yannakakis, Georgios N.
2015-01-01
the viewpoint movements to the player type and her game-play style. Ultimately, the methodology is applied to a 3D platform game and is evaluated through a controlled experiment; the results suggest that the resulting adaptive cinematographic experience is favoured by some player types and it can generate......Research in virtual camera control has focused primarily on finding methods to allow designers to place cameras effectively and efficiently in dynamic and unpredictable environments, and to generate complex and dynamic plans for cinematography in virtual environments. In this article, we propose...... a novel approach to virtual camera control, which builds upon camera control and player modelling to provide the user with an adaptive point-of-view. To achieve this goal, we propose a methodology to model the player’s preferences on virtual camera movements and we employ the resulting models to tailor...
Adaptive Behaviour Assessment System: Indigenous Australian Adaptation Model (ABAS: IAAM)
du Plessis, Santie
2015-01-01
The study objectives were to develop, trial and evaluate a cross-cultural adaptation of the Adaptive Behavior Assessment System-Second Edition Teacher Form (ABAS-II TF) ages 5-21 for use with Indigenous Australian students ages 5-14. This study introduced a multiphase mixed-method design with semi-structured and informal interviews, school…
Blind identification of threshold auto-regressive model for machine fault diagnosis
Institute of Scientific and Technical Information of China (English)
LI Zhinong; HE Yongyong; CHU Fulei; WU Zhaotong
2007-01-01
A blind identification method was developed for the threshold auto-regressive (TAR) model. The method had good identification accuracy and rapid convergence, especially for higher order systems. The proposed method was then combined with the hidden Markov model (HMM) to determine the auto-regressive (AR) coefficients for each interval used for feature extraction, with the HMM as a classifier. The fault diagnoses during the speed-up and speed- down processes for rotating machinery have been success- fully completed. The result of the experiment shows that the proposed method is practical and effective.
Some New Methods for the Comparison of Two Linear Regression Models
Liu, Wei; Jamshidian, Mortaza; Zhang, Ying; Bretz, Frank; Han, Xiaoliang
2006-01-01
The frequently used approach to the comparison of two linear regression models is to use the partial F test. It is pointed out in this paper that the partial F test has in fact a naturally associated two-sided simultaneous confidence band, which is much more informative than the test itself. But this confidence band is over the entire range of all the covariates. As regression models are true or of interest often only over a restricted region of the covariates, the part of this confidence ban...
Methods and applications of linear models regression and the analysis of variance
Hocking, Ronald R
2013-01-01
Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book
Energy Technology Data Exchange (ETDEWEB)
Schmid, Maximilian P.; Fidarova, Elena [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria)], e-mail: maximilian.schmid@akhwien.at; Poetter, Richard [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria); Christian Doppler Lab. for Medical Radiation Research for Radiation Oncology, Medical Univ. of Vienna (Austria)] [and others
2013-10-15
Purpose: To investigate the impact of magnetic resonance imaging (MRI)-morphologic differences in parametrial infiltration on tumour response during primary radio chemotherapy in cervical cancer. Material and methods: Eighty-five consecutive cervical cancer patients with FIGO stages IIB (n = 59) and IIIB (n = 26), treated by external beam radiotherapy ({+-}chemotherapy) and image-guided adaptive brachytherapy, underwent T2-weighted MRI at the time of diagnosis and at the time of brachytherapy. MRI patterns of parametrial tumour infiltration at the time of diagnosis were assessed with regard to predominant morphology and maximum extent of parametrial tumour infiltration and were stratified into five tumour groups (TG): 1) expansive with spiculae; 2) expansive with spiculae and infiltrating parts; 3) infiltrative into the inner third of the parametrial space (PM); 4) infiltrative into the middle third of the PM; and 5) infiltrative into the outer third of the PM. MRI at the time of brachytherapy was used for identifying presence (residual vs. no residual disease) and signal intensity (high vs. intermediate) of residual disease within the PM. Left and right PM of each patient were evaluated separately at both time points. The impact of the TG on tumour remission status within the PM was analysed using {chi}2-test and logistic regression analysis. Results: In total, 170 PM were analysed. The TG 1, 2, 3, 4, 5 were present in 12%, 11%, 35%, 25% and 12% of the cases, respectively. Five percent of the PM were tumour-free. Residual tumour in the PM was identified in 19%, 68%, 88%, 90% and 85% of the PM for the TG 1, 2, 3, 4, and 5, respectively. The TG 3 - 5 had significantly higher rates of residual tumour in the PM in comparison to TG 1 + 2 (88% vs. 43%, p < 0.01). Conclusion: MRI-morphologic features of PM infiltration appear to allow for prediction of tumour response during external beam radiotherapy and chemotherapy. A predominantly infiltrative tumour spread at the
Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors
Directory of Open Access Journals (Sweden)
Xibin Zhang
2016-04-01
Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Stahel-Donoho kernel estimation for fixed design nonparametric regression models
Institute of Scientific and Technical Information of China (English)
LIN; Lu
2006-01-01
This paper reports a robust kernel estimation for fixed design nonparametric regression models.A Stahel-Donoho kernel estimation is introduced,in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points.Based on a local approximation,a computational technique is given to approximate to the incomputable depths of the errors.As a result the new estimator is computationally efficient.The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error.Unlike the depth-weighted estimator for parametric regression models,this depth-weighted nonparametric estimator has a simple variance structure and then we can compare its efficiency with the original one.Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.
Accounting for spatial effects in land use regression for urban air pollution modeling.
Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G
2015-01-01
In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. PMID:26530819
International Nuclear Information System (INIS)
The absorbed dose for equivalent soft tissue is determined,it is imparted by ophthalmologic applicators, (90 Sr/90 Y, 1850 MBq) using an extrapolation chamber of variable electrodes; when estimating the slope of the extrapolation curve using a simple lineal regression model is observed that the dose values are underestimated from 17.7 percent up to a 20.4 percent in relation to the estimate of this dose by means of a regression model polynomial two grade, at the same time are observed an improvement in the standard error for the quadratic model until in 50%. Finally the global uncertainty of the dose is presented, taking into account the reproducibility of the experimental arrangement. As conclusion it can infers that in experimental arrangements where the source is to contact with the extrapolation chamber, it was recommended to substitute the lineal regression model by the quadratic regression model, in the determination of the slope of the extrapolation curve, for more exact and accurate measurements of the absorbed dose. (Author)
Directory of Open Access Journals (Sweden)
Wen-Cheng Wang
2014-01-01
Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Logistic回归模型及其应用%Logistic regression model and its application
Institute of Scientific and Technical Information of China (English)
常振海; 刘薇
2012-01-01
为了利用Logistic模型提高多分类定性因变量的预测准确率,在二分类Logistic回归模型的基础上,对实际统计数据建立三类别的Logistic模型.采用似然比检验法对自变量的显著性进行检验,剔除了不显著的变量;对每个类别的因变量都确定了1个线性回归函数,并进行了模型检验.分析结果表明,在处理因变量为定性变量的回归分析中,Logistic模型具有很好的预测准确度和实用推广性.%To improve the forecasting accuracy of the multinomial qualitative dependent variable by using logistic model,ternary logistic model is established for actual statistical data based on binary logistic regression model.The significance of independent variables is tested by using the likelihood ratio test method to remove the non-significant variable.A linear regression function is determined for each category dependent variable,and the models are tested.The analysis results show that logistic regression model has good predictive accuracy and practical promotional value in handling regression analysis of qualitative dependent variable.
Pricing model performance and the two-pass cross-sectional regression methodology
Kan, Raymond; Robotti, Cesare; Shanken, Jay
2009-01-01
Since Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973), the two-pass cross-sectional regression (CSR) methodology has become the most popular approach for estimating and testing asset pricing models. Statistical inference with this method is typically conducted under the assumption that the models are correctly specified, that is, expected returns are exactly linear in asset betas. This assumption can be a problem in practice since all models are, at best, approximations of reali...
Additive Hazard Regression Models: An Application to the Natural History of Human Papillomavirus
Xianhong Xie; STRICKLER, Howard D.; Xiaonan Xue
2013-01-01
There are several statistical methods for time-to-event analysis, among which is the Cox proportional hazards model that is most commonly used. However, when the absolute change in risk, instead of the risk ratio, is of primary interest or when the proportional hazard assumption for the Cox proportional hazards model is violated, an additive hazard regression model may be more appropriate. In this paper, we give an overview of this approach and then apply a semiparametric as well as a nonpara...
Selecting both latent and explanatory variables in the PLS1 regression model
Lazraq, Aziz; Cléroux, Robert; Gauchi, Jean-Pierre
2003-01-01
In this paper, two inferential procedures for selecting the significant predictors in the PLS1 regression model are introduced. The significant PLS components are first obtained and the two predictor selection methods, called PLS–Forward and PLS–Bootstrap, are applied to the PLS model obtained. They are also compared empirically to two other methods that exist in the literature with respect to the quality of fit of the model and to their predictive ability. Although none of the four methods i...
APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING
Directory of Open Access Journals (Sweden)
A. L. Oleinik
2015-09-01
Full Text Available Subject of Research. The paper deals with the problem of lip region image reconstruction from speech signal by means of Partial Least Squares regression. Such problems arise in connection with development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities. Applications of audio-visual speech processing methods include joint modeling of voice and lips’ movement dynamics, synchronization of audio and video streams, emotion recognition, liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of initial data with high covariance. These components are used to build regression model. Advantage of this approach lies in the possibility of achieving two goals: identification of latent interrelations between initial data components (e.g. speech signal and lip region image and approximation of initial data component as a function of another one. Main Results. Experimental research on reconstruction of lip region images from speech signal was carried out on VidTIMIT audio-visual speech database. Results of the experiment showed that Partial Least Squares regression is capable of solving reconstruction problem. Practical Significance. Obtained findings give the possibility to assert that Partial Least Squares regression is successfully applicable for solution of vast variety of audio-visual speech processing problems: from synchronization of audio and video streams to liveness detection.
Significance tests to determine the direction of effects in linear regression models.
Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander
2015-02-01
Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice. PMID:24620829
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
SummaryFactorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO3- and explanatory variables Ca 2+, Cl -, SO42-, depth to water, aquifer media and land use. Substituting Cl - by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
Adaptive cyber-attack modeling system
Gonsalves, Paul G.; Dougherty, Edward T.
2006-05-01
The pervasiveness of software and networked information systems is evident across a broad spectrum of business and government sectors. Such reliance provides an ample opportunity not only for the nefarious exploits of lone wolf computer hackers, but for more systematic software attacks from organized entities. Much effort and focus has been placed on preventing and ameliorating network and OS attacks, a concomitant emphasis is required to address protection of mission critical software. Typical software protection technique and methodology evaluation and verification and validation (V&V) involves the use of a team of subject matter experts (SMEs) to mimic potential attackers or hackers. This manpower intensive, time-consuming, and potentially cost-prohibitive approach is not amenable to performing the necessary multiple non-subjective analyses required to support quantifying software protection levels. To facilitate the evaluation and V&V of software protection solutions, we have designed and developed a prototype adaptive cyber attack modeling system. Our approach integrates an off-line mechanism for rapid construction of Bayesian belief network (BN) attack models with an on-line model instantiation, adaptation and knowledge acquisition scheme. Off-line model construction is supported via a knowledge elicitation approach for identifying key domain requirements and a process for translating these requirements into a library of BN-based cyber-attack models. On-line attack modeling and knowledge acquisition is supported via BN evidence propagation and model parameter learning.
Adaptive approximate Bayesian computation for complex models
Lenormand, Maxime; Deffuant, Guillaume
2011-01-01
Approximate Bayesian computation (ABC) is a family of computational techniques in Bayesian statistics. These techniques allow to fit a model to data without relying on the computation of the model likelihood. They instead require to simulate a large number of times the model to be fitted. A number of refinements to the original rejection-based ABC scheme have been proposed, including the sequential improvement of posterior distributions. This technique allows to decrease the number of model simulations required, but it still presents several shortcomings which are particularly problematic for costly to simulate complex models. We here provide a new algorithm to perform adaptive approximate Bayesian computation, which is shown to perform better on both a toy example and a complex social model.
Institute of Scientific and Technical Information of China (English)
TIAN Lin-ya; HUA Xi-sheng
2007-01-01
To ensure the safety of buildings surrounding foundation pits, a study was made on a settlement monitoring and trend prediction method. A statistical testing method for analyzing the stability of a settlement monitoring datum has been discussed. According to a comprehensive survey, data of 16 stages at operating control point, were verified by a standard t test to determine the stability of the operating control point. A stationary auto-regression model, AR(p), used for the observation point settlement prediction has been investigated. Given the 16 stages of the settlement data at an observation point, the applicability of this model was analyzed. Settlement of last four stages was predicted using the stationary auto-regression model AR (1); the maximum difference between predicted and measured values was 0.6 mm,indicating good prediction results of the model. Hence, this model can be applied to settlement predictions for buildings surrounding foundation pits.
Directory of Open Access Journals (Sweden)
Lukianenko Iryna H.
2014-01-01
Full Text Available The article considers possibilities and specific features of modelling economic phenomena with the help of the category of models that unite elements of econometric regressions and artificial neural networks. This category of models contains auto-regression neural networks (AR-NN, regressions of smooth transition (STR/STAR, multi-mode regressions of smooth transition (MRSTR/MRSTAR and smooth transition regressions with neural coefficients (NCSTR/NCSTAR. Availability of the neural network component allows models of this category achievement of a high empirical authenticity, including reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands possibilities of interpretation of the obtained results. An example of multi-mode monetary rule is used to show one of the cases of specification and interpretation of this model. In particular, the article models and interprets principles of management of the UAH exchange rate that come into force when economy passes from a relatively stable into a crisis state.
DEFF Research Database (Denmark)
Petersen, Jørgen Holm
2016-01-01
This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied. For...
Simple multiple regression model for long range forecasting of Indian summer monsoon rainfall
Digital Repository Service at National Institute of Oceanography (India)
Sadhuram, Y.; Murthy, T.V.R.
) and ISMR is found to be 0.62. The multiple correlation using the above two parameters is 0.85 which explains 72% variance in ISMR. Using the above two parameters a linear multiple regression model to predict ISMR is developed. The results are comparable...
DEFF Research Database (Denmark)
Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;
2014-01-01
to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...
Sieve M-estimation for semiparametric varying-coefficient partially linear regression model
Institute of Scientific and Technical Information of China (English)
无
2010-01-01
This article considers a semiparametric varying-coefficient partially linear regression model.The semiparametric varying-coefficient partially linear regression model which is a generalization of the partially linear regression model and varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable.A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed.Our main object is to estimate the nonparametric component and the unknown parameters simultaneously.It is easier to compute and the required computation burden is much less than the existing two-stage estimation method.Furthermore,the sieve M-estimation is robust in the presence of outliers if we choose appropriate ρ(·).Under some mild conditions,the estimators are shown to be strongly consistent;the convergence rate of the estimator for the unknown nonparametric component is obtained and the estimator for the unknown parameter is shown to be asymptotically normally distributed.Numerical experiments are carried out to investigate the performance of the proposed method.
Modeling protein tandem mass spectrometry data with an extended linear regression strategy.
Liu, Han; Bonner, Anthony J; Emili, Andrew
2004-01-01
Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithm. The intensity patterns presented in mass spectra are useful information for identification of peptides and proteins. However, widely used algorithms can not predicate the peak intensity patterns exactly. We have developed a systematic analytical approach based on a family of extended regression models, which permits routine, large scale protein expression profile modeling. By proving an important technical result that the regression coefficient vector is just the eigenvector corresponding to the least eigenvalue of a space transformed version of the original data, this extended regression problem can be reduced to a SVD decomposition problem, thus gain the robustness and efficiency. To evaluate the performance of our model, from 60,960 spectra, we chose 2,859 with high confidence, non redundant matches as training data, based on this specific problem, we derived some measurements of goodness of fit to show that our modeling method is reasonable. The issues of overfitting and underfitting are also discussed. This extended regression strategy therefore offers an effective and efficient framework for in-depth investigation of complex mammalian proteomes. PMID:17270923
Comparison of regression methods for modeling intensive care length of stay.
Directory of Open Access Journals (Sweden)
Ilona W M Verburg
Full Text Available Intensive care units (ICUs are increasingly interested in assessing and improving their performance. ICU Length of Stay (LoS could be seen as an indicator for efficiency of care. However, little consensus exists on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models when predicting ICU LoS. We included data from 32,667 unplanned ICU admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE in the year 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS,LoS truncated at 30 days and log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV model, for all patients together and for survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping and the squared Pearson correlation coefficient (R2, root mean squared prediction error (RMSPE, mean absolute prediction error (MAPE and bias. The distribution of ICU LoS was skewed to the right with a median of 1.7 days (interquartile range 0.8 to 4.0 and a mean of 4.2 days (standard deviation 7.9. The predictive performance of the models was between 0.09 and 0.20 for R2, between 7.28 and 8.74 days for RMSPE, between 3.00 and 4.42 days for MAPE and between -2.99 and 1.64 days for bias. The predictive performance was slightly better for survivors than for non-survivors. We were disappointed in the predictive performance of the regression models and conclude that it is difficult to predict LoS of unplanned ICU admissions using patient characteristics at admission time only.
Regressions by leaps and bounds and biased estimation techniques in yield modeling
Marquina, N. E. (Principal Investigator)
1979-01-01
The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.
DEFF Research Database (Denmark)
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne;
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Ba......% of the genetic variance in feed intake, revealing that a minor component of feed intake was genetically independent of maintenance and growth. In conclusion, the approach derived herein led to a consistent definition of RFI, where genomic breeding values were easily obtained...
Genetic Parameters for Number of Piglets Born Alive Using a Random Regression Model
Directory of Open Access Journals (Sweden)
Zoran Luković
2003-06-01
Full Text Available A random regression model (RRM was applied to estimate dispersion parameters for number of piglets born alive (NBA from first to tenth parity. Random regressions on Legendre polynomials of standardized parity were included for common litter environmental, permanent environmental and additive genetic effects. Estimated phenotypic variance and variance components (ratios for NBA changed over parities and differed between farms. Eigenvalues for additive genetic effect were calculated in order to detect the proportion of additive genetic variability explained with individual production curves of animals. Existence of the 10-20 % genetic variability in the shape of the curves confirms a possibility for selection on persistency in litter size.
A brief introduction to regression designs and mixed-effects modelling by a recent convert
DEFF Research Database (Denmark)
Balling, Laura Winther
2008-01-01
This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...
A Study of Wind Statistics Through Auto-Regressive and Moving-Average (ARMA) Modeling
Institute of Scientific and Technical Information of China (English)
尹彰; 周宗仁
2001-01-01
Statistical properties of winds near the Taichung Harbour are investigated. The 26 years′incomplete data of wind speeds, measured on an hourly basis, are used as reference. The possibility of imputation using simulated results of the Auto-Regressive (AR), Moving-Average (MA), and/or Auto-Regressive and Moving-Average (ARMA) models is studied. Predictions of the 25-year extreme wind speeds based upon the augmented data are compared with the original series. Based upon the results, predictions of the 50- and 100-year extreme wind speeds are then made.
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
A note on constrained M-estimation and its recursive analog in multivariate linear regression models
Institute of Scientific and Technical Information of China (English)
RAO; Calyampudi; R
2009-01-01
In this paper,the constrained M-estimation of the regression coeffcients and scatter parameters in a general multivariate linear regression model is considered.Since the constrained M-estimation is not easy to compute,an up-dating recursion procedure is proposed to simplify the com-putation of the estimators when a new observation is obtained.We show that,under mild conditions,the recursion estimates are strongly consistent.In addition,the asymptotic normality of the recursive constrained M-estimators of regression coeffcients is established.A Monte Carlo simulation study of the recursion estimates is also provided.Besides,robustness and asymptotic behavior of constrained M-estimators are briefly discussed.
A Multilevel Regression Model for Geographical Studies in Sets of Non-Adjacent Cities.
Directory of Open Access Journals (Sweden)
Marc Marí-Dell'Olmo
Full Text Available In recent years, small-area-based ecological regression analyses have been published that study the association between a health outcome and a covariate in several cities. These analyses have usually been performed independently for each city and have therefore yielded unrelated estimates for the cities considered, even though the same process has been studied in all of them. In this study, we propose a joint ecological regression model for multiple cities that accounts for spatial structure both within and between cities and explore the advantages of this model. The proposed model merges both disease mapping and geostatistical ideas. Our proposal is compared with two alternatives, one that models the association for each city as fixed effects and another that treats them as independent and identically distributed random effects. The proposed model allows us to estimate the association (and assess its significance at locations with no available data. Our proposal is illustrated by an example of the association between unemployment (as a deprivation surrogate and lung cancer mortality among men in 31 Spanish cities. In this example, the associations found were far more accurate for the proposed model than those from the fixed effects model. Our main conclusion is that ecological regression analyses can be markedly improved by performing joint analyses at several locations that share information among them. This finding should be taken into consideration in the design of future epidemiological studies.
Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction
Institute of Scientific and Technical Information of China (English)
WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian
2007-01-01
A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways, a specically designed database was developed via the processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series,collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.
The applicability of linear regression models in working environments' thermal evaluation.
Directory of Open Access Journals (Sweden)
Pablo Adamoglu de Oliveira
2006-04-01
Full Text Available The simultaneous analysis of thermal variables with normal distribution with the aim of checking if there is any significative correlation among them or if there is the possibility of making predictions of the values of some of them based on others’ values is considered a problem of great importance in statistics studies. The aim of this paper is to study the applicability of linear regression models in working environments’ thermal comfort studies, thus contributing for the comprehension of the possible environmental cooling, heating or winding needs. It starts with a bibliographical research, followed by a field research, data collection and and software statistical-mathematical data treatment. It was then performed data analysis and the construction of the regression linear models using the t and F tests for determining the consistency of the models and their parameters, as well as the building of conclusions based on the information obtained and on the significance of the mathematical models built.
Directory of Open Access Journals (Sweden)
Sumit Goyal
2011-07-01
Full Text Available Coffee as beverage is prepared from the roasted seeds (beans of the coffee plant. Coffee is the second most important product in the international market in terms of volume trade and the most important in terms of value. Artificial neural engineering and regression models were developed to predict shelf life of instant coffee drink. Colour and appearance, flavour, viscosity and sediment were used as input parameters. Overall acceptability was used as output parameter. The dataset consisted of experimentally developed 50 observations. The dataset was divided into two disjoint subsets, namely, training set containing 40 observations (80% of total observations and test set comprising of 10 observations (20% of total observations. The network was trained with 500 epochs. Neural network toolbox under Matlab 7.0 software was used for training the models. From the investigation it was revealed that multiple linear regression model was superior over radial basis model for forecasting shelf life of instant coffee drink.
Directory of Open Access Journals (Sweden)
Kupek Emil
2006-03-01
Full Text Available Abstract Background Structural equation modelling (SEM has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. Methods A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR into Q-metric by (OR-1/(OR+1 to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100 random sample of the data generated and on a real data set. Results SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. Conclusion The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy
Directory of Open Access Journals (Sweden)
Michel Ducher
2013-01-01
Full Text Available Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n=155 performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC curves. IgAN was found (on pathology in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67% and specificity (73% versus 95% using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.
Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre
2013-01-01
Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.
Age estimation based on pelvic ossification using regression models from conventional radiography.
Zhang, Kui; Dong, Xiao-Ai; Fan, Fei; Deng, Zhen-Hua
2016-07-01
To establish regression models for age estimation from the combination of the ossification of iliac crest and ischial tuberosity. One thousand three hundred and seventy-nine conventional pelvic radiographs at the West China Hospital of Sichuan University between January 2010 and June 2012 were evaluated retrospectively. The receiver operating characteristic analysis was performed to measure the value of estimation of 18 years of age with the classification scheme for the iliac crest and ischial tuberosity. Regression analysis was performed, and formulas for calculating approximate chronological age according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity were developed. The areas under the receiver operating characteristic (ROC) curves were above 0.9 (p systems, and the cubic regression model was found to have the highest R-square value (R (2) = 0.744 for female and R (2) = 0.753 for male). The present classification scheme for apophyseal iliac crest ossification and the ischial tuberosity may be used for age estimation. And the present established cubic regression model according to the combination developmental status of the ossification for the iliac crest and ischial tuberosity can be used for age estimation. PMID:27169673
BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES
Institute of Scientific and Technical Information of China (English)
林路; 张润楚
2004-01-01
This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.
Multiple Regression (MR) and Artificial Neural Network (ANN) models for prediction of soil suction
Erzin, Yusuf; Yilmaz, Isik
2010-05-01
This article presents a comparison of multiple regression (MR) and artificial neural network (ANN) model for prediction of soil suction of clayey soils. The results of the soil suction tests utilizing thermocouple psychrometers on statically compacted specimens of Bentonite-Kaolinite clay mixtures with varying soil properties were used to develope the models. The results obtained from both models were then compared with the experimental results. The performance indices such as coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and variance account for (VAF) were used to control the performance of the prediction capacity of the models developed in this study. ANN model has shown higher prediction performance than regression model according to the performance indices. It is shown that ANN models provide significant improvements in prediction accuracy over statistical models. The potential benefits of soft computing models extend beyond the high computation rates. Higher performances of the soft computing models were sourced from greater degree of robustness and fault tolerance than traditional statistical models because there are many more processing neurons, each with primarily local connections. It appears that there is a possibility of estimating soil suction by using the proposed empirical relationships and soft computing models. The population of the analyzed data is relatively limited in this study. Therefore, the practical outcome of the proposed equations and models could be used, with acceptable accuracy.
Goyal, S; Goyal, G. K.
2012-01-01
This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 g.C. Linear Layer and Generalized Regression models were developed with input parameters: Soluble nitrogen, pH, Standard plate count, Yeast & mould count, Spores, and sensory score as output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash - Sutcliffo Coefficient were used in order to compare the prediction ability o...
Regression spline bivariate probit models: a practical approach to testing for exogeneity
Marra, G.; Radice, Rosalba; Filippou, P
2015-01-01
Bivariate probit models can deal with a problem usually known as endogeneity. This issue is likely to arise in observational studies when confounders are unobserved. We are concerned with testing the hypothesis of exogeneity (or absence of endogeneity) when using regression spline recursive and sample selection bivariate probit models. Likelihood ratio and gradient tests are discussed in this context and their empirical properties investigated and compared with those of the Lagrange multiplie...
Rachna Aggarwal; Maneek Kumar; Sharma, R K; M. K. Sharma
2014-01-01
This paper presents Reliability Based Design Optimization (RBDO) model to deal with uncertainties involved in concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subjected to concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR), Traditiona...
Kupek Emil
2006-01-01
Abstract Background Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. Methods A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (O...
A quadtree-adaptive spectral wave model
Popinet, Stéphane; Gorman, Richard M.; Rickard, Graham J.; Tolman, Hendrik L.
A spectral wave model coupling a quadtree-adaptive discretisation of the two spatial dimensions with a standard discretisation of the two spectral dimensions is described. The implementation is greatly simplified by reusing components of the Gerris solver (for spatial advection on quadtrees) and WAVEWATCH III (for spectral advection and source terms). Strict equivalence between the anisotropic diffusion and spatial filtering methods for alleviation of the Garden Sprinkler Effect (GSE) is demonstrated. This equivalence facilitates the generalisation of GSE alleviation techniques to quadtree grids. For the case of a cyclone-generated wave field, the cost of the adaptive method increases linearly with spatial resolution compared to quadratically for constant-resolution methods. This leads to decrease in runtimes of one to two orders of magnitude for practical spatial resolutions. Similar efficiency gains are shown to be possible for global spectral wave forecasting.
APPLICATION OF REGRESSION MODELLING TECHNIQUES IN DESALINATION OF SEA WATER BY MEMBRANE DISTILLATION
Directory of Open Access Journals (Sweden)
SELVI S. R
2015-08-01
Full Text Available The objective of this work is to gain an idea about the statistical significance of experimental parameters on the performance of membrane distillation. In this work the raw sea water sample without pretreatment was collected from Puducherry and desalinated using direct contact membrane distillation method. Experimental data analysis was carried out using statistical methods. The experimental data involves the effects of feed temperature, feed flow rate and feed concentration on the permeate flux. In statistical methods, regression model was developed to correlate the significance of input parameters like feed temperature, feed concentration and feed flow rate with the output parameter like permeate flux in the process of membrane distillation. Since the performance of the membrane distillation in the desalination of water is characterised by permeate flux, regression model using simple linear method was carried out. Goodness of model fitting should always has to be validated. Regression model was validated using ANOVA. Estimates of ANOVA for the parameter study was given and the coefficient obtained by regression analysis was specified in the regression equation and concluded that the highest coefficient of input parameter is significant, highly influences the response. Feed flow rate and feed temperature has higher influence on permeate flux than that of feed concentration. The coefficient of feed concentration was found to be negative which indicates less significant factor on permeate flux. The chemical composition of sea water was given by water quality analysis . TDS of membrane distilled water was found to be 18ppm than the initial feed TDS of sea water 27,720 ppm. From the experimental work it was found, salt rejection as 99% and water analysis report confirms the quality of distillate obtained by this desalination process as potable water.
Support vector regression model based predictive control of water level of U-tube steam generators
International Nuclear Information System (INIS)
Highlights: • Water level of U-tube steam generators was controlled in a model predictive fashion. • Models for steam generator water level were built using support vector regression. • Cost function minimization for future optimal controls was performed by using the steepest descent method. • The results indicated the feasibility of the proposed method. - Abstract: A predictive control algorithm using support vector regression based models was proposed for controlling the water level of U-tube steam generators of pressurized water reactors. Steam generator data were obtained using a transfer function model of U-tube steam generators. Support vector regression based models were built using a time series type model structure for five different operating powers. Feedwater flow controls were calculated by minimizing a cost function that includes the level error, the feedwater change and the mismatch between feedwater and steam flow rates. Proposed algorithm was applied for a scenario consisting of a level setpoint change and a steam flow disturbance. The results showed that steam generator level can be controlled at all powers effectively by the proposed method
Mel'nikov, A. V.
1996-10-01
Contents Introduction Chapter I. Basic notions and results from contemporary martingale theory §1.1. General notions of the martingale theory §1.2. Convergence (a.s.) of semimartingales. The strong law of large numbers and the law of the iterated logarithm Chapter II. Stochastic differential equations driven by semimartingales §2.1. Basic notions and results of the theory of stochastic differential equations driven by semimartingales §2.2. The method of monotone approximations. Existence of strong solutions of stochastic equations with non-smooth coefficients §2.3. Linear stochastic equations. Properties of stochastic exponentials §2.4. Linear stochastic equations. Applications to models of the financial market Chapter III. Procedures of stochastic approximation as solutions of stochastic differential equations driven by semimartingales §3.1. Formulation of the problem. A general model and its relation to the classical one §3.2. A general description of the approach to the procedures of stochastic approximation. Convergence (a.s.) and asymptotic normality §3.3. The Gaussian model of stochastic approximation. Averaged procedures and their effectiveness Chapter IV. Statistical estimation in regression models with martingale noises §4.1. The formulation of the problem and classical regression models §4.2. Asymptotic properties of MLS-estimators. Strong consistency, asymptotic normality, the law of the iterated logarithm §4.3. Regression models with deterministic regressors §4.4. Sequential MLS-estimators with guaranteed accuracy and sequential statistical inferences Bibliography
Directory of Open Access Journals (Sweden)
Rachna Aggarwal
2014-12-01
Full Text Available This paper presents Reliability Based Design Optimization (RBDO model to deal with uncertainties involved in concrete mix design process. The optimization problem is formulated in such a way that probabilistic concrete mix input parameters showing random characteristics are determined by minimizing the cost of concrete subjected to concrete compressive strength constraint for a given target reliability. Linear and quadratic models based on Ordinary Least Square Regression (OLSR, Traditional Ridge Regression (TRR and Generalized Ridge Regression (GRR techniques have been explored to select the best model to explicitly represent compressive strength of concrete. The RBDO model is solved by Sequential Optimization and Reliability Assessment (SORA method using fully quadratic GRR model. Optimization results for a wide range of target compressive strength and reliability levels of 0.90, 0.95 and 0.99 have been reported. Also, safety factor based Deterministic Design Optimization (DDO designs for each case are obtained. It has been observed that deterministic optimal designs are cost effective but proposed RBDO model gives improved design performance.
Identifying of risks in pricing using a regression model of demand on price dependence
Directory of Open Access Journals (Sweden)
O.I. Yashkina
2016-09-01
Full Text Available The aim of the article. The main purpose of the article is to describe scientific and methodological approaches of determining the price elasticity of demand as a regression model based on the price and risk assessment of price variations on the received model. The results of the analysis. The study is based on the assumption that the index of price elasticity of demand on high-tech innovation is not constant as it is commonly understood in the classical sense. On the stage of commodity market release and subsequent sales growth, the index of price elasticity of demand may vary within certain limits. Index value and thereafter market response are closely related to the current price. Achieving the stated purpose of the article is possible when having factual information about prices and corresponding volumes of sales of new high-tech products for a short period of time, on the basis of which types of demand and prices interrelation are modeled. Risk assessment of pricing and profit optimization by the regression of demand depending on price consists of three stages: a obtaining of a regression model of the demand on the price; b obtaining of function of demand price elasticity and risk assessment of pricing depending on behavior of the function; c determination of the price of company to receive a maximum operating profit based on the specific model of price to demand function. To receive the regression model of dependence of demand on price it is recommended to use specific reference models. The article includes linear, hyperbolic and parabolic models. The regression dependence of price elasticity of demand on price for each of the reference models of demand is obtained on the basis of the function elasticity concept in mathematical analysis. The concept of «function of price elasticity of demand» expresses this dependence. For the received functions of price elasticity of demand, the article provides intervals with the highest and lowest
An adaptive contextual quantum language model
Li, Jingfei; Zhang, Peng; Song, Dawei; Hou, Yuexian
2016-08-01
User interactions in search system represent a rich source of implicit knowledge about the user's cognitive state and information need that continuously evolves over time. Despite massive efforts that have been made to exploiting and incorporating this implicit knowledge in information retrieval, it is still a challenge to effectively capture the term dependencies and the user's dynamic information need (reflected by query modifications) in the context of user interaction. To tackle these issues, motivated by the recent Quantum Language Model (QLM), we develop a QLM based retrieval model for session search, which naturally incorporates the complex term dependencies occurring in user's historical queries and clicked documents with density matrices. In order to capture the dynamic information within users' search session, we propose a density matrix transformation framework and further develop an adaptive QLM ranking model. Extensive comparative experiments show the effectiveness of our session quantum language models.
Model reference adaptive control and adaptive stability augmentation
DEFF Research Database (Denmark)
Henningsen, Arne; Ravn, Ole
1993-01-01
A comparison of the standard concepts in MRAC design suggests that a combination of the implicit and the explicit design techniques may lead to an improvement of the overall system performance in the presence of unmodelled dynamics. Using the ideas of adaptive stability augmentation a combined...
A regression-kriging model for estimation of rainfall in the Laohahe basin
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors of latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges from the Laohahe basis in northeast China during 1986-2005 . Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). For correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
Asavaskulkiet, Krissada
2014-01-01
This paper proposes a novel face super-resolution reconstruction (hallucination) technique for YCbCr color space. The underlying idea is to learn with an error regression model and multi-linear principal component analysis (MPCA). From hallucination framework, many color face images are explained in YCbCr space. To reduce the time complexity of color face hallucination, we can be naturally described the color face imaged as tensors or multi-linear arrays. In addition, the error regression analysis is used to find the error estimation which can be obtained from the existing LR in tensor space. In learning process is from the mistakes in reconstruct face images of the training dataset by MPCA, then finding the relationship between input and error by regression analysis. In hallucinating process uses normal method by backprojection of MPCA, after that the result is corrected with the error estimation. In this contribution we show that our hallucination technique can be suitable for color face images both in RGB and YCbCr space. By using the MPCA subspace with error regression model, we can generate photorealistic color face images. Our approach is demonstrated by extensive experiments with high-quality hallucinated color faces. Comparison with existing algorithms shows the effectiveness of the proposed method.
Das, Iswar; Stein, Alfred; Kerle, Norman; Dadhwal, Vinay K.
2012-12-01
Landslide susceptibility mapping (LSM) along road corridors in the Indian Himalayas is an essential exercise that helps planners and decision makers in determining the severity of probable slope failure areas. Logistic regression is commonly applied for this purpose, as it is a robust and straightforward technique that is relatively easy to handle. Ordinary logistic regression as a data-driven technique, however, does not allow inclusion of prior information. This study presents Bayesian logistic regression (BLR) for landslide susceptibility assessment along road corridors. The methodology is tested in a landslide-prone area in the Bhagirathi river valley in the Indian Himalayas. Parameter estimates from BLR are compared with those obtained from ordinary logistic regression. By means of iterative Markov Chain Monte Carlo simulation, BLR provides a rich set of results on parameter estimation. We assessed model performance by the receiver operator characteristics curve analysis, and validated the model using 50% of the landslide cells kept apart for testing and validation. The study concludes that BLR performs better in posterior parameter estimation in general and the uncertainty estimation in particular.
The study on Sanmenxia annual flow forecasting in the Yellow River with mix regression model
Institute of Scientific and Technical Information of China (English)
JIANG Xiaohui; LIU Changming; WANG Yu; WANG Hongrui
2004-01-01
This paper established mix regression model for simulating annual flow, in which annual runoff is auto-regression factor, precipitation, air temperature and water consumption are regression factors; we adopted 9 hypothesis climate change schemes to forecast the change of annual flow of Sanmenxia Station. The results show: (1) When temperature is steady, the average annual runoff will increase by 8.3% if precipitation increases by 10%; when precipitation decreases by 10%, the average annual runoff will decrease by 8.2%; when precipitation is steady, the average annual runoff will decrease by 2.4% if temperature increases 1 ℃; if temperature decreases 1 ℃, runoff will increase by 1.2%. The mix regression model can well simulate annual runoff. (2) As to 9 different temperature and precipitation scenarios, scenario 9 is the most adverse to the runoff of Sanmenxia Station of Yellow River; i.e. temperature increases 1℃and precipitation decreases by 10%. Under this condition, the simulated average annual runoff decreases by 10.8%. On the contrary, scenario 1 is the best to the enhancement of runoff; i.e. when temperature decreases 1 ℃ precipitation will increase by 10%, which will make the annual runoff of Sanmenxia increase by 10.6%.
Probing turbulence intermittency via Auto-Regressive Moving-Average models
Faranda, Davide; Dubrulle, Berengere; Daviaud, Francois
2014-01-01
We suggest a new approach to probing intermittency corrections to the Kolmogorov law in turbulent flows based on the Auto-Regressive Moving-Average modeling of turbulent time series. We introduce a new index $\\Upsilon$ that measures the distance from a Kolmogorov-Obukhov model in the Auto-Regressive Moving-Average models space. Applying our analysis to Particle Image Velocimetry and Laser Doppler Velocimetry measurements in a von K\\'arm\\'an swirling flow, we show that $\\Upsilon$ is proportional to the traditional intermittency correction computed from the structure function. Therefore it provides the same information, using much shorter time series. We conclude that $\\Upsilon$ is a suitable index to reconstruct the spatial intermittency of the dissipation in both numerical and experimental turbulent fields.
Partial Least Squares Regression Model to Predict Water Quality in Urban Water Distribution Systems
Institute of Scientific and Technical Information of China (English)
LUO Bijun; ZHAO Yuan; CHEN Kai; ZHAO Xinhua
2009-01-01
The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality. Partial least squares (PLS) regression model, in which the turbidity and Fe are regarded as con-trol objectives, is used to establish the statistical model. The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data. The percentages of absolute relative error (below 15%, 20%, 30%) are 44.4%, 66.7%, 100% (turbidity) and 33.3%, 44.4%, 77.8% (Fe) on the 4th sampling point; 77.8%, 88.9%, 88.9% (turbidity) and 44.4%, 55.6%, 66.7% (Fe) on the 5th sampling point.
Estimating strength of DDoS attack using various regression models
Gupta, B B; Misra, Manoj
2012-01-01
Anomaly-based DDoS detection systems construct profile of the traffic normally seen in the network, and identify anomalies whenever traffic deviate from normal profile beyond a threshold. This extend of deviation is normally not utilised. This paper reports the evaluation results of proposed approach that utilises this extend of deviation from detection threshold to estimate strength of DDoS attack using various regression models. A relationship is established between number of zombies and observed deviation in sample entropy. Various statistical performance measures, such as coefficient of determination (R2), coefficient of correlation (CC), sum of square error (SSE), mean square error (MSE), root mean square error (RMSE), normalised mean square error (NMSE), Nash-Sutcliffe efficiency index ({\\eta}) and mean absolute error (MAE) are used to measure the performance of various regression models. Internet type topologies used for simulation are generated using transit-stub model of GT-ITM topology generator. NS...
Random regression models for daily feed intake in Danish Duroc pigs
DEFF Research Database (Denmark)
Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just;
The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 to 160 days of age. The data originated from the National test station...... and were collected using ACEMO electronic feeders in the period of 2008 to 2011. The pedigree was traced back to 1995 and included 17222 animals. The phenotypic feed intake curve was decomposed into a fixed curve, being specific to the barn-year-season effect and curves associated with the random pen......-year-season, permanent, and animal genetic effects. The functional form was based on Legendre polynomials. A total of 64 models for random regressions were initially ranked by BIC to identify the approximate order for the Legendre polynomials using AI-REML. The parsimonious model included Legendre polynomials of 2nd...
Oliveira, María; Einbeck, Jochen; Higueras, Manuel; Ainsbury, Elizabeth; Puig, Pedro; Rothkamm, Kai
2016-03-01
Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which will render the variance of observed aberration counts larger than their mean, and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberrations datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for testing for zero inflation in Poisson regression models under the identity link is also developed. PMID:26461836
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may be occur due to recording errors, sudden short events, sampling under abnormal conditions etc. The existence of these data points "outliers" in the data set cause lot of problems in the research results and the conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in the both of the response and explanatory variables of the simple circular regression model. Our proposed statistic is robust circular distance RCDxy and it is justified by the three robust measurements such as proportion of detection outliers, masking and swamping rates.
Model Driven Mutation Applied to Adaptative Systems Testing
Bartel, Alexandre; Munoz, Freddy; Klein, Jacques; Mouelhi, Tejeddine; Traon, Yves Le
2012-01-01
Dynamically Adaptive Systems modify their behav- ior and structure in response to changes in their surrounding environment and according to an adaptation logic. Critical sys- tems increasingly incorporate dynamic adaptation capabilities; examples include disaster relief and space exploration systems. In this paper, we focus on mutation testing of the adaptation logic. We propose a fault model for adaptation logics that classifies faults into environmental completeness and adaptation correct- ness. Since there are several adaptation logic languages relying on the same underlying concepts, the fault model is expressed independently from specific adaptation languages. Taking benefit from model-driven engineering technology, we express these common concepts in a metamodel and define the operational semantics of mutation operators at this level. Mutation is applied on model elements and model transformations are used to propagate these changes to a given adaptation policy in the chosen formalism. Preliminary resul...
Bentamapimod (JNK Inhibitor AS602801) Induces Regression of Endometriotic Lesions in Animal Models.
Palmer, Stephen S; Altan, Melis; Denis, Deborah; Tos, Enrico Gillio; Gotteland, Jean-Pierre; Osteen, Kevin G; Bruner-Tran, Kaylon L; Nataraja, Selvaraj G
2016-01-01
Endometriosis is an estrogen (ER)-dependent gynecological disease caused by the growth of endometrial tissue at extrauterine sites. Current endocrine therapies address the estrogenic aspect of disease and offer some relief from pain but are associated with significant side effects. Immune dysfunction is also widely believed to be an underlying contributor to the pathogenesis of this disease. This study evaluated an inhibitor of c-Jun N-terminal kinase, bentamapimod (AS602801), which interrupts immune pathways, in 2 rodent endometriosis models. Treatment of nude mice bearing xenografts biopsied from women with endometriosis (BWE) with 30 mg/kg AS602801 caused 29% regression of lesion. Medroxyprogesterone acetate (MPA) or progesterone (PR) alone did not cause regression of BWE lesions, but combining 10 mg/kg AS602801 with MPA caused 38% lesion regression. In human endometrial organ cultures (from healthy women), treatment with AS602801 or MPA reduced matrix metalloproteinase-3 (MMP-3) release into culture medium. In organ cultures established with BWE, PR or MPA failed to inhibit MMP-3 secretion, whereas AS602801 alone or MPA + AS602801 suppressed MMP-3 production. In an autologous rat endometriosis model, AS602801 caused 48% regression of lesions compared to GnRH antagonist Antide (84%). AS602801 reduced inflammatory cytokines in endometriotic lesions, while levels of cytokines in ipsilateral horns were unaffected. Furthermore, AS602801 enhanced natural killer cell activity, without apparent negative effects on uterus. These results indicate that bentamapimod induced regression of endometriotic lesions in endometriosis rodent animal models without suppressing ER action. c-Jun N-terminal kinase inhibition mediated a comprehensive reduction in cytokine secretion and moreover was able to overcome PR resistance. PMID:26335175
A Stepwise Time Series Regression Procedure for Water Demand Model Identification
Miaou, Shaw-Pin
1990-09-01
Annual time series water demand has traditionally been studied through multiple linear regression analysis. Four associated model specification problems have long been recognized: (1) the length of the available time series data is relatively short, (2) a large set of candidate explanatory or "input" variables needs to be considered, (3) input variables can be highly correlated with each other (multicollinearity problem), and (4) model error series are often highly autocorrelated or even nonstationary. A step wise time series regression identification procedure is proposed to alleviate these problems. The proposed procedure adopts the sequential input variable selection concept of stepwise regression and the "three-step" time series model building strategy of Box and Jenkins. Autocorrelated model error is assumed to follow an autoregressive integrated moving average (ARIMA) process. The stepwise selection procedure begins with a univariate time series demand model with no input variables. Subsequently, input variables are selected and inserted into the equation one at a time until the last entered variable is found to be statistically insignificant. The order of insertion is determined by a statistical measure called between-variable partial correlation. This correlation measure is free from the contamination of serial autocorrelation. Three data sets from previous studies are employed to illustrate the proposed procedure. The results are then compared with those from their original studies.
Directory of Open Access Journals (Sweden)
Yoonsu Shin
2016-01-01
Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.
A class of additive-accelerated means regression models for recurrent event data
Institute of Scientific and Technical Information of China (English)
无
2010-01-01
In this article, we propose a class of additive-accelerated means regression models for analyzing recurrent event data. The class includes the proportional means model, the additive rates model, the accelerated failure time model, the accelerated rates model and the additive-accelerated rate model as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For the inference on the model parameters, estimating equation approaches are derived and asymptotic properties of the proposed estimators are established. In addition, a technique is provided for model checking. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is illustrated.
Adaptive Genetic Algorithm Model for Intrusion Detection
Directory of Open Access Journals (Sweden)
K. S. Anil Kumar
2012-09-01
Full Text Available Intrusion detection systems are intelligent systems designed to identify and prevent the misuse of computer networks and systems. Various approaches to Intrusion Detection are currently being used, but they are relatively ineffective. Thus the emerging network security systems need be part of the life system and this ispossible only by embedding knowledge into the network. The Adaptive Genetic Algorithm Model - IDS comprising of K-Means clustering Algorithm, Genetic Algorithm and Neural Network techniques. Thetechnique is tested using multitude of background knowledge sets in DARPA network traffic datasets.
A Model for Dynamic Adaptive Coscheduling
Institute of Scientific and Technical Information of China (English)
LU Sanglu; ZHOU Xiaobo; XIE Li
1999-01-01
This paper proposes a dynamic adaptive coscheduling modelDASIC to take advantage of excess available resources in anetwork of workstations (NOW). Besides coscheduling related subtasksdynamically, DASIC can scale up or down the process space dependingupon the number of available processors on an NOW. Based on thedynamic idle processor group (IPG), DASIC employs three modules: thecoscheduling module, the scalable scheduling module and the loadbalancing module, and uses six algorithms to achieve scalability. Asimplified DASIC was also implemented, and experimental results arepresented in this paper, which show that it can maximize systemutilization, and achieve task parallelism as much as possible.
Adaptive model training system and method
Bickford, Randall L; Palnitkar, Rahul M; Lee, Vo
2014-04-15
An adaptive model training system and method for filtering asset operating data values acquired from a monitored asset for selectively choosing asset operating data values that meet at least one predefined criterion of good data quality while rejecting asset operating data values that fail to meet at least the one predefined criterion of good data quality; and recalibrating a previously trained or calibrated model having a learned scope of normal operation of the asset by utilizing the asset operating data values that meet at least the one predefined criterion of good data quality for adjusting the learned scope of normal operation of the asset for defining a recalibrated model having the adjusted learned scope of normal operation of the asset.
Adaptive model training system and method
Energy Technology Data Exchange (ETDEWEB)
Bickford, Randall L; Palnitkar, Rahul M
2014-11-18
An adaptive model training system and method for filtering asset operating data values acquired from a monitored asset for selectively choosing asset operating data values that meet at least one predefined criterion of good data quality while rejecting asset operating data values that fail to meet at least the one predefined criterion of good data quality; and recalibrating a previously trained or calibrated model having a learned scope of normal operation of the asset by utilizing the asset operating data values that meet at least the one predefined criterion of good data quality for adjusting the learned scope of normal operation of the asset for defining a recalibrated model having the adjusted learned scope of normal operation of the asset.
Cluster regression model and level fluctuation features of Van Lake, Turkey
Directory of Open Access Journals (Sweden)
Z. Şen
Full Text Available Lake water levels change under the influences of natural and/or anthropogenic environmental conditions. Among these influences are the climate change, greenhouse effects and ozone layer depletions which are reflected in the hydrological cycle features over the lake drainage basins. Lake levels are among the most significant hydrological variables that are influenced by different atmospheric and environmental conditions. Consequently, lake level time series in many parts of the world include nonstationarity components such as shifts in the mean value, apparent or hidden periodicities. On the other hand, many lake level modeling techniques have a stationarity assumption. The main purpose of this work is to develop a cluster regression model for dealing with nonstationarity especially in the form of shifting means. The basis of this model is the combination of transition probability and classical regression technique. Both parts of the model are applied to monthly level fluctuations of Lake Van in eastern Turkey. It is observed that the cluster regression procedure does preserve the statistical properties and the transitional probabilities that are indistinguishable from the original data.
Key words. Hydrology (hydrologic budget; stochastic processes · Meteorology and atmospheric dynamics (ocean-atmosphere interactions
Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.
Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang
2016-07-01
Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience.
Indian Academy of Sciences (India)
R B Magar; V Jothiprakash
2011-12-01
In this study, multi-linear regression (MLR) approach is used to construct intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in MLR approach, Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables over a dependent variable by fitting a linear regression equation. The main aim of the present study is to see the consequences of development and applicability of simple models, when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data is used for building the model and 14 years of data is used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria and it was found that as in the present case, of well correlated input data, both lumped and distributed MLR models perform equally well. For the present case study considered, both MLR and ARIMA models performed equally sound due to availability of large dataset.
Directory of Open Access Journals (Sweden)
Lihua Yang
2015-04-01
Full Text Available In order to improve the accuracy of grain production forecasting, this study proposed a new combination forecasting model, the model combined stepwise regression method with RBF neural network by assigning proper weights using inverse variance method. By comparing different criteria, the result indicates that the combination forecasting model is superior to other models. The performance of the models is measured using three types of error measurement, which are Mean Absolute Percentage Error (MAPE, Theil Inequality Coefficient (Theil IC and Root Mean Squared Error (RMSE. The model with smallest value of MAPE, Theil IC and RMSE stands out to be the best model in predicting the grain production. Based on the MAPE, Theil IC and RMSE evaluation criteria, the combination model can reduce the forecasting error and has high prediction accuracy in grain production forecasting, making the decision more scientific and rational.
Directory of Open Access Journals (Sweden)
Renata Pires Gonçalves
2012-02-01
. The experiments of type dosage x response are very common in the determination of levels of nutrients in optimal food balance and include the use of regression models to achieve this objective. Nevertheless, the regression analysis routine, generally, uses a priori information about a possible relationship between the response variable. The isotonic regression is a method of estimation by least squares that generates estimates which preserves data ordering. In the theory of isotonic regression this information is essential and it is expected to increase fitting efficiency. The objective of this work was to use an isotonic regression methodology, as an alternative way of analyzing data of Zn deposition in tibia of male birds of Hubbard lineage. We considered the models of plateau response of polynomial quadratic and linear exponential forms. In addition to these models, we also proposed the fitting of a logarithmic model to the data and the efficiency of the methodology was evaluated by Monte Carlo simulations, considering different scenarios for the parametric values. The isotonization of the data yielded an improvement in all the fitting quality parameters evaluated. Among the models used, the logarithmic presented estimates of the parameters more consistent with the values reported in literature.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Directory of Open Access Journals (Sweden)
Olivia Prazeres da Costa
Full Text Available INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control, and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction, alleviating (co-occurring effects are weaker than expected from the single effects, or aggravating (stronger than expected. We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Lee, Soo Min; Lee, Jae-Won
2014-11-01
In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60min.
Support vector regression model for predicting the sorption capacity of lead (II
Directory of Open Access Journals (Sweden)
Nusrat Parveen
2016-09-01
Full Text Available Biosorption is supposed to be an economical process for the treatment of wastewater containing heavy metals like lead (II. In this research paper, the support vector regression (SVR has been used to predict the sorption capacity of lead (II ions with the independent input parameters being: initial lead ion concentration, pH, temperature and contact time. Tree fern, an agricultural by-product, has been employed as a low cost biosorbent. Comparison between multiple linear regression (MLR and SVR-based models has been made using statistical parameters. It has been found that the SVR model is more accurate and generalized for prediction of the sorption capacity of lead (II ions.
Directory of Open Access Journals (Sweden)
Giuliano de Oliveira Freitas
2013-10-01
Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
Profile-driven regression for modeling and runtime optimization of mobile networks
DEFF Research Database (Denmark)
McClary, Dan; Syrotiuk, Violet; Kulahci, Murat
2010-01-01
of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike...... others, the throughput model accounts for node speed. The resulting optimization is very effective; locally optimizing the network factors at runtime results in throughput as much as six times higher than that achieved with the factors at their default levels....
Random regression models for milk, fat and protein in Colombian Buffaloes
Naudin Hurtado-Lugo; Humberto Tonhati; Raul Aspilcuelta-Borquis; Cruz Enríquez-Valencia; Mario Cerón-Muñoz
2015-01-01
Objective. Covariance functions for additive genetic and permanent environmental effects and, subsequently, genetic parameters for test-day milk (MY), fat (FY) protein (PY) yields and mozzarella cheese (MP) in buffaloes from Colombia were estimate by using Random regression models (RRM) with Legendre polynomials (LP). Materials and Methods. Test-day records of MY, FY, PY and MP from 1884 first lactations of buffalo cows from 228 sires were analyzed. The animals belonged to 14 herds in Colombi...
A simple artificial regression based Lagrange multiplier test of normality in the probit model
Murphy, Anthony
1994-01-01
A convenient artifical regression based LM test of non-normality in the probit model is derived using a Gram Charlier type A alternative. The test is simply derived and may be extended to the bivariate probit case. The outer product gradient form of LM test is not used so the proposed test is likely to perform reasonably well in small samples. The test is compared with two other existing tests. non-peer-reviewed
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Constantin ANGHELACHE
2011-01-01
The article presents the fundamental aspects of the linear regression, as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, the homoscesasticity and heteroskedasticity. The use of econometrics instrument in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretation that can be drawn at this level.
Johns, Thomas D.
1982-01-01
Approved for public release; distribution unlimited In order to determine more scientifically the value of property assisted by the Coast Guard in search and rescue incidents, regression analysis was conducted on various characteristics of vessels in order to estimate their fair market values. Data for this research were collected from the U.S. Maritime Administration, the U.S. Coast Guard, and numerous oil and steel companies. Mathematical models were developed for merch...
An Analysis of Transit Bus Driver Distraction Using Multinomial Logistic Regression Models
D'Souza, Kelwyn
2012-01-01
This paper explores the problem of distracted driving at a regional bus transit agency to identify the sources of distraction and provide an understanding of factors responsible for driver distraction. A risk range system was developed to classify the distracting activities into four risk zones. The high risk zone distracting activities were analyzed using multinomial logistic regression models to determine the impact of various factors on the multiple categorical levels of driver distraction...
Asymptotic Properties in Semiparametric Partially Linear Regression Models for Functional Data
Institute of Scientific and Technical Information of China (English)
Tao ZHANG
2013-01-01
We consider the semiparametric partially linear regression models with mean function xTβ+g(z),where X and z are functional data.The new estimators of β and g(z) are presented and some asymptotic results are given.The strong convergence rates of the proposed estimators are obtained.In our estimation,the observation number of each subject will be completely flexible.Some simulation study is conducted to investigate the finite sample performance of the proposed estimators.
USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES
Directory of Open Access Journals (Sweden)
Constantin ANGHELACHE
2011-10-01
Full Text Available The article presents the fundamental aspects of the linear regression, as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, the homoscesasticity and heteroskedasticity. The use of econometrics instrument in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretation that can be drawn at this level.
Sohair F. Higazi; Dina H. Abdel-Hady; Samir Ahmed Al-Oulfi
2013-01-01
Regression analysis depends on several assumptions that have to be satisfied. A major assumption that is never satisfied when variables are from contiguous observations is the independence of error terms. Spatial analysis treated the violation of that assumption by two derived models that put contiguity of observations into consideration. Data used are from Egypt's 2006 latest census, for 93 counties in middle delta seven adjacent Governorates. The dependent variable used is the percent of in...
Exergy diagnosis of coal fired CHP plant with application of neural and regression modelling
Directory of Open Access Journals (Sweden)
Stanek Wojciech
2012-01-01
Full Text Available Mathematical models of the processes, that proceed in energetic machines and devices, in many cases are very complicated. In such cases, the exact analytical models should be equipped with the auxiliary empirical models that describe those parameters which are difficult to model in a theoretical way. Regression or neural models identified basing on measurements are rather simple and are characterized by relatively short computation time. For this reason they can be effectively applied for simulation and optimization of steering and regulation processes, as well as, for control and thermal diagnosis of operation (eq. power plants or CHP plants. In the paper regression and neural models of thermal processes developed for systems of operation control of thermal plants are presented. Theoretical-empirical model of processes proceeding in coal fired CHP plant have been applied. Simulative calculations basing on these models have been carried out. Results of simulative calculations have been used for the exergetic evaluation of considered power plant. The diagnosis procedure let to investigate the formation of exergy costs in interconnected components of the system of CHP, as well as, investigate the influence of defects in operation of components on exergy losses and on the exergetic cost in other components. [Acknowledgment. The paper has been prepared within the RECENT project (REsearch Center for Energy and New Technologies supported by 7th Framework Programme, Theme 4, Capacities.
Directory of Open Access Journals (Sweden)
Sohair F Higazi
2013-02-01
Full Text Available Regression analysis depends on several assumptions that have to be satisfied. A major assumption that is never satisfied when variables are from contiguous observations is the independence of error terms. Spatial analysis treated the violation of that assumption by two derived models that put contiguity of observations into consideration. Data used are from Egypt's 2006 latest census, for 93 counties in middle delta seven adjacent Governorates. The dependent variable used is the percent of individuals classified as poor (those who make less than 1$ daily. Predictors are some demographic indicators. Explanatory Spatial Data Analysis (ESDA is performed to examine the existence of spatial clustering and spatial autocorrelation between neighboring counties. The ESDA revealed spatial clusters and spatial correlation between locations. Three statistical models are applied to the data, the Ordinary Least Square regression model (OLS, the Spatial Error Model (SEM and the Spatial Lag Model (SLM.The Likelihood Ratio test and some information criterions are used to compare SLM and SEM to OLS. The SEM model proved to be better than the SLM model. Recommendations are drawn regarding the two spatial models used.
Development and comparison of regression models for the uptake of metals into various field crops.
Novotná, Markéta; Mikeš, Ondřej; Komprdová, Klára
2015-12-01
Field crops represent one of the highest contributions to dietary metal exposure. The aim of this study was to develop specific regression models for the uptake of metals into various field crops and to compare the usability of other available models. We analysed samples of potato, hop, maize, barley, wheat, rape seed, and grass from 66 agricultural sites. The influence of measured soil concentrations and soil factors (pH, organic carbon, content of silt and clay) on the plant concentrations of Cd, Cr, Cu, Mo, Ni, Pb and Zn was evaluated. Bioconcentration factors (BCF) and plant-specific metal models (PSMM) developed from multivariate regressions were calculated. The explained variability of the models was from 19 to 64% and correlations between measured and predicted concentrations were between 0.43 and 0.90. The developed hop and rapeseed models are new in this field. Available models from literature showed inaccurate results, except for Cd; the modelling efficiency was mostly around zero. The use of interaction terms between parameters can significantly improve plant-specific models. PMID:26448504
Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall
Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik
2016-02-01
Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.
Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.
2016-09-01
Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.
Growth regression models at two generations of selected populations Alabio ducks
Directory of Open Access Journals (Sweden)
L Hardi Prasetyo
2007-12-01
Full Text Available A selection process to increase egg production of Alabio ducks was conducted in Balai Penelitian Ternak, Ciawi-Bogor. The selection aimed at increasing production, however observation on growth of the selected ducks was necessary since early growth stage (0-8 wks determines the performance during laying period. This paper presents the growth models and the coefficient of determination of two generations of selected Alabio ducks. Body weight were observed weekly on 363 ducks from F1 and 356 ducks from F2, between 0-8 weeks and then fortinghly until 16 weeks. Growth curves were analysed using regression models between age and bodyweight of each population. The selection of model with the best fit was based on the large value of determination coefficient (R2, small value of MSE, and sinificant level of regression coefficient. Result showed that cubic polynomial regression was the best fit for the two populations, Y = 56.31-1.44X+0.64X2-0.005X3 for F1 and Y = 43.05 + 0.96X + 0.69X2 - 0.0056X3 for F2. The values of R2 were 0.9466 for F1 and 0.9243 for F2, and the values of MSE were 11.586 for F1 and 19.978 for F2. The growth of F1 is better during starter period, but F2 is better during grower period.
Directory of Open Access Journals (Sweden)
Samsuri Abdullah
2016-07-01
Full Text Available Air pollution in Peninsular Malaysia is dominated by particulate matter which is demonstrated by having the highest Air Pollution Index (API value compared to the other pollutants at most part of the country. Particulate Matter (PM10 forecasting models development is crucial because it allows the authority and citizens of a community to take necessary actions to limit their exposure to harmful levels of particulates pollution and implement protection measures to significantly improve air quality on designated locations. This study aims in improving the ability of MLR using PCs inputs for PM10 concentrations forecasting. Daily observations for PM10 in Kuala Terengganu, Malaysia from January 2003 till December 2011 were utilized to forecast PM10 concentration levels. MLR and PCR (using PCs input models were developed and the performance was evaluated using RMSE, NAE and IA. Results revealed that PCR performed better than MLR due to the implementation of PCA which reduce intricacy and eliminate data multi-collinearity.
Directory of Open Access Journals (Sweden)
Regis Wendpouire Oubida
2015-03-01
Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.
Oubida, Regis W; Gantulga, Dashzeveg; Zhang, Man; Zhou, Lecong; Bawa, Rajesh; Holliday, Jason A
2015-01-01
Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13-0.32) and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref) explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP), and mean annual precipitation (MAP). These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref) had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures) had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP) had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP) performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar. PMID:25870603
Adaptation dynamics of the quasispecies model
Indian Academy of Sciences (India)
Kavita Jain
2008-08-01
We study the adaptation dynamics of an initially maladapted population evolving via the elementary processes of mutation and selection. The evolution occurs on rugged fitness landscapes which are defined on the multi-dimensional genotypic space and have many local peaks separated by low fitness valleys. We mainly focus on the Eigen’s model that describes the deterministic dynamics of an infinite number of self-replicating molecules. In the stationary state, for small mutation rates such a population forms a quasispecies which consists of the fittest genotype and its closely related mutants. The quasispecies dynamics on rugged fitness landscape follow a punctuated (or step-like) pattern in which a population jumps from a low fitness peak to a higher one, stays there for a considerable time before shifting the peak again and eventually reaches the global maximum of the fitness landscape. We calculate exactly several properties of this dynamical process within a simplified version of the quasispecies model.
Wishart, Justin Rory
2011-01-01
In this paper, a lower bound is determined in the minimax sense for change point estimators of the first derivative of a regression function in the fractional white noise model. Similar minimax results presented previously in the area focus on change points in the derivatives of a regression function in the white noise model or consider estimation of the regression function in the presence of correlated errors.
European upper mantle tomography: adaptively parameterized models
Schäfer, J.; Boschi, L.
2009-04-01
We have devised a new algorithm for upper-mantle surface-wave tomography based on adaptive parameterization: i.e. the size of each parameterization pixel depends on the local density of seismic data coverage. The advantage in using this kind of parameterization is that a high resolution can be achieved in regions with dense data coverage while a lower (and cheaper) resolution is kept in regions with low coverage. This way, parameterization is everywhere optimal, both in terms of its computational cost, and of model resolution. This is especially important for data sets with inhomogenous data coverage, as it is usually the case for global seismic databases. The data set we use has an especially good coverage around Switzerland and over central Europe. We focus on periods from 35s to 150s. The final goal of the project is to determine a new model of seismic velocities for the upper mantle underlying Europe and the Mediterranean Basin, of resolution higher than what is currently found in the literature. Our inversions involve regularization via norm and roughness minimization, and this in turn requires that discrete norm and roughness operators associated with our adaptive grid be precisely defined. The discretization of the roughness damping operator in the case of adaptive parameterizations is not as trivial as it is for the uniform ones; important complications arise from the significant lateral variations in the size of pixels. We chose to first define the roughness operator in a spherical harmonic framework, and subsequently translate it to discrete pixels via a linear transformation. Since the smallest pixels we allow in our parameterization have a size of 0.625 °, the spherical-harmonic roughness operator has to be defined up to harmonic degree 899, corresponding to 810.000 harmonic coefficients. This results in considerable computational costs: we conduct the harmonic-pixel transformations on a small Beowulf cluster. We validate our implementation of adaptive
Bayesian structured additive regression modeling of epidemic data: application to cholera
Directory of Open Access Journals (Sweden)
Osei Frank B
2012-08-01
Full Text Available Abstract Background A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors. Methods We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects. Results We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection. Conclusion The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics.
Balshi, M. S.; McGuire, A.D.; Duffy, P.; Flannigan, M.; Walsh, J.; Melillo, J.
2009-01-01
Fire is a common disturbance in the North American boreal forest that influences ecosystem structure and function. The temporal and spatial dynamics of fire are likely to be altered as climate continues to change. In this study, we ask the question: how will area burned in boreal North America by wildfire respond to future changes in climate? To evaluate this question, we developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5?? (latitude ?? longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was substantially more predictable in the western portion of boreal North America than in eastern Canada. Burned area was also not very predictable in areas of substantial topographic relief and in areas along the transition between boreal forest and tundra. At the scale of Alaska and western Canada, the empirical fire models explain on the order of 82% of the variation in annual area burned for the period 1960-2002. July temperature was the most frequently occurring predictor across all models, but the fuel moisture codes for the months June through August (as a group) entered the models as the most important predictors of annual area burned. To predict changes in the temporal and spatial dynamics of fire under future climate, the empirical fire models used output from the Canadian Climate Center CGCM2 global climate model to predict annual area burned through the year 2100 across Alaska and western Canada. Relative to 1991-2000, the results suggest that average area burned per decade will double by 2041-2050 and will increase on the order of 3.5-5.5 times by the last decade of the 21st century. To improve the ability to better predict wildfire across Alaska and Canada, future research should focus on incorporating additional effects of long-term and successional
Auto-Regressive Models of Non-Stationary Time Series with Finite Length
Institute of Scientific and Technical Information of China (English)
FEI Wanchun; BAI Lun
2005-01-01
To analyze and simulate non-stationary time series with finite length, the statistical characteristics and auto-regressive (AR) models of non-stationary time series with finite length are discussed and studied. A new AR model called the time varying parameter AR model is proposed for solution of non-stationary time series with finite length. The auto-covariances of time series simulated by means of several AR models are analyzed. The result shows that the new AR model can be used to simulate and generate a new time series with the auto-covariance same as the original time series. The size curves of cocoon filaments regarded as non-stationary time series with finite length are experimentally simulated. The simulation results are significantly better than those obtained so far, and illustrate the availability of the time varying parameter AR model. The results are useful for analyzing and simulating non-stationary time series with finite length.
Grain Yield Prediction of Henan Province Based on Spatio-temporal Regression Model
Institute of Scientific and Technical Information of China (English)
2011-01-01
By using correlation analysis method,regression analysis method and time sequence method,we combine time and space,to establish grain yield spatio-temporal regression prediction model of Henan Province and all prefecture-level cities.At first,we use the grain yield in prefecture-level cities of Henan in the year 2000 and 2005,to establish regression model,and then taking the grain yield in one year as independent variable,we predict the grain yield in the fifth year afterwards.Taking the dependent variable value as independent variable again,we predict the grain yield at an interval of the same years,and based on this,predict year by year forward until the year we need.The research shows that the grain yield of Henan Province in the year 2015 and 2020 is 59.849 6 and 67.929 3 million t respectively,consistent with the research results of other scholars to some extent.
Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones
Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.
2015-11-01
This paper investigates the capacitance regression modelling performance of latex for various rubber tree clones, namely clone 2002, 2008, 2014 and 3001. Conventionally, the rubber tree clones identification are based on observation towards tree features such as shape of leaf, trunk, branching habit and pattern of seeds texture. The former method requires expert persons and very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to measure different clones from latex samples. Hence, with a hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via Capacitance Comparison Bridge (known as capacitance sensor) to measure an output voltage of different latex samples. The proposed sensor is initially tested with 30ml of latex sample prior to gradually addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using Simple Linear Regression (SLR) model. This work outcome infers that latex clone of 2002 has produced the highest and reliable linear regression line with determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if it is diluted with higher volume of water.
Ko, K.; Cheong, B.; Koh, D.
2010-12-01
Groundwater has been used a main source to provide a drinking water in a rural area with no regional potable water supply system in Korea. More than 50 percent of rural area residents depend on groundwater as drinking water. Thus, research on predicting groundwater pollution for the sustainable groundwater usage and protection from potential pollutants was demanded. This study was carried out to know the vulnerability of groundwater nitrate contamination reflecting the effect of land use in Nonsan city of a representative rural area of South Korea. About 47% of the study area is occupied by cultivated land with high vulnerable area to groundwater nitrate contamination because it has higher nitrogen fertilizer input of 62.3 tons/km2 than that of country’s average of 44.0 tons/km2. The two vulnerability assessment methods, logistic regression and DRASTIC model, were tested and compared to know more suitable techniques for the assessment of groundwater nitrate contamination in Nonsan area. The groundwater quality data were acquired from the collection of analyses of 111 samples of small potable supply system in the study area. The analyzed values of nitrate were classified by land use such as resident, upland, paddy, and field area. One dependent and two independent variables were addressed for logistic regression analysis. One dependent variable was a binary categorical data with 0 or 1 whether or not nitrate exceeding thresholds of 1 through 10 mg/L. The independent variables were one continuous data of slope indicating topography and multiple categorical data of land use which are classified by resident, upland, paddy, and field area. The results of the Levene’s test and T-test for slope and land use were showed the significant difference of mean values among groups in 95% confidence level. From the logistic regression, we could know the negative correlation between slope and nitrate which was caused by the decrease of contaminants inputs into groundwater with
Haben, Stephen
2016-01-01
We present a model for generating probabilistic forecasts by combining kernel density estimation (KDE) and quantile regression techniques, as part of the probabilistic load forecasting track of the Global Energy Forecasting Competition 2014. The KDE method is initially implemented with a time-decay parameter. We later improve this method by conditioning on the temperature or the period of the week variables to provide more accurate forecasts. Secondly, we develop a simple but effective quantile regression forecast. The novel aspects of our methodology are two-fold. First, we introduce symmetry into the time-decay parameter of the kernel density estimation based forecast. Secondly we combine three probabilistic forecasts with different weights for different periods of the month.
Lim, Changwon
2015-03-30
Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. PMID:25490981
de Souza, R S; Killedar, M; Hilbe, J; Vilalta, R; Maio, U; Biffi, V; Ciardi, B; Riggs, J D
2014-01-01
Revealing hidden patterns in astronomical data is often the path to fundamental scientific breakthroughs; meanwhile the complexity of scientific inquiry increases as more subtle relationships are sought. Contemporary data analysis problems often elude the capabilities of classical statistical techniques, suggesting the use of cutting edge statistical methods. In this light, astronomers have overlooked a whole family of statistical techniques for exploratory data analysis and robust regression, the so-called Generalized Linear Models (GLMs). In this paper -- the first in a series aimed at illustrating the power of these methods in astronomical applications -- we elucidate the potential of a particular class of GLMs for handling binary/binomial data, the so-called logit and probit regression techniques, from both a maximum likelihood and a Bayesian perspective. As a case in point, we present the use of these GLMs to explore the conditions of star formation activity and metal enrichment in primordial minihaloes ...
Institute of Scientific and Technical Information of China (English)
Lingling; TAN
2013-01-01
This article selects some major factors influencing the agricultural economic growth are selected,such as labor,capital input,farmland area,fertilizer input and information input.And it selects some factors to explain information input,such as the number of website ownership,types of books,magazines and newspapers published,the number of telephone ownership per 100 households,the number of home computers ownership per 100 households,farmers’ spending on transportation and communication,culture,education,entertainment and services, and the total number of agricultural science and technology service personnel.Using regression model,this article conducts regression analysis of the cross-section data on 31 provinces,autonomous regions and municipalities in 2010.The results show that the building of information infrastructure,the use of means of information,the popularization and promotion of knowledge of agricultural science and technology,play an important role in promoting agricultural economic growth.
Asymptotic properties for the semiparametric regression model with randomly censored data
Institute of Scientific and Technical Information of China (English)
王启华; 郑忠国
1997-01-01
Suppose that the patients’ survival times,Y,are random variables following the semiparametric regression model Y=Xβ+g(T)+ε,where (X,T) is a radom vector taking values in R×[0,1],β is an unknown parameter,g(·) is an unknown smooth regression function and εis the random error with zero mean and variance σ2.It is assumed that (X,T) is independent of ε.The estimators βn and gm(·) ofβ and g(·) are defined,respectively,when the observations are randomly censored on the right and the censoring distribution is unknown.Moreover,it isshown that βm is asymptotically normal and gm(·) is weak consistence with rate Op(n-1/3).
Existence of unbiased estimate of regression parameters in simple linear EV models
Institute of Scientific and Technical Information of China (English)
LIU; Jixue
2005-01-01
It is well known that for one-dimensional normal EV regression model X = x +u, Y= α+βx+e, where x, u, e are mutually independent normal variables and Eu = Ee= 0,the regression parameters α andβ are not identifiable without some restriction imposed on the parameters. This paper discusses the problem of existence of unbiased estimate for α and β under some restrictions commonly used in practice. It is proved that the unbiased estimate does not exist under many such restrictions. We also point out one important case in which the unbiased estimates of α andβ exist, and the form of the MVUE of α and β are also given.
A semi-parametric regression model for analysis of middle censored lifetime data
Directory of Open Access Journals (Sweden)
S. Rao Jammalamadaka
2016-03-01
Full Text Available Middle censoring introduced by Jammalamadaka and Mangalam (2003, refers to data arising in situations where the exact lifetime becomes unobservable if it falls within a random censoring interval, otherwise it is observable. In the present paper we propose a semi-parametric regression model for such lifetime data, arising from an unknown population and subject to middle censoring. We provide an algorithm to find the nonparametric maximum likelihood estimator (NPMLE for regression parameters and the survival function. The consistency of the estimators are established. We report simulation studies to assess the finite sample properties of the estimators. We then analyze a real life data on survival times for diabetic patients studied by Lee et al. (1988.
A self-organizing power system stabilizer using Fuzzy Auto-Regressive Moving Average (FARMA) model
Energy Technology Data Exchange (ETDEWEB)
Park, Y.M.; Moon, U.C. [Seoul National Univ. (Korea, Republic of). Electrical Engineering Dept.; Lee, K.Y. [Pennsylvania State Univ., University Park, PA (United States). Electrical Engineering Dept.
1996-06-01
This paper presents a self-organizing power system stabilizer (SOPSS) which use the Fuzzy Auto-Regressive Moving Average (FARMA) model. The control rules and the membership functions of the proposed logic controller are generated automatically without using any plant model. The generated rules are stored in the fuzzy rule space and updated on-line by a self-organizing procedure. To show the effectiveness of the proposed controller, comparison with a conventional controller for one-machine infinite-bus system is presented.
Modeling population density across major US cities: a polycentric spatial regression approach
Griffith, Daniel A.; Wong, David W.
2007-04-01
A common approach to modeling population density gradients across a city is to adjust the specification of a selected set of mathematical functions to achieve the best fit to an urban place’s empirical density values. In this paper, we employ a spatial regression approach that takes into account the spatial autocorrelation latent in urban population density. We also use a Minkowskian distance metric instead of Euclidean or network distance to better describe spatial separation. We apply our formulation to the 20 largest metropolitan areas in the US according to the 2000 census, using block group level data. The general model furnishes good descriptions for both monocentric and polycentric cities.
Generalized Empirical Likelihood Inference in Semiparametric Regression Model for Longitudinal Data
Institute of Scientific and Technical Information of China (English)
Gao Rong LI; Ping TIAN; Liu Gen XUE
2008-01-01
In this paper, we consider the semiparametric regression model for longitudinal data. Due to the correlation within groups, a generalized empirical log-likelihood ratio statistic for the unknown parameters in the model is suggested by introducing the working covariance matrix. It is proved that the proposed statistic is asymptotically standard chi-squared under some suitable conditions, and hence it can be used to construct the confidence regions of the parameters. A simulation study is conducted to compare the proposed method with the generalized least squares method in terms of coverage accuracy and average lengths of the confidence intervals.
Estimation of mass flow of seeds using fibre sensor and multiple linear regression modelling
Al-Mallahi, A. A.; Kataoka, T
2013-01-01
A new methodology to estimate the mass of grain seeds, which flow in the shape of clumps, was suggested in this paper. The methodology used an off-the-shelf digital fibre sensor to detect the behaviour of the clumps and multiple linear regression modelling to estimate the mass by the parameters detected by the sensor which were the length and the density of the clumps. An indoor apparatus was used for modelling which resembled the sowing process using the grain drill. A fluted roller was inst...
EFFICIENT ESTIMATION OF FUNCTIONAL-COEFFICIENT REGRESSION MODELS WITH DIFFERENT SMOOTHING VARIABLES
Institute of Scientific and Technical Information of China (English)
Zhang Riquan; Li Guoying
2008-01-01
In this article, a procedure for estimating the coefficient functions on the functional-coefficient regression models with different smoothing variables in different co-efficient functions is defined. First step, by the local linear technique and the averaged method, the initial estimates of the coefficient functions are given. Second step, based on the initial estimates, the efficient estimates of the coefficient functions are proposed by a one-step back-fitting procedure. The efficient estimators share the same asymptotic normalities as the local linear estimators for the functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087
An Adaptive Learning Model in Coordination Games
Directory of Open Access Journals (Sweden)
Naoki Funai
2013-11-01
Full Text Available In this paper, we provide a theoretical prediction of the way in which adaptive players behave in the long run in normal form games with strict Nash equilibria. In the model, each player assigns subjective payoff assessments to his own actions, where the assessment of each action is a weighted average of its past payoffs, and chooses the action which has the highest assessment. After receiving a payoff, each player updates the assessment of his chosen action in an adaptive manner. We show almost sure convergence to a Nash equilibrium under one of the following conditions: (i that, at any non-Nash equilibrium action profile, there exists a player who receives a payoff, which is less than his maximin payoff; (ii that all non-Nash equilibrium action profiles give the same payoff. In particular, the convergence is shown in the following games: the battle of the sexes game, the stag hunt game and the first order statistic game. In the game of chicken and market entry games, players may end up playing the action profile, which consists of each player’s unique maximin action.
International Nuclear Information System (INIS)
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model
Energy Technology Data Exchange (ETDEWEB)
Harlim, John, E-mail: jharlim@psu.edu [Department of Mathematics and Department of Meteorology, the Pennsylvania State University, University Park, PA 16802, Unites States (United States); Mahdi, Adam, E-mail: amahdi@ncsu.edu [Department of Mathematics, North Carolina State University, Raleigh, NC 27695 (United States); Majda, Andrew J., E-mail: jonjon@cims.nyu.edu [Department of Mathematics and Center for Atmosphere and Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 (United States)
2014-01-15
A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.
Bayesian binary regression model: an application to in-hospital death after AMI prediction
Directory of Open Access Journals (Sweden)
Aparecida D. P. Souza
2004-08-01
Full Text Available A Bayesian binary regression model is developed to predict death of patients after acute myocardial infarction (AMI. Markov Chain Monte Carlo (MCMC methods are used to make inference and to evaluate Bayesian binary regression models. A model building strategy based on Bayes factor is proposed and aspects of model validation are extensively discussed in the paper, including the posterior distribution for the c-index and the analysis of residuals. Risk assessment, based on variables easily available within minutes of the patients' arrival at the hospital, is very important to decide the course of the treatment. The identified model reveals itself strongly reliable and accurate, with a rate of correct classification of 88% and a concordance index of 83%.Um modelo bayesiano de regressão binária é desenvolvido para predizer óbito hospitalar em pacientes acometidos por infarto agudo do miocárdio. Métodos de Monte Carlo via Cadeias de Markov (MCMC são usados para fazer inferência e validação. Uma estratégia para construção de modelos, baseada no uso do fator de Bayes, é proposta e aspectos de validação são extensivamente discutidos neste artigo, incluindo a distribuição a posteriori para o índice de concordância e análise de resíduos. A determinação de fatores de risco, baseados em variáveis disponíveis na chegada do paciente ao hospital, é muito importante para a tomada de decisão sobre o curso do tratamento. O modelo identificado se revela fortemente confiável e acurado, com uma taxa de classificação correta de 88% e um índice de concordância de 83%.
The limiting behavior of the estimated parameters in a misspecified random field regression model
DEFF Research Database (Denmark)
Dahl, Christian Møller; Qin, Yu
This paper examines the limiting properties of the estimated parameters in the random field regression model recently proposed by Hamilton (Econometrica, 2001). Though the model is parametric, it enjoys the flexibility of the nonparametric approach since it can approximate a large collection...... of nonlinear functions and it has the added advantage that there is no "curse of dimensionality."Contrary to existing literature on the asymptotic properties of the estimated parameters in random field models our results do not require that the explanatory variables are sampled on a grid. However......, as a consequence the random field model specification introduces non-stationarity and non-ergodicity in the misspecified model and it becomes non-trivial, relative to the existing literature, to establish the limiting behavior of the estimated parameters. The asymptotic results are obtained by applying some...
Institute of Scientific and Technical Information of China (English)
XU Jing; YANG Chi; ZHANG Guoping
2007-01-01
Information model is adopted to integrate factors of various geosciences to estimate the susceptibility of geological hazards. Further combining the dynamic rainfall observations, Logistic regression is used for modeling the probabilities of geological hazard occurrences, upon which hierarchical warnings for rainfall-induced geological hazards are produced. The forecasting and warning model takes numerical precipitation forecasts on grid points as its dynamic input, forecasts the probabilities of geological hazard occurrences on the same grid, and translates the results into likelihoods in the form of a 5-level hierarchy. Validation of the model with observational data for the year 2004 shows that 80% of the geological hazards of the year have been identified as "likely enough to release warning messages". The model can satisfy the requirements of an operational warning system, thus is an effective way to improve the meteorological warnings for geological hazards.
Adaptable Multivariate Calibration Models for Spectral Applications
Energy Technology Data Exchange (ETDEWEB)
THOMAS,EDWARD V.
1999-12-20
Multivariate calibration techniques have been used in a wide variety of spectroscopic situations. In many of these situations spectral variation can be partitioned into meaningful classes. For example, suppose that multiple spectra are obtained from each of a number of different objects wherein the level of the analyte of interest varies within each object over time. In such situations the total spectral variation observed across all measurements has two distinct general sources of variation: intra-object and inter-object. One might want to develop a global multivariate calibration model that predicts the analyte of interest accurately both within and across objects, including new objects not involved in developing the calibration model. However, this goal might be hard to realize if the inter-object spectral variation is complex and difficult to model. If the intra-object spectral variation is consistent across objects, an effective alternative approach might be to develop a generic intra-object model that can be adapted to each object separately. This paper contains recommendations for experimental protocols and data analysis in such situations. The approach is illustrated with an example involving the noninvasive measurement of glucose using near-infrared reflectance spectroscopy. Extensions to calibration maintenance and calibration transfer are discussed.
Multiple model adaptive tracking of airborne targets
Norton, John E.
1988-12-01
Over the past ten years considerable work has been accomplished at the Air Force Institute of Technology (AFIT) towards improving the ability of tracking airborne targets. Motivated by the performance advantages in using established models of tracking environment variables within a Kalman filter, an advanced tracking algorithm has been developed based on adaptive estimation filter structures. A multiple model bank of filters that have been designed for various target dynamics, which each accounting for atmospheric disturbance of the Forward Looking Infrared (FLIR) sensor data and mechanical vibrations of the sensor platform, outperforms a correlator tracker. The bank of filters provides the estimation capability to guide the pointing mechanisms of a shared aperture laser/sensor system. The data is provided to the tracking algorithm via an (8 x 8)-pixel tracking Field of View (FOV) from the FLIR image plane. Data at each sample period is compared by an enhanced correlator to a target template. These offsets are measurements to a bank of linear Kalman filters which provide estimates of the target's location in azimuth and elevation coordinates based on a Gauss-Markov acceleration model, and a reduced form of the atmospheric jitter model for the disturbance in the IR wavefront carrying future measurements.
Use of Poisson spatiotemporal regression models for the Brazilian Amazon Forest: malaria count data
Directory of Open Access Journals (Sweden)
Jorge Alberto Achcar
2011-12-01
Full Text Available INTRODUCTION: Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using Bayesian spatiotemporal methods. METHODS: We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a Bayesian approach and Markov Chain Monte Carlo (MCMC methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. RESULTS: The model proposed here suggests that deforestation rate, the number of inhabitants per km², and the human development index (HDI are important in the prediction of malaria cases. CONCLUSIONS: It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling malaria counts.
Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models
Directory of Open Access Journals (Sweden)
Chuanglin Fang
2015-11-01
Full Text Available Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autorrelation of Air Quality Index (AQI values at the city level, and employed Ordinary Least Squares (OLS, Spatial Lag Model (SAR, and Geographically Weighted Regression (GWR to quantitatively estimate the comprehensive impact and spatial variations of China’s urbanization process on air quality. The results show that a significant spatial dependence and heterogeneity existed in AQI values. Regression models revealed urbanization has played an important negative role in determining air quality in Chinese cities. The population, urbanization rate, automobile density, and the proportion of secondary industry were all found to have had a significant influence over air quality. Per capita Gross Domestic Product (GDP and the scale of urban land use, however, failed the significance test at 10% level. The GWR model performed better than global models and the results of GWR modeling show that the relationship between urbanization and air quality was not constant in space. Further, the local parameter estimates suggest significant spatial variation in the impacts of various urbanization factors on air quality.
Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees
Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.
2015-12-01
In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable to make predictions without having an in-depth a-priori knowledge about the functional dependancies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors, which are derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means like first and second order derivatives of the DEM, but also higher sophisticated classifications describing exposure and shelterness of the terrain to wind flux. In order to take the different scales into account which probably influence the streams and turbulences of wind flow over complex terrain, the predictors are computed on different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used for examination of the approach is Switzerland, a mountainious region in the heart of europe, dominated by the alps, but also covering large valleys. The full workflow is described in this paper, which consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model and application of the model to generate a wind speed map.
Directory of Open Access Journals (Sweden)
Xiaobo Luo
2016-09-01
Full Text Available Urban heat island (UHI effect, the side effect of rapid urbanization, has become an obstacle to the further healthy development of the city. Understanding its relationships with impact factors is important to provide useful information for climate adaptation urban planning strategies. For this purpose, the geographically-weighted regression (GWR approach is used to explore the scale effects in a mountainous city, namely the change laws and characteristics of the relationships between land surface temperature and impact factors at different spatial resolutions (30–960 m. The impact factors include the Soil-adjusted Vegetation Index (SAVI, the Index-based Built-up Index (IBI, and the Soil Brightness Index (NDSI, which indicate the coverage of the vegetation, built-up, and bare land, respectively. For reference, the ordinary least squares (OLS model, a global regression technique, is also employed by using the same dependent variable and explanatory variables as in the GWR model. Results from the experiment exemplified by Chongqing showed that the GWR approach had a better prediction accuracy and a better ability to describe spatial non-stationarity than the OLS approach judged by the analysis of the local coefficient of determination (R2, Corrected Akaike Information Criterion (AICc, and F-test at small spatial resolution (< 240 m; however, when the spatial scale was increased to 480 m, this advantage has become relatively weak. This indicates that the GWR model becomes increasingly global, revealing the relationships with more generalized geographical patterns, and then spatial non-stationarity in the relationship will tend to be neglected with the increase of spatial resolution.
An Optimal Control Modification to Model-Reference Adaptive Control for Fast Adaptation
Nguyen, Nhan T.; Krishnakumar, Kalmanje; Boskovic, Jovan
2008-01-01
This paper presents a method that can achieve fast adaptation for a class of model-reference adaptive control. It is well-known that standard model-reference adaptive control exhibits high-gain control behaviors when a large adaptive gain is used to achieve fast adaptation in order to reduce tracking error rapidly. High gain control creates high-frequency oscillations that can excite unmodeled dynamics and can lead to instability. The fast adaptation approach is based on the minimization of the squares of the tracking error, which is formulated as an optimal control problem. The necessary condition of optimality is used to derive an adaptive law using the gradient method. This adaptive law is shown to result in uniform boundedness of the tracking error by means of the Lyapunov s direct method. Furthermore, this adaptive law allows a large adaptive gain to be used without causing undesired high-gain control effects. The method is shown to be more robust than standard model-reference adaptive control. Simulations demonstrate the effectiveness of the proposed method.
Creating a non-linear total sediment load formula using polynomial best subset regression model
Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali
2016-08-01
The aim of this study is to derive a new total sediment load formula which is more accurate and which has less application constraints than the well-known formulae of the literature. 5 most known stream power concept sediment formulae which are approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach. The new approach is called Polynomial Best subset regression (PBSR) analysis. The aim of the PBRS analysis is fitting and testing all possible combinations of the input variables and selecting the best subset. Whole the input variables with their second and third powers are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset a multistep approach is used that depends on significance values and also the multicollinearity degrees of inputs. The new formula is compared to others in a holdout dataset and detailed performance investigations are conducted for field and lab datasets within this holdout data. Different goodness of fit statistics are used as they represent different perspectives of the model accuracy. After the detailed comparisons are carried out we figured out the most accurate equation that is also applicable on both flume and river data. Especially, on field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.
You, Jinhong; Zhou, Haibo
2009-01-01
We consider statistical inference on a regression model in which some covariables are measured with errors together with an auxiliary variable. The proposed estimation for the regression coefficients is based on some estimating equations. This new method alleates some drawbacks of previously proposed estimations. This includes the requirment of undersmoothing the regressor functions over the auxiliary variable, the restriction on other covariables which can be observed exactly, among others. The large sample properties of the proposed estimator are established. We further propose a jackknife estimation, which consists of deleting one estimating equation (instead of one obervation) at a time. We show that the jackknife estimator of the regression coefficients and the estimating equations based estimator are asymptotically equivalent. Simulations show that the jackknife estimator has smaller biases when sample size is small or moderate. In addition, the jackknife estimation can also provide a consistent estimator of the asymptotic covariance matrix, which is robust to the heteroscedasticity. We illustrate these methods by applying them to a real data set from marketing science. PMID:22199460
Using the jackknife for estimation in log link Bernoulli regression models.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Arriaga, Alex; Sinha, Debajyoti; Gawande, Atul A
2015-02-10
Bernoulli (or binomial) regression using a generalized linear model with a log link function, where the exponentiated regression parameters have interpretation as relative risks, is often more appropriate than logistic regression for prospective studies with common outcomes. In particular, many researchers regard relative risks to be more intuitively interpretable than odds ratios. However, for the log link, when the outcome is very prevalent, the likelihood may not have a unique maximum. To circumvent this problem, a 'COPY method' has been proposed, which is equivalent to creating for each subject an additional observation with the same covariates except the response variable has the outcome values interchanged (1's changed to 0's and 0's changed to 1's). The original response is given weight close to 1, while the new observation is given a positive weight close to 0; this approach always leads to convergence of the maximum likelihood algorithm, except for problems with convergence due to multicollinearity among covariates. Even though this method produces a unique maximum, when the outcome is very prevalent, and/or the sample size is relatively small, the COPY method can yield biased estimates. Here, we propose using the jackknife as a bias-reduction approach for the COPY method. The proposed method is motivated by a study of patients undergoing colorectal cancer surgery.
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research. PMID:23156922
Directory of Open Access Journals (Sweden)
Sutikno Sutikno
2010-08-01
Full Text Available One of the climate models used to predict the climatic conditions is Global Circulation Models (GCM. GCM is a computer-based model that consists of different equations. It uses numerical and deterministic equation which follows the physics rules. GCM is a main tool to predict climate and weather, also it uses as primary information source to review the climate change effect. Statistical Downscaling (SD technique is used to bridge the large-scale GCM with a small scale (the study area. GCM data is spatial and temporal data most likely to occur where the spatial correlation between different data on the grid in a single domain. Multicollinearity problems require the need for pre-processing of variable data X. Continuum Regression (CR and pre-processing with Principal Component Analysis (PCA methods is an alternative to SD modelling. CR is one method which was developed by Stone and Brooks (1990. This method is a generalization from Ordinary Least Square (OLS, Principal Component Regression (PCR and Partial Least Square method (PLS methods, used to overcome multicollinearity problems. Data processing for the station in Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that the RMSEP values and R2 predict in the domain 8x8 and 12x12 by uses CR method produces results better than by PCR and PLS.
Modelling Arsenic and Lead Surface Soil Concentrations using Land Use Regression
Directory of Open Access Journals (Sweden)
Deschenes S.
2013-04-01
Full Text Available Land Use Regression (LUR models are increasingly used in environmental and exposure assessments to predict the concentration of contaminants in outdoor air. We explore the use of LUR as an alternative to more complex models to predict the concentration of metals in surface soil. Here, we used 55 soil samples of As and Pb collected in 1996 across British Columbia (BC, Canada by the Ministry of Environment. Predictor variables were derived for each sample site using Geographic Information System (GIS. For As (R2 = 0.44, the resulting linear regression model includes the total length of roads (m within 25 km, and bedrock geology. For the Pb model (R2 =0.78, the predictor variables are the total surface area of industrial land use (m2 within 5 km , the emissions of Pb (t within 10 and 25 km, and the presence of closed mines within 50 km. The study proposes that LUR can reasonably predict the concentrations of As and Pb in surface soil over large areas.
Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.
2016-02-01
Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.
2016-02-01
The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence in the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01.The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB) resulting to a total of 96 observations for each variable. The data have undergone a logarithmic transformation particularly the Dependent Variable (y) to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable namely: Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), hence, can actually predict Real GDP (y). The regression analysis displays that 98.7% (coefficient of determination) of the Independent Variables can actually predict the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.
Modeling and Simulation of Road Traffic Noise Using Artificial Neural Network and Regression.
Honarmand, M; Mousavi, S M
2014-04-01
Modeling and simulation of noise pollution has been done in a large city, where the population is over 2 millions. Two models of artificial neural network and regression were developed to predict in-city road traffic noise pollution with using the data of noise measurements and vehicle counts at three points of the city for a period of 12 hours. The MATLAB and DATAFIT softwares were used for simulation. The predicted results of noise level were compared with the measured noise levels in three stations. The values of normalized bias, sum of squared errors, mean of squared errors, root mean of squared errors, and squared correlation coefficient calculated for each model show the results of two models are suitable, and the predictions of artificial neural network are closer to the experimental data.
Photovoltaic Array Condition Monitoring Based on Online Regression of Performance Model
DEFF Research Database (Denmark)
Spataru, Sergiu; Sera, Dezso; Kerekes, Tamas;
2013-01-01
automatic supervision and condition monitoring of the PV system components, especially for small PV installations, where no specialized personnel is present at the site. This work proposes a PV array condition monitoring system based on a PV array performance model. The system is parameterized online, using...... regression modeling, from PV array production, plane-of-array irradiance, and module temperature measurements, acquired during an initial learning phase of the system. After the model has been parameterized automatically, the condition monitoring system enters the normal operation phase, where...... the performance model is used to predict the power output of the PV array. Utilizing the predicted and measured PV array output power values, the condition monitoring system is able to detect power losses above 5%, occurring in the PV array....
Focused information criterion and model averaging based on weighted composite quantile regression
Xu, Ganggang
2013-08-13
We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR-estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure. © 2013 Board of the Foundation of the Scandinavian Journal of Statistics..
System identification modelling of ship manoeuvring motion based onε- support vector regression
Institute of Scientific and Technical Information of China (English)
王雪刚; 邹早建; 侯先瑞; 徐锋
2015-01-01
Based on theε-support vector regression, three modelling methods for the ship manoeuvring motion, i.e., the white-box modelling, the grey-box modelling and the black-box modelling, are investigated. Theoo10/10,oo20/20 zigzag tests and the o35 turning circle manoeuvre are simulated. Part of the simulation data for theoo20/20 zigzag test are used to train the support vectors, and the trained support vector machine is used to predict the wholeoo20/20 zigzag test. Comparison between the simula- ted and predictedoo20/20 zigzag test shows a good predictive ability of the three modelling methods. Then all mathematical models obtained by the modelling methods are used to predict theoo10/10 zigzag test ando35 turning circle manoeuvre, and the predicted results are compared with those of simulation tests to demonstrate the good generalization performance of the mathematical models. Finally, the modelling methods are analyzed and compared with each other in terms of the application conditions, the prediction accuracy and the computation speed. An appropriate modelling method can be chosen according to the intended use of the mathematical models and the available data for the system identification.
Adjustment of an Intensive Care Unit (ICU Data in Fuzzy C-Regression Models
Directory of Open Access Journals (Sweden)
Mohd Saifullah Rusiman
2013-02-01
Full Text Available This research is an attempt to present a proper methodology in data modification by using analytical hierarchy process (AHP technique and fuzzy c-mean (FCM model. The continuous data were built from binary data using analytical hierarchy process (AHP. Whereas, the binary data were created from continuous data using fuzzy cmeans (FCM model. The models used in this research are fuzzy c-regression models (FCRM. A case study in scale of health at an intensive care unit (ICU ward using the AHP, FCM model and FCRM models was carried out. There are six independent variables involved in this study. There are four cases considered as a result of using AHP technique and FCM model toward independent data. After comparing the four cases, it was found that case 4 appeared to be the best model, having the lowest mean square error (MSE. The original data have the MSE value of 97.33, while the data of case 4 have MSE by 83.48. This means that the AHP technique can lower the MSE, while the FCM model cannot lower the MSE in modelling scale of health in the ICU. In other words, it can be claimed that the AHP technique can increase the accuracy of modelling prediction.
Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.
Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai
2011-01-01
Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ(1), λ(2) and λ(3)). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs.
Model-Based Evaluation of Spontaneous Tumor Regression in Pilocytic Astrocytoma.
Directory of Open Access Journals (Sweden)
Thomas Buder
2015-12-01
Full Text Available Pilocytic astrocytoma (PA is the most common brain tumor in children. This tumor is usually benign and has a good prognosis. Total resection is the treatment of choice and will cure the majority of patients. However, often only partial resection is possible due to the location of the tumor. In that case, spontaneous regression, regrowth, or progression to a more aggressive form have been observed. The dependency between the residual tumor size and spontaneous regression is not understood yet. Therefore, the prognosis is largely unpredictable and there is controversy regarding the management of patients for whom complete resection cannot be achieved. Strategies span from pure observation (wait and see to combinations of surgery, adjuvant chemotherapy, and radiotherapy. Here, we introduce a mathematical model to investigate the growth and progression behavior of PA. In particular, we propose a Markov chain model incorporating cell proliferation and death as well as mutations. Our model analysis shows that the tumor behavior after partial resection is essentially determined by a risk coefficient γ, which can be deduced from epidemiological data about PA. Our results quantitatively predict the regression probability of a partially resected benign PA given the residual tumor size and lead to the hypothesis that this dependency is linear, implying that removing any amount of tumor mass will improve prognosis. This finding stands in contrast to diffuse malignant glioma where an extent of resection threshold has been experimentally shown, below which no benefit for survival is expected. These results have important implications for future therapeutic studies in PA that should include residual tumor volume as a prognostic factor.
Regression Models for Aquifer Vulnerability to Nitrate Pollution in Osona (NE Spain)
Boy Roura, M.; Nolan, B. T.; Menció Domingo, A.; Mas-Pla, J.
2012-12-01
Regression models were developed at a local scale in the Osona region (1,260 square kilometers) to predict nitrate concentrations in groundwater. Osona is a semi-arid region in northeast Spain, where livestock and agricultural activities are very intensive, and therefore, it is vulnerable to nitrate pollution from agricultural sources (European Nitrate Directive (91/676/EEC)). Nitrate concentrations in groundwater are commonly above 50 mg/L as nitrate, reaching up to 500 mg/L in some of the sampled wells. Regression models were based on explanatory variables such as geology, land use, and nitrogen inputs, which control the fate, transport and attenuation of nitrate in groundwater. Regression has been widely used to determine aquifer vulnerability to nitrate in groundwater at large spatial scales. We developed models with and without site-specific groundwater chemistry data to see the extent to which the latter improved the models. Although chemistry data could explain additional variation in groundwater nitrate concentration, such data were available only at the well locations and therefore were less amenable for spatial extrapolation. The data set consisted of nitrate data from 63 sampled wells and the following explanatory variables: 1) soils data consisting of texture and other physical properties; 2) geology indicating presence or absence of aquifers in the region, and their type (unconfined, leaky or confined); 3) land use (agricultural, urban, forested); 4) nitrogen input as manure; 5) occurrence of irrigated crops; 6) estimates of nitrogen uptake developed for 10 different crops; 7) slope; 8) population density, and 9) groundwater chemistry data comprising major ions and trace elements. Variables 1 and 2 were compiled as point data because their polygons were much larger than the well buffers which represented contributing areas to the sampled wells. Variables 3 to 8 were compiled within a 500-meter radius buffer around wells using a GIS-based weighted
The analysis of random effects regression model for predicting the shelf-life of gun propellant
Chang, Wei-Te
1995-01-01
Most gun propellant is stored at depots for a long time before it is used. While being stored, the quality of the gun propellant may deteriorate and become unstable. In an attempt to avoid disaster due to use of unstable gun propellant, accurate prediction of the safe shelf-life of gun propellant is necessary. The shelf-life estimation methods used currently for a group of similar gun propellant lots are based on a fixed effects regression model. This does not take into consideration the fact...
Estimation in a Multivariate "Errors in Variables" Regression Model: Large Sample Results
Gleser, Leon Jay
1981-01-01
In a multivariate "errors in variables" regression model, the unknown mean vectors $\\mathbf{u}_{1i}: p \\times 1, \\mathbf{u}_{2i}: r \\times 1$ of the vector observations $\\mathbf{x}_{1i}, \\mathbf{x}_{2i}$, rather than the observations themselves, are assumed to follow the linear relation: $\\mathbf{u}_{2i} = \\alpha + B\\mathbf{u}_{1i}, i = 1,2,\\cdots, n$. It is further assumed that the random errors $\\mathbf{e}_i = \\mathbf{x}_i - \\mathbf{u}_i, \\mathbf{x}'_i = (\\mathbf{x}'_{1i}, \\mathbf{x}'_{2i})...
A logistic regression model of Coronary Artery Disease among Male Patients in Punjab
Directory of Open Access Journals (Sweden)
Sohail Chand
2005-07-01
Full Text Available This is a cross-sectional retrospective study of 308 male patients, who were presented first time for coronary angiography at the Punjab Institute of Cardiology. The mean age was 50.97 + 9.9 among male patients. As the response variable coronary artery disease (CAD was a binary variable, logistic regression model was fitted to predict the Coronary Artery Disease with the help of significant risk factors. Age, Chest pain, Diabetes Mellitus, Smoking and Lipids are resulted as significant risk factors associated with CAD among male population.
Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation
DEFF Research Database (Denmark)
Alan, Sule; Honore, Bo E.; Hu, Luojia;
2014-01-01
This paper constructs estimators for panel data regression models with individual speci…fic heterogeneity and two–sided censoring and truncation. Following Powell (1986) the estimation strategy is based on moment conditions constructed from re–censored or re–truncated residuals. While these moment...... conditions do not identify the parameter of interest, they can be used to motivate objective functions that do. We apply one of the estimators to study the e¤ect of a Danish tax reform on household portfolio choice. The idea behind the estimators can also be used in a cross sectional setting....
Heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions.
Lachos, Victor H; Bandyopadhyay, Dipankar; Garay, Aldo M
2011-08-01
An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. We derive a simple EM-type algorithm for iteratively computing maximum likelihood (ML) estimates and the observed information matrix is derived analytically. Simulation studies demonstrate the robustness of this flexible class against outlying and influential observations, as well as nice asymptotic properties of the proposed EM-type ML estimates. Finally, the methodology is illustrated using an ultrasonic calibration data.
Non-linear regression model for spatial variation in precipitation chemistry for South India
Siva Soumya, B.; Sekhar, M.; Riotte, J.; Braun, Jean-Jacques
Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO 3, NO 3 and Mg do not change much from coast to inland while, SO 4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent ( R2 ˜ 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of ˜5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India.
Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI
Energy Technology Data Exchange (ETDEWEB)
Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)
2014-09-17
We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)
Hema, M; Srinivasan, K
2011-07-01
Nickel removal efficiency of powered activated carbons of coconut oilcake, neem oilcake and commercial carbon was investigated by using artificial neural network. The effective parameters for the removal of nickel (%R) by adsorption process, which included the pH, contact time (T), distinctiveness of activated carbon (Cn), amount of activated carbon (Cw) and initial concentration of nickel (Co) were investigated. Levenberg-Marquardt (LM) Back-propagation algorithm is used to train the network. The network topology was optimized by varying number of hidden layer and number of neurons in hidden layer. The model was developed in terms of training; validation and testing of experimental data, the test subsets that each of them contains 60%, 20% and 20% of total experimental data, respectively. Multiple regression equation was developed for nickel adsorption system and the output was compared with both simulated and experimental outputs. Standard deviation (SD) with respect to experimental output was quite higher in the case of regression model when compared with ANN model. The obtained experimental data best fitted with the artificial neural network. PMID:23029923
Single-step genomic evaluation using multitrait random regression model and test-day data.
Koivula, M; Strandén, I; Pösö, J; Aamand, G P; Mäntysaari, E A
2015-04-01
The objectives of this study were to evaluate the feasibility of use of the test-day (TD) single-step genomic BLUP (ssGBLUP) using phenotypic records of Nordic Red Dairy cows. The critical point in ssGBLUP is how genomically derived relationships (G) are integrated with population-based pedigree relationships (A) into a combined relationship matrix (H). Therefore, we also tested how different weights for genomic and pedigree relationships affect ssGBLUP, validation reliability, and validation regression coefficients. Deregressed proofs for 305-d milk, protein, and fat yields were used for a posteriori validation. The results showed that the use of phenotypic TD records in ssGBLUP is feasible. Moreover, the TD ssGBLUP model gave considerably higher validation reliabilities and validation regression coefficients than the TD model without genomic information. No significant differences were found in validation reliability between the different TD ssGBLUP models according to bootstrap confidence intervals. However, the degree of inflation in genomic enhanced breeding values is affected by the method used in construction of the H matrix. The results showed that ssGBLUP provides a good alternative to the currently used multi-step approach but there is a great need to find the best option to combine pedigree and genomic information in the genomic matrix. PMID:25660739
Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization
Directory of Open Access Journals (Sweden)
MUHIB ALI RAJPER
2016-07-01
Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.
Directory of Open Access Journals (Sweden)
Katharina Galmbacher
Full Text Available A tumor promoting role of macrophages has been described for a transgenic murine breast cancer model. In this model tumor-associated macrophages (TAMs represent a major component of the leukocytic infiltrate and are associated with tumor progression. Shigella flexneri is a bacterial pathogen known to specificly induce apotosis in macrophages. To evaluate whether Shigella-induced removal of macrophages may be sufficient for achieving tumor regression we have developed an attenuated strain of S. flexneri (M90TDeltaaroA and infected tumor bearing mice. Two mouse models were employed, xenotransplantation of a murine breast cancer cell line and spontanous breast cancer development in MMTV-HER2 transgenic mice. Quantitative analysis of bacterial tumor targeting demonstrated that attenuated, invasive Shigella flexneri primarily infected TAMs after systemic administration. A single i.v. injection of invasive M90TDeltaaroA resulted in caspase-1 dependent apoptosis of TAMs followed by a 74% reduction in tumors of transgenic MMTV-HER-2 mice 7 days post infection. TAM depletion was sustained and associated with complete tumor regression.These data support TAMs as useful targets for antitumor therapy and highlight attenuated bacterial pathogens as potential tools.
Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan
2016-05-01
Illegal use of nitrogen-rich melamine (C3H6N6) to boost perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as Enzyme-linked immunosorbent assay (ELISA), High-performance liquid chromatography (HPLC), and Gas chromatography-mass spectrometry (GC-MS), are sensitive but they are time-consuming, expensive, and labor-intensive. In this research, near-infrared (NIR) hyperspectral imaging technique combined with regression coefficient of partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with melamine concentration (dependent variables) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to the binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration was increased, the numbers of suspected melamine pixels of binary images were also increased. These results suggested that NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders. PMID:26946026
Dons, Evi; Van Poppel, Martine; Kochan, Bruno; Wets, Geert; Int Panis, Luc
2013-01-01
Land use regression (LUR) modeling is a statistical technique used to determine exposure to air pollutants in epidemiological studies. Time-activity diaries can be combined with LUR models, enabling detailed exposure estimation and limiting exposure misclassification, both in shorter and longer time lags. In this study, the traffic related air pollutant black carbon was measured with mu-aethalometers on a 5-mm time base at 63 locations in Flanders, Belgium. The measurements show that hourly c...
Additive Hazard Regression Models: An Application to the Natural History of Human Papillomavirus
Directory of Open Access Journals (Sweden)
Xianhong Xie
2013-01-01
Full Text Available There are several statistical methods for time-to-event analysis, among which is the Cox proportional hazards model that is most commonly used. However, when the absolute change in risk, instead of the risk ratio, is of primary interest or when the proportional hazard assumption for the Cox proportional hazards model is violated, an additive hazard regression model may be more appropriate. In this paper, we give an overview of this approach and then apply a semiparametric as well as a nonparametric additive model to a data set from a study of the natural history of human papillomavirus (HPV in HIV-positive and HIV-negative women. The results from the semiparametric model indicated on average an additional 14 oncogenic HPV infections per 100 woman-years related to CD4 count < 200 relative to HIV-negative women, and those from the nonparametric additive model showed an additional 40 oncogenic HPV infections per 100 women over 5 years of followup, while the estimated hazard ratio in the Cox model was 3.82. Although the Cox model can provide a better understanding of the exposure disease association, the additive model is often more useful for public health planning and intervention.
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes
Directory of Open Access Journals (Sweden)
Silvia Liverani
2015-03-01
Full Text Available PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non- parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010. The package allows binary, categorical, count and continuous response, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.
A binary logistic regression model for discriminating real protein-protein interface
Institute of Scientific and Technical Information of China (English)
无
2003-01-01
The selection and study of descriptive variables of protein-protein complex interface is a major question that many biologists come across when the research of protein-protein recognition is concerned. Several variables have been proposed to understand the structural or energetic features of complex interfaces. Here a systematic study of some of these "traditional" variables, as well as a few new ones, is introduced. With the values of these variables extracted from 42 PDB samples with real or false complex interfaces, a binary logistic regression analysis is performed, which results in an effective empirical model for the evaluation of binding probabilities of protein-protein interfaces. The model is validated with 12 samples, and satisfactory results are obtained for both the training and validation sets. Meanwhile, three potential dimeric interfaces of staphylokinase have been investigated and one with the best suitability to our model is proposed.
A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis
Directory of Open Access Journals (Sweden)
Fengyun Liu
2016-10-01
Full Text Available This study analyzes the economic system dynamics of investment in real estate from mainly four participants in China. Local governments limit the supply of commercial and residential land to raise fiscal revenue, and expand debts by land mortgage to develop industrial zones and parks. Led by local government, banks and real estate development enterprises forge a coalition on real estate investment and facilitate real estate price appreciation. The above theoretical model is empirically evidenced with VAR (Vector Auto Regression methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue. Additional VAR models find that bank credit in addition to private and foreign funds respectively have strong positive dynamic effects on housing prices. Housing prices also have a strong positive impact on speculation from private funds and hot money.
An adaptive turbo-shaft engine modeling method based on PS and MRR-LSSVR algorithms
Institute of Scientific and Technical Information of China (English)
Wang Jiankang; Zhang Haibo; Yan Changkai; Duan Shujing; Huang Xianghua
2013-01-01
In order to establish an adaptive turbo-shaft engine model with high accuracy,a new modeling method based on parameter selection (PS) algorithm and multi-input multi-output recursive reduced least square support vector regression (MRR-LSSVR) machine is proposed.Firstly,the PS algorithm is designed to choose the most reasonable inputs of the adaptive module.During this process,a wrapper criterion based on least square support vector regression (LSSVR) machine is adopted,which can not only reduce computational complexity but also enhance generalization performance.Secondly,with the input variables determined by the PS algorithm,a mapping model of engine parameter estimation is trained off-line using MRR-LSSVR,which has a satisfying accuracy within 5‰.Finally,based on a numerical simulation platform of an integrated helicopter/turbo-shaft engine system,an adaptive turbo-shaft engine model is developed and tested in a certain flight envelope.Under the condition of single or multiple engine components being degraded,many simulation experiments are carried out,and the simulation results show the effectiveness and validity of the proposed adaptive modeling method.
Adaptable Authentication Model: Exploring Security with Weaker Attacker Models
DEFF Research Database (Denmark)
Ahmed, Naveed; Jensen, Christian D.
2011-01-01
suffer because of the identified vulnerabilities. Therefore, we may need to analyze a protocol for weaker notions of security. In this paper, we present a security model that supports such weaker notions. In this model, the overall goals of an authentication protocol are broken into a finer granularity......; for each fine level authentication goal, we determine the “least strongest-attacker” for which the authentication goal can be satisfied. We demonstrate that this model can be used to reason about the security of supposedly insecure protocols. Such adaptability is particularly useful in those applications...... where one may need to trade-off security relaxations against resource requirements....
Adaptation in Cones: A General Model
Dawis, Stevan M.; Purple, Richard L.
1982-01-01
Three features appear to characterize steady-state light adaptation in vertebrate cone photoreceptors: (a) the shape of the “log intensity-response” curve at different levels of adaptation is the same, the only change with adaptation is in the position of the point on the curve about which the cones operate; (b) at high adapting intensities the operating point becomes fixed in position; (c) this fixed position is at the steepest point of the log intensity-response curve. These three features ...
Directory of Open Access Journals (Sweden)
KAYODE AYINDE
2012-11-01
Full Text Available Performances of estimators of linear regression model with autocorrelated error term have been attributed to the nature and specification of the explanatory variables. The violation of assumption of the independence of the explanatory variables is not uncommon especially in business, economic and social sciences, leading to the development of many estimators. Moreover, prediction is one of the main essences of regression analysis. This work, therefore, attempts to examine the parameter estimates of the Ordinary Least Square estimator (OLS, Cochrane-Orcutt estimator (COR, Maximum Likelihood estimator (ML and the estimators based on Principal Component analysis (PC in prediction of linear regression model with autocorrelated error terms under the violations of assumption of independent regressors (multicollinearity using Monte-Carlo experiment approach. With uniform variables as regressors, it further identifies the best estimator that can be used for prediction purpose by averaging the adjusted co-efficient of determination of each estimator over the number of trials. Results reveal that the performances of COR and ML estimators at each level of multicollinearity over the levels of autocorrelation are convex – like while that of the OLS and PC estimators are concave; and that asthe level of multicollinearity increases, the estimators perform much better at all the levels of autocorrelation. Except when the sample size is small (n=10, the performances of the COR and ML estimators are generally best and asymptotically the same. When the sample size is small, the COR estimator is still best except when the autocorrelation level is low. At these instances, the PC estimator is either best or competes with the best estimator. Moreover, at low level of autocorrelation in all the sample sizes, the OLS estimator competes with the best estimator in all the levels of multicollinearity.
Modeling group size and scalar stress by logistic regression from an archaeological perspective.
Directory of Open Access Journals (Sweden)
Gianmarco Alberti
Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout. Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132, while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170. The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
A theoretical model to describe progressions and regressions for exercise rehabilitation.
Blanchard, Sam; Glasgow, Phil
2014-08-01
This article aims to describe a new theoretical model to simplify and aid visualisation of the clinical reasoning process involved in progressing a single exercise. Exercise prescription is a core skill for physiotherapists but is an area that is lacking in theoretical models to assist clinicians when designing exercise programs to aid rehabilitation from injury. Historical models of periodization and motor learning theories lack any visual aids to assist clinicians. The concept of the proposed model is that new stimuli can be added or exchanged with other stimuli, either intrinsic or extrinsic to the participant, in order to gradually progress an exercise whilst remaining safe and effective. The proposed model maintains the core skills of physiotherapists by assisting clinical reasoning skills, exercise prescription and goal setting. It is not limited to any one pathology or rehabilitation setting and can adapted by any level of skilled clinician. PMID:24913914
A theoretical model to describe progressions and regressions for exercise rehabilitation.
Blanchard, Sam; Glasgow, Phil
2014-08-01
This article aims to describe a new theoretical model to simplify and aid visualisation of the clinical reasoning process involved in progressing a single exercise. Exercise prescription is a core skill for physiotherapists but is an area that is lacking in theoretical models to assist clinicians when designing exercise programs to aid rehabilitation from injury. Historical models of periodization and motor learning theories lack any visual aids to assist clinicians. The concept of the proposed model is that new stimuli can be added or exchanged with other stimuli, either intrinsic or extrinsic to the participant, in order to gradually progress an exercise whilst remaining safe and effective. The proposed model maintains the core skills of physiotherapists by assisting clinical reasoning skills, exercise prescription and goal setting. It is not limited to any one pathology or rehabilitation setting and can adapted by any level of skilled clinician.
Directory of Open Access Journals (Sweden)
Željko V. Račić
2010-12-01
Full Text Available This paper aims to present the specifics of the application of multiple linear regression model. The economic (financial crisis is analyzed in terms of gross domestic product which is in a function of the foreign trade balance (on one hand and the credit cards, i.e. indebtedness of the population on this basis (on the other hand, in the USA (from 1999. to 2008. We used the extended application model which shows how the analyst should run the whole development process of regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.
Adapting the ALP Model for Student and Institutional Needs
Sides, Meredith
2016-01-01
With the increasing adoption of accelerated models of learning comes the necessary step of adapting these models to fit the unique needs of the student population at each individual institution. One such college adapted the ALP (Accelerated Learning Program) model and made specific changes to the target population, structure and scheduling, and…
The ADAPT design model: towards instructional control of transfer
Jelsma, Otto; Merrienboer, van Jeroen J.G.; Bijlstra, Jim P.
1990-01-01
This paper presents a detailed description of the ADAPT (Apply Delayed Automatization for Positive Transfer) design model. ADAPT is based upon production system models of learning and provides guidelines for developing instructional systems that offer transfer of leamed skills. The model suggests th
STOCHASTIC ADAPTIVE SWITCHING CONTROL BASED ON MULTIPLE MODELS
Institute of Scientific and Technical Information of China (English)
ZHANG Yanxia; GUO Lei
2002-01-01
It is well known that the transient behaviors of the traditional adaptive control may be very poor in general, and that the adaptive control designed based on switching between multiple models is an intuitively appealing and practically feasible approach to improve the transient performances. In this paper, we shall prove that for a typical class of linear systems disturbed by random noises, the multiple model based least-squares (LS)adaptive switching control is stable and convergent, and has the same convergence rate as that established for the standard least-squares-based self-tunning regulators. Moreover,the mixed case combining adaptive models with fixed models is also considered.
A theoretical adaptive model of thermal comfort - Adaptive Predicted Mean Vote (aPMV)
Energy Technology Data Exchange (ETDEWEB)
Yao, Runming [School of Construction Management and Engineering, The University of Reading (United Kingdom); Faculty of Urban Construction and Environmental Engineering, Chongqing University (China); Li, Baizhan [Key Laboratory of the Three Gorges Reservoir Region' s Eco-Environment (Ministry of Education), Chongqing University (China); Faculty of Urban Construction and Environmental Engineering, Chongqing University (China); Liu, Jing [School of Construction Management and Engineering, The University of Reading (United Kingdom)
2009-10-15
This paper presents in detail a theoretical adaptive model of thermal comfort based on the ''Black Box'' theory, taking into account factors such as culture, climate, social, psychological and behavioural adaptations, which have an impact on the senses used to detect thermal comfort. The model is called the Adaptive Predicted Mean Vote (aPMV) model. The aPMV model explains, by applying the cybernetics concept, the phenomena that the Predicted Mean Vote (PMV) is greater than the Actual Mean Vote (AMV) in free-running buildings, which has been revealed by many researchers in field studies. An Adaptive coefficient ({lambda}) representing the adaptive factors that affect the sense of thermal comfort has been proposed. The empirical coefficients in warm and cool conditions for the Chongqing area in China have been derived by applying the least square method to the monitored onsite environmental data and the thermal comfort survey results. (author)
Directory of Open Access Journals (Sweden)
Juniarti Juniarti
2013-01-01
Full Text Available The study aims to prove whether good corporate governance (GCG is able to predict the probability of companies experiencing financial difficulties. Financial ratios that traditionally used for predicting bankruptcy remains used in this study. Besides, this study also compares logit and probit regression models, which are widely used in research related accounting bankruptcy prediction. Both models will be compared to determine which model is more superior. The sample in this study is the infrastructure, transportation, utilities & trade, services and hotels companies experiencing financial distress in the period 2008-2011. The results show that GCG and other three variables control i.e DTA, CR and company category do not prove significantly to predict the probability of companies experiencing financial difficulties. NPM, the only variable that proved significantly distinguishing healthy firms and distress. In general, logit and probit models do not result in different conclusions. Both of the models confirm the goodness of fit of models and the results of hypothesis testing. In terms of classification accuracy, logit model proves more accurate predictions than the probit models.
[Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].
Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng
2016-02-15
With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in our country, which has negative impacts on public health and ecological environment. LUR model is one of the common methods simulating spatial-temporal differentiation of air pollution at city scale. It has broad application in Europe and North America, but not really in China. Based on many studies at home and abroad, this study started with the main steps to develop LUR model, including obtaining the monitoring data, generating variables, developing models, model validation and regression mapping. Then a conclusion was drawn on the progress of LUR models in spatial-temporal differentiation of air pollution. Furthermore, the research focus and orientation in the future were prospected, including highlighting spatial-temporal differentiation, increasing classes of model variables and improving the methods of model development. This paper was aimed to popularize the application of LUR model in China, and provide a methodological basis for human exposure, epidemiologic study and health risk assessment. PMID:27363125