WorldWideScience

Sample records for adaptive regression modeling

  1. Adaptive regression for modeling nonlinear relationships

    CERN Document Server

    Knafl, George J

    2016-01-01

    This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...

  2. Adaptive metric kernel regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    2000-01-01

    regression by minimising a cross-validation estimate of the generalisation error. This allows to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...

  3. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method...... and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power...... production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered....

  4. Preference learning with evolutionary Multivariate Adaptive Regression Spline model

    DEFF Research Database (Denmark)

    Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll

    2015-01-01

    for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employ MARS models which have the advantage of being a powerful method...

  5. Adaptive Metric Kernel Regression

    DEFF Research Database (Denmark)

    Goutte, Cyril; Larsen, Jan

    1998-01-01

    by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...

  6. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    Science.gov (United States)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of

  7. Risk factor selection in rate making: EM adaptive LASSO for zero-inflated poisson regression models.

    Science.gov (United States)

    Tang, Yanlin; Xiang, Liya; Zhu, Zhongyi

    2014-06-01

    Risk factor selection is very important in the insurance industry, which helps precise rate making and studying the features of high-quality insureds. Zero-inflated data are common in insurance, such as the claim frequency data, and zero-inflation makes the selection of risk factors quite difficult. In this article, we propose a new risk factor selection approach, EM adaptive LASSO, for a zero-inflated Poisson regression model, which combines the EM algorithm and adaptive LASSO penalty. Under some regularity conditions, we show that, with probability approaching 1, important factors are selected and the redundant factors are excluded. We investigate the finite sample performance of the proposed method through a simulation study and the analysis of car insurance data from SAS Enterprise Miner database.

  8. Air quality modeling in the Oviedo urban area (NW Spain) by using multivariate adaptive regression splines.

    Science.gov (United States)

    Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E

    2015-05-01

    The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, the experimental data set made up of nitrogen oxides (NO x ), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, conclusions of this research work are exposed.

  9. Multivariate adaptive regression splines and neural network models for prediction of pile drivability

    Institute of Scientific and Technical Information of China (English)

    Wengang Zhang; Anthony T.C. Goh

    2016-01-01

    Piles are long, slender structural elements used to transfer the loads from the superstructure through weak strata onto stiffer soils or rocks. For driven piles, the impact of the piling hammer induces compression and tension stresses in the piles. Hence, an important design consideration is to check that the strength of the pile is sufficient to resist the stresses caused by the impact of the pile hammer. Due to its complexity, pile drivability lacks a precise analytical solution with regard to the phenomena involved. In situations where measured data or numerical hypothetical results are available, neural networks stand out in mapping the nonlinear interactions and relationships between the system’s predictors and dependent responses. In addition, unlike most computational tools, no mathematical relationship assumption between the dependent and independent variables has to be made. Nevertheless, neural networks have been criticized for their long trial-and-error training process since the optimal configu-ration is not known a priori. This paper investigates the use of a fairly simple nonparametric regression algorithm known as multivariate adaptive regression splines (MARS), as an alternative to neural net-works, to approximate the relationship between the inputs and dependent response, and to mathe-matically interpret the relationship between the various parameters. In this paper, the Back propagation neural network (BPNN) and MARS models are developed for assessing pile drivability in relation to the prediction of the Maximum compressive stresses (MCS), Maximum tensile stresses (MTS), and Blow per foot (BPF). A database of more than four thousand piles is utilized for model development and comparative performance between BPNN and MARS predictions.

  10. Adaptive robust polynomial regression for power curve modeling with application to wind power forecasting

    DEFF Research Database (Denmark)

    Xu, Man; Pinson, Pierre; Lu, Zongxiang

    2016-01-01

    swiftly and also achieve a better trade-off between robustness against noisy data and time adaptivity. A case study based on a real-world dataset validates the properties of the proposed regression method. Results show that the new method could flexibly respond to abnormal data at different lead times...... of the energy conversion process. Such nature may be due the varying wind conditions, aging and state of the turbines, etc. And, an equivalent steady-state power curve, estimated under normal operating conditions with the intention to filter abnormal data, is not sufficient to solve the problem because...

  11. Adapting Predictive Models for Cepheid Variable Star Classification Using Linear Regression and Maximum Likelihood

    Science.gov (United States)

    Gupta, Kinjal Dhar; Vilalta, Ricardo; Asadourian, Vicken; Macri, Lucas

    2014-05-01

    We describe an approach to automate the classification of Cepheid variable stars into two subtypes according to their pulsation mode. Automating such classification is relevant to obtain a precise determination of distances to nearby galaxies, which in addition helps reduce the uncertainty in the current expansion of the universe. One main difficulty lies in the compatibility of models trained using different galaxy datasets; a model trained using a training dataset may be ineffectual on a testing set. A solution to such difficulty is to adapt predictive models across domains; this is necessary when the training and testing sets do not follow the same distribution. The gist of our methodology is to train a predictive model on a nearby galaxy (e.g., Large Magellanic Cloud), followed by a model-adaptation step to make the model operable on other nearby galaxies. We follow a parametric approach to density estimation by modeling the training data (anchor galaxy) using a mixture of linear models. We then use maximum likelihood to compute the right amount of variable displacement, until the testing data closely overlaps the training data. At that point, the model can be directly used in the testing data (target galaxy).

  12. Recursive Gaussian Process Regression Model for Adaptive Quality Monitoring in Batch Processes

    Directory of Open Access Journals (Sweden)

    Le Zhou

    2015-01-01

    Full Text Available In chemical batch processes with slow responses and a long duration, it is time-consuming and expensive to obtain sufficient normal data for statistical analysis. With the persistent accumulation of the newly evolving data, the modelling becomes adequate gradually and the subsequent batches will change slightly owing to the slow time-varying behavior. To efficiently make use of the small amount of initial data and the newly evolving data sets, an adaptive monitoring scheme based on the recursive Gaussian process (RGP model is designed in this paper. Based on the initial data, a Gaussian process model and the corresponding SPE statistic are constructed at first. When the new batches of data are included, a strategy based on the RGP model is used to choose the proper data for model updating. The performance of the proposed method is finally demonstrated by a penicillin fermentation batch process and the result indicates that the proposed monitoring scheme is effective for adaptive modelling and online monitoring.

  13. Self-Adaptive Revised Land Use Regression Models for Estimating PM2.5 Concentrations in Beijing, China

    Directory of Open Access Journals (Sweden)

    Lujin Hu

    2016-08-01

    Full Text Available Heavy air pollution, especially fine particulate matter (PM2.5, poses serious challenges to environmental sustainability in Beijing. Epidemiological studies and the identification of measures for preventing serious air pollution both require accurate PM2.5 spatial distribution data. Land use regression (LUR models are promising for estimating the spatial distribution of PM2.5 at a high spatial resolution. However, typical LUR models have a limited sampling point explanation rate (SPER, i.e., the rate of the sampling points with reasonable predicted concentrations to the total number of sampling points and accuracy. Hence, self-adaptive revised LUR models are proposed in this paper for improving the SPER and accuracy of typical LUR models. The self-adaptive revised LUR model combines a typical LUR model with self-adaptive LUR model groups. The typical LUR model was used to estimate the PM2.5 concentrations, and the self-adaptive LUR model groups were constructed for all of the sampling points removed from the typical LUR model because they were beyond the prediction data range, which was from 60% of the minimum observation to 120% of the maximum observation. The final results were analyzed using three methods, including an accuracy analysis, and were compared with typical LUR model results and the spatial variations in Beijing. The accuracy satisfied the demands of the analysis, and the accuracies at the different monitoring sites indicated spatial variations in the accuracy of the self-adaptive revised LUR model. The accuracy was high in the central area and low in suburban areas. The comparison analysis showed that the self-adaptive LUR model increased the SPER from 75% to 90% and increased the accuracy (based on the root-mean-square error from 20.643 μg/m3 to 17.443 μg/m3 for the PM2.5 concentrations during the winter of 2014 in Beijing. The spatial variation analysis for Beijing showed that the PM2.5 concentrations were low in the north

  14. Flexible survival regression modelling

    DEFF Research Database (Denmark)

    Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben

    2009-01-01

    Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time......-varying effects. We start by considering classical and also more recent goodness-of-fit procedures for the Cox model that will reveal when the Cox model does not capture important aspects of the data, such as time-varying effects. We present recent regression models that are able to deal with and describe...... such time-varying effects. The introduced models are all applied to data on breast cancer from the Norwegian cancer registry, and these analyses clearly reveal the shortcomings of Cox's regression model and the need for other supplementary analyses with models such as those we present here....

  15. A novel method of target recognition based on 3D-color-space locally adaptive regression kernels model

    Science.gov (United States)

    Liu, Jiaqi; Han, Jing; Zhang, Yi; Bai, Lianfa

    2015-10-01

    Locally adaptive regression kernels model can describe the edge shape of images accurately and graphic trend of images integrally, but it did not consider images' color information while the color is an important element of an image. Therefore, we present a novel method of target recognition based on 3-D-color-space locally adaptive regression kernels model. Different from the general additional color information, this method directly calculate the local similarity features of 3-D data from the color image. The proposed method uses a few examples of an object as a query to detect generic objects with incompact, complex and changeable shapes. Our method involves three phases: First, calculating the novel color-space descriptors from the RGB color space of query image which measure the likeness of a voxel to its surroundings. Salient features which include spatial- dimensional and color -dimensional information are extracted from said descriptors, and simplifying them to construct a non-similar local structure feature set of the object class by principal components analysis (PCA). Second, we compare the salient features with analogous features from the target image. This comparison is done using a matrix generalization of the cosine similarity measure. Then the similar structures in the target image are obtained using local similarity structure statistical matching. Finally, we use the method of non-maxima suppression in the similarity image to extract the object position and mark the object in the test image. Experimental results demonstrate that our approach is effective and accurate in improving the ability to identify targets.

  16. Monte Carlo sampling and multivariate adaptive regression splines as tools for QSAR modelling of HIV-1 reverse transcriptase inhibitors.

    Science.gov (United States)

    Alamdari, R F; Mani-Varnosfaderani, A; Asadollahi-Baboli, M; Khalafi-Nezhad, A

    2012-10-01

    The present work focuses on the development of an interpretable quantitative structure-activity relationship (QSAR) model for predicting the anti-HIV activities of 67 thiazolylthiourea derivatives. This set of molecules has been proposed as potent HIV-1 reverse transcriptase inhibitors (RT-INs). The molecules were encoded to a diverse set of molecular descriptors, spanning different physical and chemical properties. Monte Carlo (MC) sampling and multivariate adaptive regression spline (MARS) techniques were used to select the most important descriptors and to predict the activity of the molecules. The most important descriptor was found to be the aspherisity index. The analysis of variance (ANOVA) and interpretable spline equations showed that the geometrical shape of the molecules has considerable effect on their activities. It seems that the linear molecules are more active than symmetric top compounds. The final MARS model derived displayed a good predictive ability judging from the determination coefficient corresponding to the leave multiple out (LMO) cross-validation technique, i.e. r (2 )= 0.828 (M = 12) and r (2 )= 0.813 (M = 20). The results of this work showed that the developed spline model is robust, has a good predictive power, and can then be used as a reliable tool for designing novel HIV-1 RT-INs.

  17. On the interest of combining an analog model to a regression model for the adaptation of the downscaling link. Application to probabilistic prediction of precipitation over France.

    Science.gov (United States)

    Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine

    2016-04-01

    Scenarios of surface weather required for the impact studies have to be unbiased and adapted to the space and time scales of the considered hydro-systems. Hence, surface weather scenarios obtained from global climate models and/or numerical weather prediction models are not really appropriated. Outputs of these models have to be post-processed, which is often carried out thanks to Statistical Downscaling Methods (SDMs). Among those SDMs, approaches based on regression are often applied. For a given station, a regression link can be established between a set of large scale atmospheric predictors and the surface weather variable. These links are then used for the prediction of the latter. However, physical processes generating surface weather vary in time. This is well known for precipitation for instance. The most relevant predictors and the regression link are also likely to vary in time. A better prediction skill is thus classically obtained with a seasonal stratification of the data. Another strategy is to identify the most relevant predictor set and establish the regression link from dates that are similar - or analog - to the target date. In practice, these dates can be selected thanks to an analog model. In this study, we explore the possibility of improving the local performance of an analog model - where the analogy is applied to the geopotential heights 1000 and 500 hPa - using additional local scale predictors for the probabilistic prediction of the Safran precipitation over France. For each prediction day, the prediction is obtained from two GLM regression models - for both the occurrence and the quantity of precipitation - for which predictors and parameters are estimated from the analog dates. Firstly, the resulting combined model noticeably allows increasing the prediction performance by adapting the downscaling link for each prediction day. Secondly, the selected predictors for a given prediction depend on the large scale situation and on the

  18. TWO REGRESSION CREDIBILITY MODELS

    Directory of Open Access Journals (Sweden)

    Constanţa-Nicoleta BODEA

    2010-03-01

    Full Text Available In this communication we will discuss two regression credibility models from Non – Life Insurance Mathematics that can be solved by means of matrix theory. In the first regression credibility model, starting from a well-known representation formula of the inverse for a special class of matrices a risk premium will be calculated for a contract with risk parameter θ. In the next regression credibility model, we will obtain a credibility solution in the form of a linear combination of the individual estimate (based on the data of a particular state and the collective estimate (based on aggregate USA data. To illustrate the solution with the properties mentioned above, we shall need the well-known representation theorem for a special class of matrices, the properties of the trace for a square matrix, the scalar product of two vectors, the norm with respect to a positive definite matrix given in advance and the complicated mathematical properties of conditional expectations and of conditional covariances.

  19. Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model

    Science.gov (United States)

    Deo, Ravinesh C.; Kisi, Ozgur; Singh, Vijay P.

    2017-02-01

    Drought forecasting using standardized metrics of rainfall is a core task in hydrology and water resources management. Standardized Precipitation Index (SPI) is a rainfall-based metric that caters for different time-scales at which the drought occurs, and due to its standardization, is well-suited for forecasting drought at different periods in climatically diverse regions. This study advances drought modelling using multivariate adaptive regression splines (MARS), least square support vector machine (LSSVM), and M5Tree models by forecasting SPI in eastern Australia. MARS model incorporated rainfall as mandatory predictor with month (periodicity), Southern Oscillation Index, Pacific Decadal Oscillation Index and Indian Ocean Dipole, ENSO Modoki and Nino 3.0, 3.4 and 4.0 data added gradually. The performance was evaluated with root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (r2). Best MARS model required different input combinations, where rainfall, sea surface temperature and periodicity were used for all stations, but ENSO Modoki and Pacific Decadal Oscillation indices were not required for Bathurst, Collarenebri and Yamba, and the Southern Oscillation Index was not required for Collarenebri. Inclusion of periodicity increased the r2 value by 0.5-8.1% and reduced RMSE by 3.0-178.5%. Comparisons showed that MARS superseded the performance of the other counterparts for three out of five stations with lower MAE by 15.0-73.9% and 7.3-42.2%, respectively. For the other stations, M5Tree was better than MARS/LSSVM with lower MAE by 13.8-13.4% and 25.7-52.2%, respectively, and for Bathurst, LSSVM yielded more accurate result. For droughts identified by SPI ≤ - 0.5, accurate forecasts were attained by MARS/M5Tree for Bathurst, Yamba and Peak Hill, whereas for Collarenebri and Barraba, M5Tree was better than LSSVM/MARS. Seasonal analysis revealed disparate results where MARS/M5Tree was better than LSSVM. The results highlight the

  20. The adaptive Lasso for Logistic regression models%Logistic模型中参数的自适应Lasso估计

    Institute of Scientific and Technical Information of China (English)

    王娉; 郭鹏江; 夏志明

    2012-01-01

    目的 研究Logistic模型的参数估计.方法 在L1罚中引用一个自适应的权,即自适应Lasso方法.结果 自适应Lasso方法对Logistic模型同时进行了模型选择和参数估计.结论 在一定的正则条件下,Logistic模型的自适应Lasso估计是满足Oracle性质的.%Aim To estimate the parameters in the Logistic model. Methods Adaptive weights are used in the L1 penalty, which is adaptive Lasso. Results The adaptive Lasso selects variables and estimates parameters simulta-neously for the Logistic model. Conclusion Under certain regular conditions, the adaptive Lasso enjoys the oracle properties.

  1. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    Science.gov (United States)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  2. Adaptive support vector regression for UAV flight control.

    Science.gov (United States)

    Shin, Jongho; Jin Kim, H; Kim, Youdan

    2011-01-01

    This paper explores an application of support vector regression for adaptive control of an unmanned aerial vehicle (UAV). Unlike neural networks, support vector regression (SVR) generates global solutions, because SVR basically solves quadratic programming (QP) problems. With this advantage, the input-output feedback-linearized inverse dynamic model and the compensation term for the inversion error are identified off-line, which we call I-SVR (inversion SVR) and C-SVR (compensation SVR), respectively. In order to compensate for the inversion error and the unexpected uncertainty, an online adaptation algorithm for the C-SVR is proposed. Then, the stability of the overall error dynamics is analyzed by the uniformly ultimately bounded property in the nonlinear system theory. In order to validate the effectiveness of the proposed adaptive controller, numerical simulations are performed on the UAV model.

  3. Discriminating between adaptive and carcinogenic liver hypertrophy in rat studies using logistic ridge regression analysis of toxicogenomic data: The mode of action and predictive models.

    Science.gov (United States)

    Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu; Yoshinari, Kouichi; Honda, Hiroshi

    2017-03-01

    Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment.

  4. A New Predictive Model Based on the ABC Optimized Multivariate Adaptive Regression Splines Approach for Predicting the Remaining Useful Life in Aircraft Engines

    Directory of Open Access Journals (Sweden)

    Paulino José García Nieto

    2016-05-01

    Full Text Available Remaining useful life (RUL estimation is considered as one of the most central points in the prognostics and health management (PHM. The present paper describes a nonlinear hybrid ABC–MARS-based model for the prediction of the remaining useful life of aircraft engines. Indeed, it is well-known that an accurate RUL estimation allows failure prevention in a more controllable way so that the effective maintenance can be carried out in appropriate time to correct impending faults. The proposed hybrid model combines multivariate adaptive regression splines (MARS, which have been successfully adopted for regression problems, with the artificial bee colony (ABC technique. This optimization technique involves parameter setting in the MARS training procedure, which significantly influences the regression accuracy. However, its use in reliability applications has not yet been widely explored. Bearing this in mind, remaining useful life values have been predicted here by using the hybrid ABC–MARS-based model from the remaining measured parameters (input variables for aircraft engines with success. A correlation coefficient equal to 0.92 was obtained when this hybrid ABC–MARS-based model was applied to experimental data. The agreement of this model with experimental data confirmed its good performance. The main advantage of this predictive model is that it does not require information about the previous operation states of the aircraft engine.

  5. Adaptive Lasso for Poisson log-linear regression model%自适应Lasso在Poisson对数线性回归模型下的性质

    Institute of Scientific and Technical Information of China (English)

    崔静; 郭鹏江; 夏志明

    2011-01-01

    Aim To study adaptive Lasso for Poisson log-linear regrersion model. Methods The methods of mathematical analysis and probability theory are used. Results Under some conditions, the adaptive Lasso estimator for Poisson log-linear regression has the oracle properties which are sparsity and asymptotic normality. Conclusion A-daptive Lasso can effectively choose variables for Poisson log-linar regression model and estimate the variable coefficient.%目的 研究自适应Lasso在Poisson对数线性模型下的性质.方法 利用数学分析及概率论中的性质.结果 证明了在Poisson对数线性模型下自适应Lasso估计量具有稀疏性和渐进正态性.结论 自适应Lasso可以有效选择Poisson对数线性模型中的变量,并同时估计变量系数.

  6. Adaptive Rank Penalized Estimators in Multivariate Regression

    CERN Document Server

    Bunea, Florentina; Wegkamp, Marten

    2010-01-01

    We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stays bounded, and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...

  7. Forecasting with Dynamic Regression Models

    CERN Document Server

    Pankratz, Alan

    2012-01-01

    One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.

  8. Ridge Regression for Interactive Models.

    Science.gov (United States)

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are favorable to…

  9. Prediction of longitudinal dispersion coefficient using multivariate adaptive regression splines

    Indian Academy of Sciences (India)

    Amir Hamzeh Haghiabi

    2016-07-01

    In this paper, multivariate adaptive regression splines (MARS) was developed as a novel soft-computingtechnique for predicting longitudinal dispersion coefficient (DL) in rivers. As mentioned in the literature,experimental dataset related to DL was collected and used for preparing MARS model. Results of MARSmodel were compared with multi-layer neural network model and empirical formulas. To define the mosteffective parameters on DL, the Gamma test was used. Performance of MARS model was assessed bycalculation of standard error indices. Error indices showed that MARS model has suitable performanceand is more accurate compared to multi-layer neural network model and empirical formulas. Results ofthe Gamma test and MARS model showed that flow depth (H) and ratio of the mean velocity to shearvelocity (u/u^∗) were the most effective parameters on the DL.

  10. Inferential Models for Linear Regression

    Directory of Open Access Journals (Sweden)

    Zuoyi Zhang

    2011-09-01

    Full Text Available Linear regression is arguably one of the most widely used statistical methods in applications.  However, important problems, especially variable selection, remain a challenge for classical modes of inference.  This paper develops a recently proposed framework of inferential models (IMs in the linear regression context.  In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense.  Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.

  11. Heteroscedasticity checks for regression models

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    For checking on heteroscedasticity in regression models, a unified approach is proposed to constructing test statistics in parametric and nonparametric regression models. For nonparametric regression, the test is not affected sensitively by the choice of smoothing parameters which are involved in estimation of the nonparametric regression function. The limiting null distribution of the test statistic remains the same in a wide range of the smoothing parameters. When the covariate is one-dimensional, the tests are, under some conditions, asymptotically distribution-free. In the high-dimensional cases, the validity of bootstrap approximations is investigated. It is shown that a variant of the wild bootstrap is consistent while the classical bootstrap is not in the general case, but is applicable if some extra assumption on conditional variance of the squared error is imposed. A simulation study is performed to provide evidence of how the tests work and compare with tests that have appeared in the literature. The approach may readily be extended to handle partial linear, and linear autoregressive models.

  12. Heteroscedasticity checks for regression models

    Institute of Scientific and Technical Information of China (English)

    ZHU; Lixing

    2001-01-01

    [1]Carroll, R. J., Ruppert, D., Transformation and Weighting in Regression, New York: Chapman and Hall, 1988.[2]Cook, R. D., Weisberg, S., Diagnostics for heteroscedasticity in regression, Biometrika, 1988, 70: 1—10.[3]Davidian, M., Carroll, R. J., Variance function estimation, J. Amer. Statist. Assoc., 1987, 82: 1079—1091.[4]Bickel, P., Using residuals robustly I: Tests for heteroscedasticity, Ann. Statist., 1978, 6: 266—291.[5]Carroll, R. J., Ruppert, D., On robust tests for heteroscedasticity, Ann. Statist., 1981, 9: 205—209.[6]Eubank, R. L., Thomas, W., Detecting heteroscedasticity in nonparametric regression, J. Roy. Statist. Soc., Ser. B, 1993, 55: 145—155.[7]Diblasi, A., Bowman, A., Testing for constant variance in a linear model, Statist. and Probab. Letters, 1997, 33: 95—103.[8]Dette, H., Munk, A., Testing heteoscedasticity in nonparametric regression, J. R. Statist. Soc. B, 1998, 60: 693—708.[9]Müller, H. G., Zhao, P. L., On a semi-parametric variance function model and a test for heteroscedasticity, Ann. Statist., 1995, 23: 946—967.[10]Stute, W., Manteiga, G., Quindimil, M. P., Bootstrap approximations in model checks for regression, J. Amer. Statist. Asso., 1998, 93: 141—149.[11]Stute, W., Thies, G., Zhu, L. X., Model checks for regression: An innovation approach, Ann. Statist., 1998, 26: 1916—1939.[12]Shorack, G. R., Wellner, J. A., Empirical Processes with Applications to Statistics, New York: Wiley, 1986.[13]Efron, B., Bootstrap methods: Another look at the jackknife, Ann. Statist., 1979, 7: 1—26.[14]Wu, C. F. J., Jackknife, bootstrap and other re-sampling methods in regression analysis, Ann. Statist., 1986, 14: 1261—1295.[15]H rdle, W., Mammen, E., Comparing non-parametric versus parametric regression fits, Ann. Statist., 1993, 21: 1926—1947.[16]Liu, R. Y., Bootstrap procedures under some non-i.i.d. models, Ann. Statist., 1988, 16: 1696—1708.[17

  13. Regional vertical total electron content (VTEC) modeling together with satellite and receiver differential code biases (DCBs) using semi-parametric multivariate adaptive regression B-splines (SP-BMARS)

    Science.gov (United States)

    Durmaz, Murat; Karslioglu, Mahmut Onur

    2015-04-01

    There are various global and regional methods that have been proposed for the modeling of ionospheric vertical total electron content (VTEC). Global distribution of VTEC is usually modeled by spherical harmonic expansions, while tensor products of compactly supported univariate B-splines can be used for regional modeling. In these empirical parametric models, the coefficients of the basis functions as well as differential code biases (DCBs) of satellites and receivers can be treated as unknown parameters which can be estimated from geometry-free linear combinations of global positioning system observables. In this work we propose a new semi-parametric multivariate adaptive regression B-splines (SP-BMARS) method for the regional modeling of VTEC together with satellite and receiver DCBs, where the parametric part of the model is related to the DCBs as fixed parameters and the non-parametric part adaptively models the spatio-temporal distribution of VTEC. The latter is based on multivariate adaptive regression B-splines which is a non-parametric modeling technique making use of compactly supported B-spline basis functions that are generated from the observations automatically. This algorithm takes advantage of an adaptive scale-by-scale model building strategy that searches for best-fitting B-splines to the data at each scale. The VTEC maps generated from the proposed method are compared numerically and visually with the global ionosphere maps (GIMs) which are provided by the Center for Orbit Determination in Europe (CODE). The VTEC values from SP-BMARS and CODE GIMs are also compared with VTEC values obtained through calibration using local ionospheric model. The estimated satellite and receiver DCBs from the SP-BMARS model are compared with the CODE distributed DCBs. The results show that the SP-BMARS algorithm can be used to estimate satellite and receiver DCBs while adaptively and flexibly modeling the daily regional VTEC.

  14. Sparse Volterra and Polynomial Regression Models: Recoverability and Estimation

    CERN Document Server

    Kekatos, Vassilis

    2011-01-01

    Volterra and polynomial regression models play a major role in nonlinear system identification and inference tasks. Exciting applications ranging from neuroscience to genome-wide association analysis build on these models with the additional requirement of parsimony. This requirement has high interpretative value, but unfortunately cannot be met by least-squares based or kernel regression methods. To this end, compressed sampling (CS) approaches, already successful in linear regression settings, can offer a viable alternative. The viability of CS for sparse Volterra and polynomial models is the core theme of this work. A common sparse regression task is initially posed for the two models. Building on (weighted) Lasso-based schemes, an adaptive RLS-type algorithm is developed for sparse polynomial regressions. The identifiability of polynomial models is critically challenged by dimensionality. However, following the CS principle, when these models are sparse, they could be recovered by far fewer measurements. ...

  15. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  16. [From clinical judgment to linear regression model.

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R(2)) indicates the importance of independent variables in the outcome.

  17. Semiparametric Regression and Model Refining

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    This paper presents a semiparametric adjustment method suitable for general cases.Assuming that the regularizer matrix is positive definite,the calculation method is discussed and the corresponding formulae are presented.Finally,a simulated adjustment problem is constructed to explain the method given in this paper.The results from the semiparametric model and G-M model are compared.The results demonstrate that the model errors or the systematic errors of the observations can be detected correctly with the semiparametric estimate method.

  18. Regression Model With Elliptically Contoured Errors

    CERN Document Server

    Arashi, M; Tabatabaey, S M M

    2012-01-01

    For the regression model where the errors follow the elliptically contoured distribution (ECD), we consider the least squares (LS), restricted LS (RLS), preliminary test (PT), Stein-type shrinkage (S) and positive-rule shrinkage (PRS) estimators for the regression parameters. We compare the quadratic risks of the estimators to determine the relative dominance properties of the five estimators.

  19. The Infinite Hierarchical Factor Regression Model

    CERN Document Server

    Rai, Piyush

    2009-01-01

    We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors, and the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.

  20. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs.The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  1. Robust Bayesian Regularized Estimation Based on t Regression Model

    Directory of Open Access Journals (Sweden)

    Zean Li

    2015-01-01

    Full Text Available The t distribution is a useful extension of the normal distribution, which can be used for statistical modeling of data sets with heavy tails, and provides robust estimation. In this paper, in view of the advantages of Bayesian analysis, we propose a new robust coefficient estimation and variable selection method based on Bayesian adaptive Lasso t regression. A Gibbs sampler is developed based on the Bayesian hierarchical model framework, where we treat the t distribution as a mixture of normal and gamma distributions and put different penalization parameters for different regression coefficients. We also consider the Bayesian t regression with adaptive group Lasso and obtain the Gibbs sampler from the posterior distributions. Both simulation studies and real data example show that our method performs well compared with other existing methods when the error distribution has heavy tails and/or outliers.

  2. Applied Regression Modeling A Business Approach

    CERN Document Server

    Pardoe, Iain

    2012-01-01

    An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a

  3. A new bivariate negative binomial regression model

    Science.gov (United States)

    Faroughi, Pouya; Ismail, Noriszura

    2014-12-01

    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on Malaysian motor insurance dataset. The results indicated that BNB-1 regression has better fit than bivariate Poisson and BNB-2 models with regards to Akaike information criterion.

  4. Constrained regression models for optimization and forecasting

    Directory of Open Access Journals (Sweden)

    P.J.S. Bruwer

    2003-12-01

    Full Text Available Linear regression models and the interpretation of such models are investigated. In practice problems often arise with the interpretation and use of a given regression model in spite of the fact that researchers may be quite "satisfied" with the model. In this article methods are proposed which overcome these problems. This is achieved by constructing a model where the "area of experience" of the researcher is taken into account. This area of experience is represented as a convex hull of available data points. With the aid of a linear programming model it is shown how conclusions can be formed in a practical way regarding aspects such as optimal levels of decision variables and forecasting.

  5. Modeling confounding by half-sibling regression

    DEFF Research Database (Denmark)

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun

    2016-01-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both...

  6. A Skew-Normal Mixture Regression Model

    Science.gov (United States)

    Liu, Min; Lin, Tsung-I

    2014-01-01

    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  7. Efficient robust nonparametric estimation in a semimartingale regression model

    CERN Document Server

    Konev, Victor

    2010-01-01

    The paper considers the problem of robust estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is non-gaussian Ornstein-Uhlenbeck process with the L\\'evy process subordinator, which is used to model the financial Black-Scholes type markets with jumps. An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.

  8. An Application on Multinomial Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Abdalla M El-Habil

    2012-03-01

    Full Text Available Normal 0 false false false EN-US X-NONE X-NONE This study aims to identify an application of Multinomial Logistic Regression model which is one of the important methods for categorical data analysis. This model deals with one nominal/ordinal response variable that has more than two categories, whether nominal or ordinal variable. This model has been applied in data analysis in many areas, for example health, social, behavioral, and educational.To identify the model by practical way, we used real data on physical violence against children, from a survey of Youth 2003 which was conducted by Palestinian Central Bureau of Statistics (PCBS. Segment of the population of children in the age group (10-14 years for residents in Gaza governorate, size of 66,935 had been selected, and the response variable consisted of four categories. Eighteen of explanatory variables were used for building the primary multinomial logistic regression model. Model had been tested through a set of statistical tests to ensure its appropriateness for the data. Also the model had been tested by selecting randomly of two observations of the data used to predict the position of each observation in any classified group it can be, by knowing the values of the explanatory variables used. We concluded by using the multinomial logistic regression model that we can able to define accurately the relationship between the group of explanatory variables and the response variable, identify the effect of each of the variables, and we can predict the classification of any individual case.

  9. Regression Models for Count Data in R

    Directory of Open Access Journals (Sweden)

    Christian Kleiber

    2008-06-01

    Full Text Available The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle( and zeroinfl( from the package pscl is introduced. It re-uses design and functionality of the basic R functions just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated model, are able to incorporate over-dispersion and excess zeros-two problems that typically occur in count data sets in economics and the social sciences—better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.

  10. General regression and representation model for classification.

    Directory of Open Access Journals (Sweden)

    Jianjun Qian

    Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.

  11. Bayesian Inference of a Multivariate Regression Model

    Directory of Open Access Journals (Sweden)

    Marick S. Sinay

    2014-01-01

    Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.

  12. Bayesian model selection in Gaussian regression

    CERN Document Server

    Abramovich, Felix

    2009-01-01

    We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors might be much larger than the number of observations. From a frequentist view, the proposed procedure results in the penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting estimator. We establish the oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for "nearly-orthogonal" and "multicollinear" designs.

  13. Hierarchical linear regression models for conditional quantiles

    Institute of Scientific and Technical Information of China (English)

    TIAN Maozai; CHEN Gemai

    2006-01-01

    The quantile regression has several useful features and therefore is gradually developing into a comprehensive approach to the statistical analysis of linear and nonlinear response models,but it cannot deal effectively with the data with a hierarchical structure.In practice,the existence of such data hierarchies is neither accidental nor ignorable,it is a common phenomenon.To ignore this hierarchical data structure risks overlooking the importance of group effects,and may also render many of the traditional statistical analysis techniques used for studying data relationships invalid.On the other hand,the hierarchical models take a hierarchical data structure into account and have also many applications in statistics,ranging from overdispersion to constructing min-max estimators.However,the hierarchical models are virtually the mean regression,therefore,they cannot be used to characterize the entire conditional distribution of a dependent variable given high-dimensional covariates.Furthermore,the estimated coefficient vector (marginal effects)is sensitive to an outlier observation on the dependent variable.In this article,a new approach,which is based on the Gauss-Seidel iteration and taking a full advantage of the quantile regression and hierarchical models,is developed.On the theoretical front,we also consider the asymptotic properties of the new method,obtaining the simple conditions for an n1/2-convergence and an asymptotic normality.We also illustrate the use of the technique with the real educational data which is hierarchical and how the results can be explained.

  14. Regression Models For Saffron Yields in Iran

    Science.gov (United States)

    S. H, Sanaeinejad; S. N, Hosseini

    Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.

  15. Quantile regression modeling for Malaysian automobile insurance premium data

    Science.gov (United States)

    Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz

    2015-09-01

    Quantile regression is a robust regression to outliers compared to mean regression models. Traditional mean regression models like Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of change in the estimates of regression parameters (rating classes) on the magnitude of response variable (pure premium). We then compare the results of quantile regression model with Gamma regression model. The results from quantile regression show that some rating classes increase as quantile increases and some decrease with decreasing quantile. Further, we found that the confidence interval of median regression (τ = O.5) is always smaller than Gamma regression in all risk factors.

  16. Inferring gene regression networks with model trees

    Directory of Open Access Journals (Sweden)

    Aguilar-Ruiz Jesus S

    2010-10-01

    Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear

  17. An Adaptive Support Vector Regression Machine for the State Prognosis of Mechanical Systems

    Directory of Open Access Journals (Sweden)

    Qing Zhang

    2015-01-01

    Full Text Available Due to the unsteady state evolution of mechanical systems, the time series of state indicators exhibits volatile behavior and staged characteristics. To model hidden trends and predict deterioration failure utilizing volatile state indicators, an adaptive support vector regression (ASVR machine is proposed. In ASVR, the width of an error-insensitive tube, which is a constant in the traditional support vector regression, is set as a variable determined by the transient distribution boundary of local regions in the training time series. Thus, the localized regions are obtained using a sliding time window, and their boundaries are defined by a robust measure known as the truncated range. Utilizing an adaptive error-insensitive tube, a stabilized tolerance level for noise is achieved, whether the time series occurs in low-volatility regions or in high-volatility regions. The proposed method is evaluated by vibrational data measured on descaling pumps. The results show that ASVR is capable of capturing the local trends of the volatile time series of state indicators and is superior to the standard support vector regression for state prediction.

  18. Two-step variable selection in quantile regression models

    Directory of Open Access Journals (Sweden)

    FAN Yali

    2015-06-01

    Full Text Available We propose a two-step variable selection procedure for high dimensional quantile regressions,in which the dimension of the covariates, pn is much larger than the sample size n. In the first step, we perform l1 penalty, and we demonstrate that the first step penalized estimator with the LASSO penalty can reduce the model from an ultra-high dimensional to a model whose size has the same order as that of the true model, and the selected model can cover the true model. The second step excludes the remained irrelevant covariates by applying the adaptive LASSO penalty to the reduced model obtained from the first step. Under some regularity conditions, we show that our procedure enjoys the model selection consistency. We conduct a simulation study and a real data analysis to evaluate the finite sample performance of the proposed approach.

  19. Introduction to the use of regression models in epidemiology.

    Science.gov (United States)

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  20. Predictive densities for day-ahead electricity prices using time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Jónsson, Tryggvi; Pinson, Pierre; Madsen, Henrik;

    2014-01-01

    is compared to that of four benchmark approaches and the well-known the generalist autoregressive conditional heteroskedasticity (GARCH) model over a three-year evaluation period. While all benchmarks are outperformed in terms of forecasting skill overall, the superiority of the semi-parametric model over......A large part of the decision-making problems actors of the power system are facing on a daily basis requires scenarios for day-ahead electricity market prices. These scenarios are most likely to be generated based on marginal predictive densities for such prices, then enhanced with a temporal...... dependence structure. A semi-parametric methodology for generating such densities is presented: it includes: (i) a time-adaptive quantile regression model for the 5%–95% quantiles; and (ii) a description of the distribution tails with exponential distributions. The forecasting skill of the proposed model...

  1. Entrepreneurial intention modeling using hierarchical multiple regression

    Directory of Open Access Journals (Sweden)

    Marina Jeger

    2014-12-01

    Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.

  2. A hybrid neural network model for noisy data regression.

    Science.gov (United States)

    Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M

    2004-04-01

    A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.

  3. An adaptive online learning approach for Support Vector Regression: Online-SVR-FID

    Science.gov (United States)

    Liu, Jie; Zio, Enrico

    2016-08-01

    Support Vector Regression (SVR) is a popular supervised data-driven approach for building empirical models from available data. Like all data-driven methods, under non-stationary environmental and operational conditions it needs to be provided with adaptive learning capabilities, which might become computationally burdensome with large datasets cumulating dynamically. In this paper, a cost-efficient online adaptive learning approach is proposed for SVR by combining Feature Vector Selection (FVS) and Incremental and Decremental Learning. The proposed approach adaptively modifies the model only when different pattern drifts are detected according to proposed criteria. Two tolerance parameters are introduced in the approach to control the computational complexity, reduce the influence of the intrinsic noise in the data and avoid the overfitting problem of SVR. Comparisons of the prediction results is made with other online learning approaches e.g. NORMA, SOGA, KRLS, Incremental Learning, on several artificial datasets and a real case study concerning time series prediction based on data recorded on a component of a nuclear power generation system. The performance indicators MSE and MARE computed on the test dataset demonstrate the efficiency of the proposed online learning method.

  4. Study on Adaptive Lasso Quantile Regression for Panel Data Models%面板数据的自适应 Lasso 分位回归方法研究

    Institute of Scientific and Technical Information of China (English)

    李子强; 田茂再; 罗幼喜

    2014-01-01

    如何在对参数进行估计的同时自动选择重要解释变量,一直是面板数据分位回归模型中讨论的热点问题之一。通过构造一种含多重随机效应的贝叶斯分层分位回归模型,在假定固定效应系数先验服从一种新的条件Laplace分布的基础上,给出了模型参数估计的Gibbs抽样算法。考虑到不同重要程度的解释变量权重系数压缩程度应该不同,所构造的先验信息具有自适应性的特点,能够准确地对模型中重要解释变量进行自动选取,且设计的切片Gibbs抽样算法能够快速有效地解决模型中各个参数的后验均值估计问题。模拟结果显示,新方法在参数估计精确度和变量选择准确度上均优于现有文献的常用方法。通过对中国各地区多个宏观经济指标的面板数据进行建模分析,演示了新方法估计参数与挑选变量的能力。%How to do parameter estimation and variable selection simultaneously is a hot issue in the study of quantile regression for panel data models .On the base of the assumption that the fixed effect coefficients are subject to a novel conditional Laplace prior ,the paper constructs a hierarchical Bayesian quantile regression model and gives the Gibbs sample algorithm for the unknow n parameter estimation .In consideration of different explain variables should have different shrinkage degree ,the proposed prior has the property of adaptivity ,w hich could select the important explain variables in the model automatically . Furthermore ,the slice Gibbs sample algorithm that the paper proposed is able to estimate the posteriori mean estimation of unknown parameter quickly and efficiently .Monte Carlo simulation study indicates that the proposed method is obviously superior to the existing methods in literatures on the accuracy of parameter estimation and variable selection .Finally ,the paper gives a research of modeling the panel data including several

  5. USING MULTIVARIATE ADAPTIVE REGRESSION SPLINE AND ARTIFICIAL NEURAL NETWORK TO SIMULATE URBANIZATION IN MUMBAI, INDIA

    Directory of Open Access Journals (Sweden)

    M. Ahmadlou

    2015-12-01

    Full Text Available Land use change (LUC models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS, and a global parametric model called artificial neural network (ANN to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM and 2010 (ETM+ were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  6. Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

    Science.gov (United States)

    Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.

    2015-12-01

    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  7. Model performance analysis and model validation in logistic regression

    Directory of Open Access Journals (Sweden)

    Rosa Arboretti Giancristofaro

    2007-10-01

    Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.

  8. SMOOTH TRANSITION LOGISTIC REGRESSION MODEL TREE

    OpenAIRE

    RODRIGO PINTO MOREIRA

    2008-01-01

    Este trabalho tem como objetivo principal adaptar o modelo STR-Tree, o qual é a combinação de um modelo Smooth Transition Regression com Classification and Regression Tree (CART), a fim de utilizá-lo em Classificação. Para isto algumas alterações foram realizadas em sua forma estrutural e na estimação. Devido ao fato de estarmos fazendo classificação de variáveis dependentes binárias, se faz necessária a utilização das técnicas empregadas em Regressão Logística, dessa forma a estimação dos pa...

  9. Missing pixels restoration for remote sensing images using adaptive search window and linear regression

    Science.gov (United States)

    Tai, Shen-Chuan; Chen, Peng-Yu; Chao, Chian-Yen

    2016-07-01

    The Consultative Committee for Space Data Systems proposed an efficient image compression standard that can do lossless compression (CCSDS-ICS). CCSDS-ICS is the most widely utilized standard for satellite communications. However, the original CCSDS-ICS is weak in terms of error resilience with even a single incorrect bit possibly causing numerous missing pixels. A restoration algorithm based on the neighborhood similar pixel interpolator is proposed to fill in missing pixels. The linear regression model is used to generate the reference image from other panchromatic or multispectral images. Furthermore, an adaptive search window is utilized to sieve out similar pixels from the pixels in the search region defined in the neighborhood similar pixel interpolator. The experimental results show that the proposed methods are capable of reconstructing missing regions with good visual quality.

  10. Logistic Regression Model on Antenna Control Unit Autotracking Mode

    Science.gov (United States)

    2015-10-20

    412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA...OCT 15 4. TITLE AND SUBTITLE Logistic Regression Model on Antenna Control Unit Autotracking Mode 5a. CONTRACT NUMBER 5b. GRANT...alternative-hypothesis. This paper will present an Antenna Auto- tracking model using Logistic Regression modeling. This paper presents an example of

  11. Using multivariate adaptive regression splines to estimate subadult age from diaphyseal dimensions.

    Science.gov (United States)

    Stull, Kyra E; L'Abbé, Ericka N; Ousley, Stephen D

    2014-07-01

    Subadult age estimation is considered the most accurate parameter estimated in a subadult biological profile, even though the methods are deficient and the samples from which they are based are inappropriate. The current study addresses the problems that plague subadult age estimation and creates age estimation models from diaphyseal dimensions of modern children. The sample included 1,310 males and females between the ages of birth and 12 years. Eighteen diaphyseal length and breadth measurements were obtained from Lodox Statscan radiographic images generated at two institutions in Cape Town, South Africa, between 2007 and 2012. Univariate and multivariate age estimation models were created using multivariate adaptive regression splines. k-fold cross-validated 95% prediction intervals (PIs) were created for each model, and the precision of each model was assessed. The diaphyseal length models generated the narrowest PIs (2 months to 6 years) for all univariate models. The majority of multivariate models had PIs that ranged from 3 months to 5 and 6 years. Mean bias approximated 0 for each model, but most models lost precision after 10 years of age. Univariate diaphyseal length models are recommended for younger children, whereas multivariate models are recommended for older children where the inclusion of more variables minimized the size of the PIs. If diaphyseal lengths are not available, multivariate breadth models are recommended. The present study provides applicable age estimation formulae and explores the advantages and disadvantages of different subadult age estimation models using diaphyseal dimensions. Am J Phys Anthropol 154:376-386, 2014. © 2014 Wiley Periodicals, Inc.

  12. Adapted Active Appearance Models

    Directory of Open Access Journals (Sweden)

    Renaud Séguier

    2009-01-01

    Full Text Available Active Appearance Models (AAMs are able to align efficiently known faces under duress, when face pose and illumination are controlled. We propose Adapted Active Appearance Models to align unknown faces in unknown poses and illuminations. Our proposal is based on the one hand on a specific transformation of the active model texture in an oriented map, which changes the AAM normalization process; on the other hand on the research made in a set of different precomputed models related to the most adapted AAM for an unknown face. Tests on public and private databases show the interest of our approach. It becomes possible to align unknown faces in real-time situations, in which light and pose are not controlled.

  13. Applications of Adaptive Elastic Net Pro cedure for Logistic Regression Mo del%Adaptive Elastic Net方法在Logistic回归模型中的应用

    Institute of Scientific and Technical Information of China (English)

    李春红; 黄登香; 戴洪帅

    2015-01-01

    本文将adaptive Elastic Net方法应用于Logistic回归模型,研究并证明其具有Oracle性质,并利用数值模拟及实际例子将其与Lasso、adaptive Lasso、Elastic Net方法的估计结果进行比较,从结果可以看出,adaptive Elastic Net方法效果更优。%In this paper, we consider the adaptive Elastic Net procedure for the Logistic reg-ression model and prove the Oracle property of its estimates. Compared with the Lasso, the adaptive Lasso and the Elastic Net procedure, we obtain that the proposed procedure has good performance, owing to the Oracle property.

  14. Model selection in kernel ridge regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    2013-01-01

    Kernel ridge regression is a technique to perform ridge regression with a potentially infinite number of nonlinear transformations of the independent variables as regressors. This method is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts....... The influence of the choice of kernel and the setting of tuning parameters on forecast accuracy is investigated. Several popular kernels are reviewed, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. The latter two kernels are interpreted in terms of their smoothing properties......, and the tuning parameters associated to all these kernels are related to smoothness measures of the prediction function and to the signal-to-noise ratio. Based on these interpretations, guidelines are provided for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study...

  15. Improved Estimation of Earth Rotation Parameters Using the Adaptive Ridge Regression

    Science.gov (United States)

    Huang, Chengli; Jin, Wenjing

    1998-05-01

    The multicollinearity among regression variables is a common phenomenon in the reduction of astronomical data. The phenomenon of multicollinearity and the diagnostic factors are introduced first. As a remedy, a new method, called adaptive ridge regression (ARR), which is an improved method of choosing the departure constant θ in ridge regression, is suggested and applied in a case that the Earth orientation parameters (EOP) are determined by lunar laser ranging (LLR). It is pointed out, via a diagnosis, the variance inflation factors (VIFs), that there exists serious multicollinearity among the regression variables. It is shown that the ARR method is effective in reducing the multicollinearity and makes the regression coefficients more stable than that of using ordinary least squares estimation (LS), especially when there is serious multicollinearity.

  16. A Dirty Model for Multiple Sparse Regression

    CERN Document Server

    Jalali, Ali; Sanghavi, Sujay

    2011-01-01

    Sparse linear regression -- finding an unknown vector from linear measurements -- is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors -- with partially shared support sets -- have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of \\ell_1/\\ell_q norm block-regularizations with q>1 for such problems; however these could actually perform worse in sample complexity -- vis a vis solving each problem separately ignoring sharing -- depending on the level of sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but not pay a penalty when it does not. A very simple idea: we decompose the parameters into two components and regularize these differently. We show both theore...

  17. Evolution of an adaptive behavior and its sensory receptors promotes eye regression in blind cavefish

    OpenAIRE

    2012-01-01

    Abstract Background How and why animals lose eyesight during adaptation to the dark and food-limited cave environment has puzzled biologists since the time of Darwin. More recently, several different adaptive hypotheses have been proposed to explain eye degeneration based on studies in the teleost Astyanax mexicanus, which consists of blind cave-dwelling (cavefish) and sighted surface-dwelling (surface fish) forms. One of these hypotheses is that eye regression is the result of indirect selec...

  18. Combining logistic regression and neural networks to create predictive models.

    OpenAIRE

    Spackman, K. A.

    1992-01-01

    Neural networks are being used widely in medicine and other areas to create predictive models from data. The statistical method that most closely parallels neural networks is logistic regression. This paper outlines some ways in which neural networks and logistic regression are similar, shows how a small modification of logistic regression can be used in the training of neural network models, and illustrates the use of this modification for variable selection and predictive model building wit...

  19. Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification.

    Science.gov (United States)

    Algamal, Zakariya Yahya; Lee, Muhammad Hisyam

    2015-12-01

    Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight, however, using this weight may not be preferable for certain reasons: First, the elastic net estimator is biased in selecting genes. Second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly consistent in selecting genes compared to the other three competitor regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.

  20. Stochastic Approximation Methods for Latent Regression Item Response Models

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  1. Prediction of Rotor Spun Yarn Strength Using Adaptive Neuro-fuzzy Inference System and Linear Multiple Regression Methods

    Institute of Scientific and Technical Information of China (English)

    NURWAHA Deogratias; WANG Xin-hou

    2008-01-01

    This paper presents a comparison study of two models for predicting the strength of rotor spun cotton yarns from fiber properties. The adaptive neuro-fuzzy system inference (ANFIS) and Multiple Linear Regression models are used to predict the rotor spun yarn strength. Fiber properties and yarn count are used as inputs to train the two models and the count-strength-product (CSP) was the target. The predictive performances of the two models are estimated and compared. We found that the ANFIS has a better predictive power in comparison with linear multipleregression model. The impact of each fiber property is also illustrated.

  2. Support Vector Regression Model Based on Empirical Mode Decomposition and Auto Regression for Electric Load Forecasting

    Directory of Open Access Journals (Sweden)

    Hong-Juan Li

    2013-04-01

    Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.

  3. Multiple Retrieval Models and Regression Models for Prior Art Search

    CERN Document Server

    Lopez, Patrice

    2009-01-01

    This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German) producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post ranking step, our architecture remains generic and easy to extend.

  4. Using AMMI, factorial regression and partial least squares regression models for interpreting genotype x environment interaction.

    NARCIS (Netherlands)

    Vargas, M.; Crossa, J.; Eeuwijk, van F.A.; Ramirez, M.E.; Sayre, K.

    1999-01-01

    Partial least squares (PLS) and factorial regression (FR) are statistical models that incorporate external environmental and/or cultivar variables for studying and interpreting genotype × environment interaction (GEl). The Additive Main effect and Multiplicative Interaction (AMMI) model uses only th

  5. Model Selection in Kernel Ridge Regression

    DEFF Research Database (Denmark)

    Exterkate, Peter

    Kernel ridge regression is gaining popularity as a data-rich nonlinear forecasting tool, which is applicable in many different contexts. This paper investigates the influence of the choice of kernel and the setting of tuning parameters on forecast accuracy. We review several popular kernels......, including polynomial kernels, the Gaussian kernel, and the Sinc kernel. We interpret the latter two kernels in terms of their smoothing properties, and we relate the tuning parameters associated to all these kernels to smoothness measures of the prediction function and to the signal-to-noise ratio. Based...... on these interpretations, we provide guidelines for selecting the tuning parameters from small grids using cross-validation. A Monte Carlo study confirms the practical usefulness of these rules of thumb. Finally, the flexible and smooth functional forms provided by the Gaussian and Sinc kernels makes them widely...

  6. Relative risk regression models with inverse polynomials.

    Science.gov (United States)

    Ning, Yang; Woodward, Mark

    2013-08-30

    The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.

  7. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  8. Support vector regression model for complex target RCS predicting

    Institute of Scientific and Technical Information of China (English)

    Wang Gu; Chen Weishi; Miao Jungang

    2009-01-01

    The electromagnetic scattering computation has developed rapidly for many years; some computing problems for complex and coated targets cannot be solved by using the existing theory and computing models. A computing model based on data is established for making up the insufficiency of theoretic models. Based on the "support vector regression method", which is formulated on the principle of minimizing a structural risk, a data model to predicate the unknown radar cross section of some appointed targets is given. Comparison between the actual data and the results of this predicting model based on support vector regression method proved that the support vector regression method is workable and with a comparative precision.

  9. ASYMPTOTIC EFFICIENT ESTIMATION IN SEMIPARAMETRIC NONLINEAR REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    ZhuZhongyi; WeiBocheng

    1999-01-01

    In this paper, the estimation method based on the “generalized profile likelihood” for the conditionally parametric models in the paper given by Severini and Wong (1992) is extendedto fixed design semiparametrie nonlinear regression models. For these semiparametrie nonlinear regression models,the resulting estimator of parametric component of the model is shown to beasymptotically efficient and the strong convergence rate of nonparametric component is investigated. Many results (for example Chen (1988) ,Gao & Zhao (1993), Rice (1986) et al. ) are extended to fixed design semiparametric nonlinear regression models.

  10. Nonlinear and Non Normal Regression Models in Physiological Research

    OpenAIRE

    1984-01-01

    Applications of nonlinear and non normal regression models are in increasing order for appropriate interpretation of complex phenomenon of biomedical sciences. This paper reviews critically some applications of these models physiological research.

  11. Rank-preserving regression: a more robust rank regression model against outliers.

    Science.gov (United States)

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  12. Adaptive Linear and Normalized Combination of Radial Basis Function Networks for Function Approximation and Regression

    Directory of Open Access Journals (Sweden)

    Yunfeng Wu

    2014-01-01

    Full Text Available This paper presents a novel adaptive linear and normalized combination (ALNC method that can be used to combine the component radial basis function networks (RBFNs to implement better function approximation and regression tasks. The optimization of the fusion weights is obtained by solving a constrained quadratic programming problem. According to the instantaneous errors generated by the component RBFNs, the ALNC is able to perform the selective ensemble of multiple leaners by adaptively adjusting the fusion weights from one instance to another. The results of the experiments on eight synthetic function approximation and six benchmark regression data sets show that the ALNC method can effectively help the ensemble system achieve a higher accuracy (measured in terms of mean-squared error and the better fidelity (characterized by normalized correlation coefficient of approximation, in relation to the popular simple average, weighted average, and the Bagging methods.

  13. Adaptive Covariance Estimation with model selection

    CERN Document Server

    Biscay, Rolando; Loubes, Jean-Michel

    2012-01-01

    We provide in this paper a fully adaptive penalized procedure to select a covariance among a collection of models observing i.i.d replications of the process at fixed observation points. For this we generalize previous results of Bigot and al. and propose to use a data driven penalty to obtain an oracle inequality for the estimator. We prove that this method is an extension to the matricial regression model of the work by Baraud.

  14. Evolution of an adaptive behavior and its sensory receptors promotes eye regression in blind cavefish

    Directory of Open Access Journals (Sweden)

    Yoshizawa Masato

    2012-12-01

    Full Text Available Abstract Background How and why animals lose eyesight during adaptation to the dark and food-limited cave environment has puzzled biologists since the time of Darwin. More recently, several different adaptive hypotheses have been proposed to explain eye degeneration based on studies in the teleost Astyanax mexicanus, which consists of blind cave-dwelling (cavefish and sighted surface-dwelling (surface fish forms. One of these hypotheses is that eye regression is the result of indirect selection for constructive characters that are negatively linked to eye development through the pleiotropic effects of Sonic Hedgehog (SHH signaling. However, subsequent genetic analyses suggested that other mechanisms also contribute to eye regression in Astyanax cavefish. Here, we introduce a new approach to this problem by investigating the phenotypic and genetic relationships between a suite of non-visual constructive traits and eye regression. Results Using quantitative genetic analysis of crosses between surface fish, the Pachón cavefish population and their hybrid progeny, we show that the adaptive vibration attraction behavior (VAB and its sensory receptors, superficial neuromasts (SN specifically found within the cavefish eye orbit (EO, are genetically correlated with reduced eye size. The quantitative trait loci (QTL for these three traits form two clusters of congruent or overlapping QTL on Astyanax linkage groups (LG 2 and 17, but not at the shh locus on LG 13. Ablation of EO SN in cavefish demonstrated a major role for these sensory receptors in VAB expression. Furthermore, experimental induction of eye regression in surface fish via shh overexpression showed that the absence of eyes was insufficient to promote the appearance of VAB or EO SN. Conclusions We conclude that natural selection for the enhancement of VAB and EO SN indirectly promotes eye regression in the Pachón cavefish population through an antagonistic relationship involving genetic

  15. Geometric Properties of AR(q) Nonlinear Regression Models

    Institute of Scientific and Technical Information of China (English)

    LIUYing-ar; WEIBo-cheng

    2004-01-01

    This paper is devoted to a study of geometric properties of AR(q) nonlinear regression models. We present geometric frameworks for regression parameter space and autoregression parameter space respectively based on the weighted inner product by fisher information matrix. Several geometric properties related to statistical curvatures are given for the models. The results of this paper extended the work of Bates & Watts(1980,1988)[1.2] and Seber & Wild (1989)[3].

  16. Regression Test-Selection Technique Using Component Model Based Modification: Code to Test Traceability

    Directory of Open Access Journals (Sweden)

    Ahmad A. Saifan

    2016-04-01

    Full Text Available Regression testing is a safeguarding procedure to validate and verify adapted software, and guarantee that no errors have emerged. However, regression testing is very costly when testers need to re-execute all the test cases against the modified software. This paper proposes a new approach in regression test selection domain. The approach is based on meta-models (test models and structured models to decrease the number of test cases to be used in the regression testing process. The approach has been evaluated using three Java applications. To measure the effectiveness of the proposed approach, we compare the results using the re-test to all approaches. The results have shown that our approach reduces the size of test suite without negative impact on the effectiveness of the fault detection.

  17. Robust Depth-Weighted Wavelet for Nonparametric Regression Models

    Institute of Scientific and Technical Information of China (English)

    Lu LIN

    2005-01-01

    In the nonpaxametric regression models, the original regression estimators including kernel estimator, Fourier series estimator and wavelet estimator are always constructed by the weighted sum of data, and the weights depend only on the distance between the design points and estimation points. As a result these estimators are not robust to the perturbations in data. In order to avoid this problem, a new nonparametric regression model, called the depth-weighted regression model, is introduced and then the depth-weighted wavelet estimation is defined. The new estimation is robust to the perturbations in data, which attains very high breakdown value close to 1/2. On the other hand, some asymptotic behaviours such as asymptotic normality are obtained. Some simulations illustrate that the proposed wavelet estimator is more robust than the original wavelet estimator and, as a price to pay for the robustness, the new method is slightly less efficient than the original method.

  18. Credit Scoring Model Hybridizing Artificial Intelligence with Logistic Regression

    Directory of Open Access Journals (Sweden)

    Han Lu

    2013-01-01

    Full Text Available Today the most commonly used techniques for credit scoring are artificial intelligence and statistics. In this paper, we started a new way to use these two kinds of models. Through logistic regression filters the variables with a high degree of correlation, artificial intelligence models reduce complexity and accelerate convergence, while these models hybridizing logistic regression have better explanations in statistically significance, thus improve the effect of artificial intelligence models. With experiments on German data set, we find an interesting phenomenon defined as ‘Dimensional interference’ with support vector machine and from cross validation it can be seen that the new method gives a lot of help with credit scoring.

  19. Group Lasso for high dimensional sparse quantile regression models

    CERN Document Server

    Kato, Kengo

    2011-01-01

    This paper studies the statistical properties of the group Lasso estimator for high dimensional sparse quantile regression models where the number of explanatory variables (or the number of groups of explanatory variables) is possibly much larger than the sample size while the number of variables in "active" groups is sufficiently small. We establish a non-asymptotic bound on the $\\ell_{2}$-estimation error of the estimator. This bound explains situations under which the group Lasso estimator is potentially superior/inferior to the $\\ell_{1}$-penalized quantile regression estimator in terms of the estimation error. We also propose a data-dependent choice of the tuning parameter to make the method more practical, by extending the original proposal of Belloni and Chernozhukov (2011) for the $\\ell_{1}$-penalized quantile regression estimator. As an application, we analyze high dimensional additive quantile regression models. We show that under a set of primitive regularity conditions, the group Lasso estimator c...

  20. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    Science.gov (United States)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas both for accuracy of imperviousness coverage evaluation in individual points in time and accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using particular techniques. The results proved that in case of sub-pixel evaluation the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, based on obtained results Cubist algorithm may be advised for Landsat based mapping of imperviousness for single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed for individual time points assessments at least as well as the best individual models. In case of imperviousness change assessment the ensembles always outperformed single model approaches. It means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.

  1. Joint regression analysis and AMMI model applied to oat improvement

    Science.gov (United States)

    Oliveira, A.; Oliveira, T. A.; Mejza, S.

    2012-09-01

    In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.

  2. Buffalos milk yield analysis using random regression models

    Directory of Open Access Journals (Sweden)

    A.S. Schierholt

    2010-02-01

    Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.

  3. Stability and adaptability of runner peanut genotypes based on nonlinear regression and AMMI analysis

    Directory of Open Access Journals (Sweden)

    Roseane Cavalcanti dos Santos

    2012-08-01

    Full Text Available The objective of this work was to estimate the stability and adaptability of pod and seed yield in runner peanut genotypes based on the nonlinear regression and AMMI analysis. Yield data from 11 trials, distributed in six environments and three harvests, carried out in the Northeast region of Brazil during the rainy season were used. Significant effects of genotypes (G, environments (E, and GE interactions were detected in the analysis, indicating different behaviors among genotypes in favorable and unfavorable environmental conditions. The genotypes BRS Pérola Branca and LViPE‑06 are more stable and adapted to the semiarid environment, whereas LGoPE‑06 is a promising material for pod production, despite being highly dependent on favorable environments.

  4. What Drives Business Model Adaptation?

    DEFF Research Database (Denmark)

    Saebi, Tina; Lien, Lasse B.; Foss, Nicolai Juul

    2016-01-01

    Business models change as managers not only innovate business models, but also engage in more mundane adaptation in response to external changes, such as changes in the level or composition of demand. However, little is known about what causes such business model adaptation. We employ threat......-rigidity as well as prospect theory to examine business model adaptation in response to external threats and opportunities. Additionally, drawing on the behavioural theory of the firm, we argue that the past strategic orientation of a firm creates path dependencies that influence the propensity of the firm...... to adapt its business model. We test our hypotheses on a sample of 1196 Norwegian companies, and find that firms are more likely to adapt their business model under conditions of perceived threats than opportunities, and that strategic orientation geared towards market development is more conducive...

  5. Geographically Weighted Logistic Regression Applied to Credit Scoring Models

    Directory of Open Access Journals (Sweden)

    Pedro Henrique Melo Albuquerque

    Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.

  6. FUNCTIONAL-COEFFICIENT REGRESSION MODEL AND ITS ESTIMATION

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    In this paper,a class of functional-coefficient regression models is proposed and an estimation procedure based on the locally weighted least equares is suggested. This class of models,with the proposed estimation method,is a powerful means for exploratory data analysis.

  7. CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model

    DEFF Research Database (Denmark)

    Dyrholm, Mads; Hansen, Lars Kai

    2004-01-01

    We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...

  8. Fitting Additive Binomial Regression Models with the R Package blm

    Directory of Open Access Journals (Sweden)

    Stephanie Kovalchik

    2013-09-01

    Full Text Available The R package blm provides functions for fitting a family of additive regression models to binary data. The included models are the binomial linear model, in which all covariates have additive effects, and the linear-expit (lexpit model, which allows some covariates to have additive effects and other covariates to have logisitc effects. Additive binomial regression is a model of event probability, and the coefficients of linear terms estimate covariate-adjusted risk differences. Thus, in contrast to logistic regression, additive binomial regression puts focus on absolute risk and risk differences. In this paper, we give an overview of the methodology we have developed to fit the binomial linear and lexpit models to binary outcomes from cohort and population-based case-control studies. We illustrate the blm packages methods for additive model estimation, diagnostics, and inference with risk association analyses of a bladder cancer nested case-control study in the NIH-AARP Diet and Health Study.

  9. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    Science.gov (United States)

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.

    2017-03-01

    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses binary logistic regression method with time series data of normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of training step is to identify the best parameter of the regression model using gradient descent algorithm. The best fit of the model can be utilized to classify sugarcane and non-sugarcane area. The experiment shows high accuracy and successfully maps the sugarcane plantation area which obtained best result of Cohen’s Kappa value 0.7833 (strong) with 89.167% accuracy.

  10. Support Vector Regression-Based Adaptive Divided Difference Filter for Nonlinear State Estimation Problems

    Directory of Open Access Journals (Sweden)

    Hongjian Wang

    2014-01-01

    Full Text Available We present a support vector regression-based adaptive divided difference filter (SVRADDF algorithm for improving the low state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm’s robustness. The performance of the proposed algorithm is evaluated by estimating states for (i an underwater nonmaneuvering target bearing-only tracking system and (ii maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.

  11. Locally adaptive regression filter-based infrared focal plane array non-uniformity correction

    Science.gov (United States)

    Li, Jia; Qin, Hanlin; Yan, Xiang; Huang, He; Zhao, Yingjuan; Zhou, Huixin

    2015-10-01

    Due to the limitations of the manufacturing technology, the response rates to the same infrared radiation intensity in each infrared detector unit are not identical. As a result, the non-uniformity of infrared focal plane array, also known as fixed pattern noise (FPN), is generated. To solve this problem, correcting the non-uniformity in infrared image is a promising approach, and many non-uniformity correction (NUC) methods have been proposed. However, they have some defects such as slow convergence, ghosting and scene degradation. To overcome these defects, a novel non-uniformity correction method based on locally adaptive regression filter is proposed. First, locally adaptive regression method is used to separate the infrared image into base layer containing main scene information and the detail layer containing detailed scene with FPN. Then, the detail layer sequence is filtered by non-linear temporal filter to obtain the non-uniformity. Finally, the high quality infrared image is obtained by subtracting non-uniformity component from original image. The experimental results show that the proposed method can significantly eliminate the ghosting and the scene degradation. The results of correction are superior to the THPF-NUC and NN-NUC in the aspects of subjective visual and objective evaluation index.

  12. Direction of Effects in Multiple Linear Regression Models.

    Science.gov (United States)

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.

  13. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    Science.gov (United States)

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  14. An Implementation of Bayesian Adaptive Regression Splines (BARS in C with S and R Wrappers

    Directory of Open Access Journals (Sweden)

    Garrick Wallstrom

    2007-02-01

    Full Text Available BARS (DiMatteo, Genovese, and Kass 2001 uses the powerful reversible-jump MCMC engine to perform spline-based generalized nonparametric regression. It has been shown to work well in terms of having small mean-squared error in many examples (smaller than known competitors, as well as producing visually-appealing fits that are smooth (filtering out high-frequency noise while adapting to sudden changes (retaining high-frequency signal. However, BARS is computationally intensive. The original implementation in S was too slow to be practical in certain situations, and was found to handle some data sets incorrectly. We have implemented BARS in C for the normal and Poisson cases, the latter being important in neurophysiological and other point-process applications. The C implementation includes all needed subroutines for fitting Poisson regression, manipulating B-splines (using code created by Bates and Venables, and finding starting values for Poisson regression (using code for density estimation created by Kooperberg. The code utilizes only freely-available external libraries (LAPACK and BLAS and is otherwise self-contained. We have also provided wrappers so that BARS can be used easily within S or R.

  15. [A Brillouin Scattering Spectrum Feature Extraction Based on Flies Optimization Algorithm with Adaptive Mutation and Generalized Regression Neural Network].

    Science.gov (United States)

    Zhang, Yan-jun; Liu, Wen-zhe; Fu, Xing-hu; Bi, Wei-hong

    2015-10-01

    According to the high precision extracting characteristics of scattering spectrum in Brillouin optical time domain reflection optical fiber sensing system, this paper proposes a new algorithm based on flies optimization algorithm with adaptive mutation and generalized regression neural network. The method takes advantages of the generalized regression neural network which has the ability of the approximation ability, learning speed and generalization of the model. Moreover, by using the strong search ability of flies optimization algorithm with adaptive mutation, it can enhance the learning ability of the neural network. Thus the fitting degree of Brillouin scattering spectrum and the extraction accuracy of frequency shift is improved. Model of actual Brillouin spectrum are constructed by Gaussian white noise on theoretical spectrum, whose center frequency is 11.213 GHz and the linewidths are 40-50, 30-60 and 20-70 MHz, respectively. Comparing the algorithm with the Levenberg-Marquardt fitting method based on finite element analysis, hybrid algorithm particle swarm optimization, Levenberg-Marquardt and the least square method, the maximum frequency shift error of the new algorithm is 0.4 MHz, the fitting degree is 0.991 2 and the root mean square error is 0.024 1. The simulation results show that the proposed algorithm has good fitting degree and minimum absolute error. Therefore, the algorithm can be used on distributed optical fiber sensing system based on Brillouin optical time domain reflection, which can improve the fitting of Brillouin scattering spectrum and the precision of frequency shift extraction effectively.

  16. The art of regression modeling in road safety

    CERN Document Server

    Hauer, Ezra

    2015-01-01

    This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...

  17. Modelling multimodal photometric redshift regression with noisy observations

    CERN Document Server

    Kügler, S D

    2016-01-01

    In this work, we are trying to extent the existing photometric redshift regression models from modeling pure photometric data back to the spectra themselves. To that end, we developed a PCA that is capable of describing the input uncertainty (including missing values) in a dimensionality reduction framework. With this "spectrum generator" at hand, we are capable of treating the redshift regression problem in a fully Bayesian framework, returning a posterior distribution over the redshift. This approach allows therefore to approach the multimodal regression problem in an adequate fashion. In addition, input uncertainty on the magnitudes can be included quite naturally and lastly, the proposed algorithm allows in principle to make predictions outside the training values which makes it a fascinating opportunity for the detection of high-redshifted quasars.

  18. Applications of some discrete regression models for count data

    Directory of Open Access Journals (Sweden)

    B. M. Golam Kibria

    2006-01-01

    Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.

  19. Iterative Weighted Semiparametric Least Squares Estimation in Repeated Measurement Partially Linear Regression Models

    Institute of Scientific and Technical Information of China (English)

    Ge-mai Chen; Jin-hong You

    2005-01-01

    Consider a repeated measurement partially linear regression model with an unknown vector pasemiparametric generalized least squares estimator (SGLSE) ofβ, we propose an iterative weighted semiparametric least squares estimator (IWSLSE) and show that it improves upon the SGLSE in terms of asymptotic covariance matrix. An adaptive procedure is given to determine the number of iterations. We also show that when the number of replicates is less than or equal to two, the IWSLSE can not improve upon the SGLSE.These results are generalizations of those in [2] to the case of semiparametric regressions.

  20. PARAMETER ESTIMATION IN LINEAR REGRESSION MODELS FOR LONGITUDINAL CONTAMINATED DATA

    Institute of Scientific and Technical Information of China (English)

    QianWeimin; LiYumei

    2005-01-01

    The parameter estimation and the coefficient of contamination for the regression models with repeated measures are studied when its response variables are contaminated by another random variable sequence. Under the suitable conditions it is proved that the estimators which are established in the paper are strongly consistent estimators.

  1. Linearity and Misspecification Tests for Vector Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    Teräsvirta, Timo; Yang, Yukai

    The purpose of the paper is to derive Lagrange multiplier and Lagrange multiplier type specification and misspecification tests for vector smooth transition regression models. We report results from simulation studies in which the size and power properties of the proposed asymptotic tests in small...

  2. Change-point estimation for censored regression model

    Institute of Scientific and Technical Information of China (English)

    Zhan-feng WANG; Yao-hua WU; Lin-cheng ZHAO

    2007-01-01

    In this paper, we consider the change-point estimation in the censored regression model assuming that there exists one change point. A nonparametric estimate of the change-point is proposed and is shown to be strongly consistent. Furthermore, its convergence rate is also obtained.

  3. A regression model to estimate regional ground water recharge.

    Science.gov (United States)

    Lorenz, David L; Delin, Geoffrey N

    2007-01-01

    A regional regression model was developed to estimate the spatial distribution of ground water recharge in subhumid regions. The regional regression recharge (RRR) model was based on a regression of basin-wide estimates of recharge from surface water drainage basins, precipitation, growing degree days (GDD), and average basin specific yield (SY). Decadal average recharge, precipitation, and GDD were used in the RRR model. The RRR estimates were derived from analysis of stream base flow using a computer program that was based on the Rorabaugh method. As expected, there was a strong correlation between recharge and precipitation. The model was applied to statewide data in Minnesota. Where precipitation was least in the western and northwestern parts of the state (50 to 65 cm/year), recharge computed by the RRR model also was lowest (0 to 5 cm/year). A strong correlation also exists between recharge and SY. SY was least in areas where glacial lake clay occurs, primarily in the northwest part of the state; recharge estimates in these areas were in the 0- to 5-cm/year range. In sand-plain areas where SY is greatest, recharge estimates were in the 15- to 29-cm/year range on the basis of the RRR model. Recharge estimates that were based on the RRR model compared favorably with estimates made on the basis of other methods. The RRR model can be applied in other subhumid regions where region wide data sets of precipitation, streamflow, GDD, and soils data are available.

  4. Adaptation of Predictive Models to PDA Hand-Held Devices

    Directory of Open Access Journals (Sweden)

    Lin, Edward J

    2008-01-01

    Full Text Available Prediction models using multiple logistic regression are appearing with increasing frequency in the medical literature. Problems associated with these models include the complexity of computations when applied in their pure form, and lack of availability at the bedside. Personal digital assistant (PDA hand-held devices equipped with spreadsheet software offer the clinician a readily available and easily applied means of applying predictive models at the bedside. The purposes of this article are to briefly review regression as a means of creating predictive models and to describe a method of choosing and adapting logistic regression models to emergency department (ED clinical practice.

  5. Improved Methodology for Parameter Inference in Nonlinear, Hydrologic Regression Models

    Science.gov (United States)

    Bates, Bryson C.

    1992-01-01

    A new method is developed for the construction of reliable marginal confidence intervals and joint confidence regions for the parameters of nonlinear, hydrologic regression models. A parameter power transformation is combined with measures of the asymptotic bias and asymptotic skewness of maximum likelihood estimators to determine the transformation constants which cause the bias or skewness to vanish. These optimized constants are used to construct confidence intervals and regions for the transformed model parameters using linear regression theory. The resulting confidence intervals and regions can be easily mapped into the original parameter space to give close approximations to likelihood method confidence intervals and regions for the model parameters. Unlike many other approaches to parameter transformation, the procedure does not use a grid search to find the optimal transformation constants. An example involving the fitting of the Michaelis-Menten model to velocity-discharge data from an Australian gauging station is used to illustrate the usefulness of the methodology.

  6. On concurvity in nonlinear and nonparametric regression models

    Directory of Open Access Journals (Sweden)

    Sonia Amodio

    2014-12-01

    Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.

  7. Using regression models to determine the poroelastic properties of cartilage.

    Science.gov (United States)

    Chung, Chen-Yuan; Mansour, Joseph M

    2013-07-26

    The feasibility of determining biphasic material properties using regression models was investigated. A transversely isotropic poroelastic finite element model of stress relaxation was developed and validated against known results. This model was then used to simulate load intensity for a wide range of material properties. Linear regression equations for load intensity as a function of the five independent material properties were then developed for nine time points (131, 205, 304, 390, 500, 619, 700, 800, and 1000s) during relaxation. These equations illustrate the effect of individual material property on the stress in the time history. The equations at the first four time points, as well as one at a later time (five equations) could be solved for the five unknown material properties given computed values of the load intensity. Results showed that four of the five material properties could be estimated from the regression equations to within 9% of the values used in simulation if time points up to 1000s are included in the set of equations. However, reasonable estimates of the out of plane Poisson's ratio could not be found. Although all regression equations depended on permeability, suggesting that true equilibrium was not realized at 1000s of simulation, it was possible to estimate material properties to within 10% of the expected values using equations that included data up to 800s. This suggests that credible estimates of most material properties can be obtained from tests that are not run to equilibrium, which is typically several thousand seconds.

  8. Regression Cloud Models and Their Applications in Energy Consumption of Data Center

    Directory of Open Access Journals (Sweden)

    Yanshuang Zhou

    2015-01-01

    Full Text Available As cloud data center consumes more and more energy, both researchers and engineers aim to minimize energy consumption while keeping its services available. A good energy model can reflect the relationships between running tasks and the energy consumed by hardware and can be further used to schedule tasks for saving energy. In this paper, we analyzed linear and nonlinear regression energy model based on performance counters and system utilization and proposed a support vector regression energy model. For performance counters, we gave a general linear regression framework and compared three linear regression models. For system utilization, we compared our support vector regression model with linear regression and three nonlinear regression models. The experiments show that linear regression model is good enough to model performance counters, nonlinear regression is better than linear regression model for modeling system utilization, and support vector regression model is better than polynomial and exponential regression models.

  9. Support vector regression-based internal model control

    Institute of Scientific and Technical Information of China (English)

    HUANG Yan-wei; PENG Tie-gen

    2007-01-01

    This paper proposes a design of internal model control systems for process with delay by using support vector regression (SVR). The proposed system fully uses the excellent nonlinear estimation performance of SVR with the structural risk minimization principle. Closed-system stability and steady error are analyzed for the existence of modeling errors. The simulations show that the proposed control systems have the better control performance than that by neural networks in the cases of the training samples with small size and noises.

  10. CONSERVATIVE ESTIMATING FUNCTIONIN THE NONLINEAR REGRESSION MODEL WITHAGGREGATED DATA

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    The purpose of this paper is to study the theory of conservative estimating functions in nonlinear regression model with aggregated data. In this model, a quasi-score function with aggregated data is defined. When this function happens to be conservative, it is projection of the true score function onto a class of estimation functions. By constructing, the potential function for the projected score with aggregated data is obtained, which have some properties of log-likelihood function.

  11. Improved Testing and Specifivations of Smooth Transition Regression Models

    OpenAIRE

    Escribano, Álvaro; Jordá, Óscar

    1997-01-01

    This paper extends previous work in Escribano and Jordá (1997)and introduces new LM specification procedures to choose between Logistic and Exponential Smooth Transition Regression (STR)Models. These procedures are simpler, consistent and more powerful than those previously available in the literature. An analysis of the properties of Taylor approximations around the transition function of STR models permits one to understand why these procedures work better and it suggests ways to improve te...

  12. Regression modeling strategies with applications to linear models, logistic and ordinal regression, and survival analysis

    CERN Document Server

    Harrell , Jr , Frank E

    2015-01-01

    This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap.  The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes.  This text realistically...

  13. REGRESSION ANALYSIS OF PRODUCTIVITY USING MIXED EFFECT MODEL

    Directory of Open Access Journals (Sweden)

    Siana Halim

    2007-01-01

    Full Text Available Production plants of a company are located in several areas that spread across Middle and East Java. As the production process employs mostly manpower, we suspected that each location has different characteristics affecting the productivity. Thus, the production data may have a spatial and hierarchical structure. For fitting a linear regression using the ordinary techniques, we are required to make some assumptions about the nature of the residuals i.e. independent, identically and normally distributed. However, these assumptions were rarely fulfilled especially for data that have a spatial and hierarchical structure. We worked out the problem using mixed effect model. This paper discusses the model construction of productivity and several characteristics in the production line by taking location as a random effect. The simple model with high utility that satisfies the necessary regression assumptions was built using a free statistic software R version 2.6.1.

  14. Structural break detection method based on the Adaptive Regression Splines technique

    Science.gov (United States)

    Kucharczyk, Daniel; Wyłomańska, Agnieszka; Zimroz, Radosław

    2017-04-01

    For many real data, long term observation consists of different processes that coexist or occur one after the other. Those processes very often exhibit different statistical properties and thus before the further analysis the observed data should be segmented. This problem one can find in different applications and therefore new segmentation techniques have been appeared in the literature during last years. In this paper we propose a new method of time series segmentation, i.e. extraction from the analysed vector of observations homogeneous parts with similar behaviour. This method is based on the absolute deviation about the median of the signal and is an extension of the previously proposed techniques also based on the simple statistics. In this paper we introduce the method of structural break point detection which is based on the Adaptive Regression Splines technique, one of the form of regression analysis. Moreover we propose also the statistical test which allows testing hypothesis of behaviour related to different regimes. First, the methodology we apply to the simulated signals with different distributions in order to show the effectiveness of the new technique. Next, in the application part we analyse the real data set that represents the vibration signal from a heavy duty crusher used in a mineral processing plant.

  15. Batch Mode Active Learning for Regression With Expected Model Change.

    Science.gov (United States)

    Cai, Wenbin; Zhang, Muhan; Zhang, Ya

    2016-04-20

    While active learning (AL) has been widely studied for classification problems, limited efforts have been done on AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for the linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch mode AL to simultaneously choose a set of k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.

  16. Hierarchical Neural Regression Models for Customer Churn Prediction

    Directory of Open Access Journals (Sweden)

    Golshan Mohammadi

    2013-01-01

    Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.

  17. Predicting heartbeat arrival time for failure detection over internet using auto-regressive exogenous model

    Institute of Scientific and Technical Information of China (English)

    Zhao Haijun; Ma Yan; Huang Xiaohong; Su Yujie

    2008-01-01

    Predicting heartbeat message arrival time is crucial for the quality of failure detection service over internet. However, internet dynamic characteristics make it very difficult to understand message behavior and accurately predict heartbeat arrival time. To solve this problem, a novel black-box model is proposed to predict the next heartbeat arrival time. Heartbeat arrival time is modeled as auto-regressive process, heartbeat sending time is modeled as exogenous variable, the model's coefficients are estimated based on the sliding window of observations and this result is used to predict the next heartbeat arrival time. Simulation shows that this adaptive auto-regressive exogenous (ARX) model can accurately capture heartbeat arrival dynamics and minimize prediction error in different network environments.

  18. Regression Model to Predict Global Solar Irradiance in Malaysia

    Directory of Open Access Journals (Sweden)

    Hairuniza Ahmed Kutty

    2015-01-01

    Full Text Available A novel regression model is developed to estimate the monthly global solar irradiance in Malaysia. The model is developed based on different available meteorological parameters, including temperature, cloud cover, rain precipitate, relative humidity, wind speed, pressure, and gust speed, by implementing regression analysis. This paper reports on the details of the analysis of the effect of each prediction parameter to identify the parameters that are relevant to estimating global solar irradiance. In addition, the proposed model is compared in terms of the root mean square error (RMSE, mean bias error (MBE, and the coefficient of determination (R2 with other models available from literature studies. Seven models based on single parameters (PM1 to PM7 and five multiple-parameter models (PM7 to PM12 are proposed. The new models perform well, with RMSE ranging from 0.429% to 1.774%, R2 ranging from 0.942 to 0.992, and MBE ranging from −0.1571% to 0.6025%. In general, cloud cover significantly affects the estimation of global solar irradiance. However, cloud cover in Malaysia lacks sufficient influence when included into multiple-parameter models although it performs fairly well in single-parameter prediction models.

  19. Phone Duration Modeling of Affective Speech Using Support Vector Regression

    Directory of Open Access Journals (Sweden)

    Alexandros Lazaridis

    2012-07-01

    Full Text Available In speech synthesis accurate modeling of prosody is important for producing high quality synthetic speech. One of the main aspects of prosody is phone duration. Robust phone duration modeling is a prerequisite for synthesizing emotional speech with natural sounding. In this work ten phone duration models are evaluated. These models belong to well known and widely used categories of algorithms, such as the decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms. Furthermore, we investigate the effectiveness of Support Vector Regression (SVR in phone duration modeling in the context of emotional speech. The evaluation of the eleven models is performed on a Modern Greek emotional speech database which consists of four categories of emotional speech (anger, fear, joy, sadness plus neutral speech. The experimental results demonstrated that the SVR-based modeling outperforms the other ten models across all the four emotion categories. Specifically, the SVR model achieved an average relative reduction of 8% in terms of root mean square error (RMSE throughout all emotional categories.

  20. Predicting and Modelling of Survival Data when Cox's Regression Model does not hold

    DEFF Research Database (Denmark)

    Scheike, Thomas H.; Zhang, Mei-Jie

    2002-01-01

    Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; competing risk; Cox regression; flexible modeling; goodness of fit; prediction of survival; survival analysis; time-varying effects...

  1. Fuzzy and Regression Modelling of Hard Milling Process

    Directory of Open Access Journals (Sweden)

    A. Tamilarasan

    2014-04-01

    Full Text Available The present study highlights the application of box-behnken design coupled with fuzzy and regression modeling approach for making expert system in hard milling process to improve the process performance with systematic reduction of production cost. The important input fields of work piece hardness, nose radius, feed per tooth, radial depth of cut and axial depth cut were considered. The cutting forces, work surface temperature and sound pressure level were identified as key index of machining outputs. The results indicate that the fuzzy logic and regression modeling technique can be effectively used for the prediction of desired responses with less average error variation. Predicted results were verified by experiments and shown the good potential characteristics of the developed system for automated machining environment.

  2. Forecasting relativistic electron flux using dynamic multiple regression models

    Directory of Open Access Journals (Sweden)

    H.-L. Wei

    2011-02-01

    Full Text Available The forecast of high energy electron fluxes in the radiation belts is important because the exposure of modern spacecraft to high energy particles can result in significant damage to onboard systems. A comprehensive physical model of processes related to electron energisation that can be used for such a forecast has not yet been developed. In the present paper a systems identification approach is exploited to deduce a dynamic multiple regression model that can be used to predict the daily maximum of high energy electron fluxes at geosynchronous orbit from data. It is shown that the model developed provides reliable predictions.

  3. Central limit theorem of linear regression model under right censorship

    Institute of Scientific and Technical Information of China (English)

    HE; Shuyuan(何书元); HUANG; Xiang(Heung; Wong)(黄香)

    2003-01-01

    In this paper, the estimation of joint distribution F(y,z) of (Y, Z) and the estimation in thelinear regression model Y = b′Z + ε for complete data are extended to that of the right censored data. Theregression parameter estimates of b and the variance of ε are weighted least square estimates with randomweights. The central limit theorems of the estimators are obtained under very weak conditions and the derivedasymptotic variance has a very simple form.

  4. GAUSSIAN COPULA MARGINAL REGRESSION FOR MODELING EXTREME DATA WITH APPLICATION

    Directory of Open Access Journals (Sweden)

    Sutikno

    2014-01-01

    Full Text Available Regression is commonly used to determine the relationship between the response variable and the predictor variable, where the parameters are estimated by Ordinary Least Square (OLS. This method can be used with an assumption that residuals are normally distributed (0, σ2. However, the assumption of normality of the data is often violated due to extreme observations, which are often found in the climate data. Modeling of rice harvested area with rainfall predictor variables allows extreme observations. Therefore, another approximation is necessary to be applied in order to overcome the presence of extreme observations. The method used to solve this problem is a Gaussian Copula Marginal Regression (GCMR, the regression-based Copula. As a case study, the method is applied to model rice harvested area of rice production centers in East Java, Indonesia, covering District: Banyuwangi, Lamongan, Bojonegoro, Ngawi and Jember. Copula is chosen because this method is not strict against the assumption distribution, especially the normal distribution. Moreover, this method can describe dependency on extreme point clearly. The GCMR performance will be compared with OLS and Generalized Linear Models (GLM. The identification result of the dependencies structure between the Rice Harvest per period (RH and monthly rainfall showed a dependency in all areas of research. It is shown that the real test copula type mostly follows the Gumbel distribution. While the comparison of the model goodness for rice harvested area in the modeling showed that the method used to model the exact GCMR in five districts RH1 and RH2 in Jember district since its lowest AICc. Looking at the data distribution pattern of response variables, it can be concluded that the GCMR good for modeling the response variable that is not normally distributed and tend to have a large skew.

  5. K factor estimation in distribution transformers using linear regression models

    Directory of Open Access Journals (Sweden)

    Juan Miguel Astorga Gómez

    2016-06-01

    Full Text Available Background: Due to massive incorporation of electronic equipment to distribution systems, distribution transformers are subject to operation conditions other than the design ones, because of the circulation of harmonic currents. It is necessary to quantify the effect produced by these harmonic currents to determine the capacity of the transformer to withstand these new operating conditions. The K-factor is an indicator that estimates the ability of a transformer to withstand the thermal effects caused by harmonic currents. This article presents a linear regression model to estimate the value of the K-factor, from total current harmonic content obtained with low-cost equipment.Method: Two distribution transformers that feed different loads are studied variables, current total harmonic distortion factor K are recorded, and the regression model that best fits the data field is determined. To select the regression model the coefficient of determination R2 and the Akaike Information Criterion (AIC are used. With the selected model, the K-factor is estimated to actual operating conditions.Results: Once determined the model it was found that for both agricultural cargo and industrial mining, present harmonic content (THDi exceeds the values that these transformers can drive (average of 12.54% and minimum 8,90% in the case of agriculture and average value of 18.53% and a minimum of 6.80%, for industrial mining case.Conclusions: When estimating the K factor using polynomial models it was determined that studied transformers can not withstand the current total harmonic distortion of their current loads. The appropriate K factor for studied transformer should be 4; this allows transformers support the current total harmonic distortion of their respective loads.

  6. A New Approach in Regression Analysis for Modeling Adsorption Isotherms

    Directory of Open Access Journals (Sweden)

    Dana D. Marković

    2014-01-01

    Full Text Available Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart’s percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method.

  7. Modeling the number of car theft using Poisson regression

    Science.gov (United States)

    Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura

    2016-10-01

    Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.

  8. Regularized multivariate regression models with skew-t error distributions

    KAUST Repository

    Chen, Lianfu

    2014-06-01

    We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.

  9. Interpreting parameters in the logistic regression model with random effects

    DEFF Research Database (Denmark)

    Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben

    2000-01-01

    interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...

  10. Adaptive visual attention model

    OpenAIRE

    Hügli, Heinz; Bur, Alexandre

    2009-01-01

    Visual attention, defined as the ability of a biological or artificial vision system to rapidly detect potentially relevant parts of a visual scene, provides a general purpose solution for low level feature detection in a vision architecture. Well considered for its universal detection behaviour, the general model of visual attention is suited for any environment but inferior to dedicated feature detectors in more specific environments. The goal of the development presented in this paper is t...

  11. Mixed-model Regression for Variable-star Photometry

    Science.gov (United States)

    Dose, Eric

    2016-05-01

    Mixed-model regression, a recent advance from social-science statistics, applies directly to reducing one night's photometric raw data, especially for variable stars in fields with multiple comparison stars. One regression model per filter/passband yields any or all of: transform values, extinction values, nightly zero-points, rapid zero-point fluctuations ("cirrus effect"), ensemble comparisons, vignette and gradient removal arising from incomplete flat-correction, check-star and target-star magnitudes, and specific indications of unusually large catalog magnitude errors. When images from several different fields of view are included, the models improve without complicating the calculations. The mixed-model approach is generally robust to outliers and missing data points, and it directly yields 14 diagnostic plots, used to monitor data set quality and/or residual systematic errors - these diagnostic plots may in fact turn out to be the prime advantage of this approach. Also presented is initial work on a split-annulus approach to sky background estimation, intended to address the sensitivity of photometric observations to noise within the sky-background annulus.

  12. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Directory of Open Access Journals (Sweden)

    Campos-Aranda Daniel Francisco

    2014-10-01

    Full Text Available To solve the problems associated with the assessment of water resources of a river, the modeling of the rainfall-runoff process (RRP allows the deduction of runoff missing data and to extend its record, since generally the information available on precipitation is larger. It also enables the estimation of inputs to reservoirs, when their building led to the suppression of the gauging station. The simplest mathematical model that can be set for the RRP is the linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff in Ballesmi hydrometric station, which covers 35 years. Since the runoff of this station has an important contribution from the spring discharge, the record is corrected first by removing that contribution. In order to do this a procedure was developed based either on the monthly average regional runoff coefficients or on nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to the Partial Hydrologic Region No. 26 (Lower Rio Panuco and are located within the state of San Luis Potosi, México. The study performed indicates that the monthly regression model, due to its conceptual approach, faithfully reproduces monthly average runoff volumes and achieves an excellent approximation in relation to the dispersion, proved by calculation of the means and standard deviations.

  13. Genetic evaluation of European quails by random regression models

    Directory of Open Access Journals (Sweden)

    Flaviana Miranda Gonçalves

    2012-09-01

    Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.

  14. Stochastic expansions using continuous dictionaries: L\\'{e}vy adaptive regression kernels

    CERN Document Server

    Wolpert, Robert L; Tu, Chong; 10.1214/11-AOS889

    2011-01-01

    This article describes a new class of prior distributions for nonparametric function estimation. The unknown function is modeled as a limit of weighted sums of kernels or generator functions indexed by continuous parameters that control local and global features such as their translation, dilation, modulation and shape. L\\'{e}vy random fields and their stochastic integrals are employed to induce prior distributions for the unknown functions or, equivalently, for the number of kernels and for the parameters governing their features. Scaling, shape, and other features of the generating functions are location-specific to allow quite different function properties in different parts of the space, as with wavelet bases and other methods employing overcomplete dictionaries. We provide conditions under which the stochastic expansions converge in specified Besov or Sobolev norms. Under a Gaussian error model, this may be viewed as a sparse regression problem, with regularization induced via the L\\'{e}vy random field p...

  15. The application of Dynamic Linear Bayesian Models in hydrological forecasting: Varying Coefficient Regression and Discount Weighted Regression

    Science.gov (United States)

    Ciupak, Maurycy; Ozga-Zielinski, Bogdan; Adamowski, Jan; Quilty, John; Khalil, Bahaa

    2015-11-01

    A novel implementation of Dynamic Linear Bayesian Models (DLBM), using either a Varying Coefficient Regression (VCR) or a Discount Weighted Regression (DWR) algorithm was used in the hydrological modeling of annual hydrographs as well as 1-, 2-, and 3-day lead time stream flow forecasting. Using hydrological data (daily discharge, rainfall, and mean, maximum and minimum air temperatures) from the Upper Narew River watershed in Poland, the forecasting performance of DLBM was compared to that of traditional multiple linear regression (MLR) and more recent artificial neural network (ANN) based models. Model performance was ranked DLBM-DWR > DLBM-VCR > MLR > ANN for both annual hydrograph modeling and 1-, 2-, and 3-day lead forecasting, indicating that the DWR and VCR algorithms, operating in a DLBM framework, represent promising new methods for both annual hydrograph modeling and short-term stream flow forecasting.

  16. Fuzzy regression modeling for tool performance prediction and degradation detection.

    Science.gov (United States)

    Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L

    2010-10-01

    In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.

  17. G/SPLINES: A hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's genetic algorithm

    Science.gov (United States)

    Rogers, David

    1991-01-01

    G/SPLINES are a hybrid of Friedman's Multivariable Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.

  18. Approximation by randomly weighting method in censored regression model

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    Censored regression ("Tobit") models have been in common use, and their linear hypothesis testings have been widely studied. However, the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters. In this paper, we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic. It is shown that, under both the null and local alternative hypotheses, conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic. Therefore, the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters. At the same time, we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model. Simulation studies illustrate that the per-formance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.

  19. Approximation by randomly weighting method in censored regression model

    Institute of Scientific and Technical Information of China (English)

    WANG ZhanFeng; WU YaoHua; ZHAO LinCheng

    2009-01-01

    Censored regression ("Tobit") models have been in common use,and their linear hypothesis testings have been widely studied.However,the critical values of these tests are usually related to quantities of an unknown error distribution and estimators of nuisance parameters.In this paper,we propose a randomly weighting test statistic and take its conditional distribution as an approximation to null distribution of the test statistic.It is shown that,under both the null and local alternative hypotheses,conditionally asymptotic distribution of the randomly weighting test statistic is the same as the null distribution of the test statistic.Therefore,the critical values of the test statistic can be obtained by randomly weighting method without estimating the nuisance parameters.At the same time,we also achieve the weak consistency and asymptotic normality of the randomly weighting least absolute deviation estimate in censored regression model.Simulation studies illustrate that the performance of our proposed resampling test method is better than that of central chi-square distribution under the null hypothesis.

  20. Empirical likelihood ratio tests for multivariate regression models

    Institute of Scientific and Technical Information of China (English)

    WU Jianhong; ZHU Lixing

    2007-01-01

    This paper proposes some diagnostic tools for checking the adequacy of multivariate regression models including classical regression and time series autoregression. In statistical inference, the empirical likelihood ratio method has been well known to be a powerful tool for constructing test and confidence region. For model checking, however, the naive empirical likelihood (EL) based tests are not of Wilks' phenomenon. Hence, we make use of bias correction to construct the EL-based score tests and derive a nonparametric version of Wilks' theorem. Moreover, by the advantages of both the EL and score test method, the EL-based score tests share many desirable features as follows: They are self-scale invariant and can detect the alternatives that converge to the null at rate n-1/2, the possibly fastest rate for lack-of-fit testing; they involve weight functions, which provides us with the flexibility to choose scores for improving power performance, especially under directional alternatives. Furthermore, when the alternatives are not directional, we construct asymptotically distribution-free maximin tests for a large class of possible alternatives. A simulation study is carried out and an application for a real dataset is analyzed.

  1. Regression Models for Predicting Force Coefficients of Aerofoils

    Directory of Open Access Journals (Sweden)

    Mohammed ABDUL AKBAR

    2015-09-01

    Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.

  2. Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

    NARCIS (Netherlands)

    Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

    2006-01-01

    The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval arithm

  3. Information Criteria for Deciding between Normal Regression Models

    CERN Document Server

    Maier, Robert S

    2013-01-01

    Regression models fitted to data can be assessed on their goodness of fit, though models with many parameters should be disfavored to prevent over-fitting. Statisticians' tools for this are little known to physical scientists. These include the Akaike Information Criterion (AIC), a penalized goodness-of-fit statistic, and the AICc, a variant including a small-sample correction. They entered the physical sciences through being used by astrophysicists to compare cosmological models; e.g., predictions of the distance-redshift relation. The AICc is shown to have been misapplied, being applicable only if error variances are unknown. If error bars accompany the data, the AIC should be used instead. Erroneous applications of the AICc are listed in an appendix. It is also shown how the variability of the AIC difference between models with a known error variance can be estimated. This yields a significance test that can potentially replace the use of `Akaike weights' for deciding between such models. Additionally, the...

  4. Genomic breeding value estimation using nonparametric additive regression models

    Directory of Open Access Journals (Sweden)

    Solberg Trygve

    2009-01-01

    Full Text Available Abstract Genomic selection refers to the use of genomewide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to successfully cope with these challenges. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped was predicted using data from the next last generation (genotyped and phenotyped. The results show moderate to high accuracies of the predicted breeding values. A determination of predictor specific degree of smoothing increased the accuracy.

  5. Multiple linear combination (MLC) regression tests for common variants adapted to linkage disequilibrium structure

    Science.gov (United States)

    Yoo, Yun Joo; Sun, Lei; Poirier, Julia G.; Paterson, Andrew D.

    2016-01-01

    ABSTRACT By jointly analyzing multiple variants within a gene, instead of one at a time, gene‐based multiple regression can improve power, robustness, and interpretation in genetic association analysis. We investigate multiple linear combination (MLC) test statistics for analysis of common variants under realistic trait models with linkage disequilibrium (LD) based on HapMap Asian haplotypes. MLC is a directional test that exploits LD structure in a gene to construct clusters of closely correlated variants recoded such that the majority of pairwise correlations are positive. It combines variant effects within the same cluster linearly, and aggregates cluster‐specific effects in a quadratic sum of squares and cross‐products, producing a test statistic with reduced degrees of freedom (df) equal to the number of clusters. By simulation studies of 1000 genes from across the genome, we demonstrate that MLC is a well‐powered and robust choice among existing methods across a broad range of gene structures. Compared to minimum P‐value, variance‐component, and principal‐component methods, the mean power of MLC is never much lower than that of other methods, and can be higher, particularly with multiple causal variants. Moreover, the variation in gene‐specific MLC test size and power across 1000 genes is less than that of other methods, suggesting it is a complementary approach for discovery in genome‐wide analysis. The cluster construction of the MLC test statistics helps reveal within‐gene LD structure, allowing interpretation of clustered variants as haplotypic effects, while multiple regression helps to distinguish direct and indirect associations. PMID:27885705

  6. Predicting strength of recycled aggregate concrete using Artificial Neural Network, Adaptive Neuro-Fuzzy Inference System and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Faezehossadat Khademi

    2016-12-01

    Full Text Available Compressive strength of concrete, recognized as one of the most significant mechanical properties of concrete, is identified as one of the most essential factors for the quality assurance of concrete. In the current study, three different data-driven models, i.e., Artificial Neural Network (ANN, Adaptive Neuro-Fuzzy Inference System (ANFIS, and Multiple Linear Regression (MLR were used to predict the 28 days compressive strength of recycled aggregate concrete (RAC. Recycled aggregate is the current need of the hour owing to its environmental pleasant aspect of re-using the wastes due to construction. 14 different input parameters, including both dimensional and non-dimensional parameters, were used in this study for predicting the 28 days compressive strength of concrete. The present study concluded that estimation of 28 days compressive strength of recycled aggregate concrete was performed better by ANN and ANFIS in comparison to MLR. In other words, comparing the test step of all the three models, it can be concluded that the MLR model is better to be utilized for preliminary mix design of concrete, and ANN and ANFIS models are suggested to be used in the mix design optimization and in the case of higher accuracy necessities. In addition, the performance of data-driven models with and without the non-dimensional parameters is explored. It was observed that the data-driven models show better accuracy when the non-dimensional parameters were used as additional input parameters. Furthermore, the effect of each non-dimensional parameter on the performance of each data-driven model is investigated. Finally, the effect of number of input parameters on 28 days compressive strength of concrete is examined.

  7. A Gompertz regression model for fern spores germination

    Directory of Open Access Journals (Sweden)

    Gabriel y Galán, Jose María

    2015-06-01

    Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados

  8. Statistical Inference for Partially Linear Regression Models with Measurement Errors

    Institute of Scientific and Technical Information of China (English)

    Jinhong YOU; Qinfeng XU; Bin ZHOU

    2008-01-01

    In this paper, the authors investigate three aspects of statistical inference for the partially linear regression models where some covariates are measured with errors. Firstly,a bandwidth selection procedure is proposed, which is a combination of the difference-based technique and GCV method. Secondly, a goodness-of-fit test procedure is proposed,which is an extension of the generalized likelihood technique. Thirdly, a variable selection procedure for the parametric part is provided based on the nonconcave penalization and corrected profile least squares. Same as "Variable selection via nonconcave penalized like-lihood and its oracle properties" (J. Amer. Statist. Assoc., 96, 2001, 1348-1360), it is shown that the resulting estimator has an oracle property with a proper choice of regu-larization parameters and penalty function. Simulation studies are conducted to illustrate the finite sample performances of the proposed procedures.

  9. The R Package threg to Implement Threshold Regression Models

    Directory of Open Access Journals (Sweden)

    Tao Xiao

    2015-08-01

    This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.

  10. Epistasis analysis for quantitative traits by functional regression model.

    Science.gov (United States)

    Zhang, Futao; Boerwinkle, Eric; Xiong, Momiao

    2014-06-01

    The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10(-10)) in the ESP, and 11 were replicated in the CHARGE-S study.

  11. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    Science.gov (United States)

    Ferrari, Alberto

    2017-02-16

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.

  12. A nonlinear regression model-based predictive control algorithm.

    Science.gov (United States)

    Dubay, R; Abu-Ayyad, M; Hernandez, J M

    2009-04-01

    This paper presents a unique approach for designing a nonlinear regression model-based predictive controller (NRPC) for single-input-single-output (SISO) and multi-input-multi-output (MIMO) processes that are common in industrial applications. The innovation of this strategy is that the controller structure allows nonlinear open-loop modeling to be conducted while closed-loop control is executed every sampling instant. Consequently, the system matrix is regenerated every sampling instant using a continuous function providing a more accurate prediction of the plant. Computer simulations are carried out on nonlinear plants, demonstrating that the new approach is easily implemented and provides tight control. Also, the proposed algorithm is implemented on two real time SISO applications; a DC motor, a plastic injection molding machine and a nonlinear MIMO thermal system comprising three temperature zones to be controlled with interacting effects. The experimental closed-loop responses of the proposed algorithm were compared to a multi-model dynamic matrix controller (MPC) with improved results for various set point trajectories. Good disturbance rejection was attained, resulting in improved tracking of multi-set point profiles in comparison to multi-model MPC.

  13. Regression Models Using Fully Discharged Voltage and Internal Resistance for State of Health Estimation of Lithium-Ion Batteries

    Directory of Open Access Journals (Sweden)

    Kuo-Hsin Tseng

    2015-04-01

    Full Text Available Accurate estimation of lithium-ion battery life is essential to assure the reliable operation of the energy supply system. This study develops regression models for battery prognostics using statistical methods. The resultant regression models can not only monitor a battery’s degradation trend but also accurately predict its remaining useful life (RUL at an early stage. Three sets of test data are employed in the training stage for regression models. Another set of data is then applied to the regression models for validation. The fully discharged voltage (Vdis and internal resistance (R are adopted as aging parameters in two different mathematical models, with polynomial and exponential functions. A particle swarm optimization (PSO process is applied to search for optimal coefficients of the regression models. Simulations indicate that the regression models using Vdis and R as aging parameters can build a real state of health profile more accurately than those using cycle number, N. The Monte Carlo method is further employed to make the models adaptive. The subsequent results, however, show that this results in an insignificant improvement of the battery life prediction. A reasonable speculation is that the PSO process already yields the major model coefficients.

  14. The Analysis of Internet Addiction Scale Using Multivariate Adaptive Regression Splines

    Directory of Open Access Journals (Sweden)

    M Kayri

    2010-12-01

    Full Text Available "nBackground: Determining real effects on internet dependency is too crucial with unbiased and robust statistical method. MARS is a new non-parametric method in use in the literature for parameter estimations of cause and effect based research. MARS can both obtain legible model curves and make unbiased parametric predictions."nMethods: In order to examine the performance of MARS, MARS findings will be compared to Classification and Regres­sion Tree (C&RT findings, which are considered in the literature to be efficient in revealing correlations between variables. The data set for the study is taken from "The Internet Addiction Scale" (IAS, which attempts to reveal addiction levels of individu­als. The population of the study consists of 754 secondary school students (301 female, 443 male students with 10 miss­ing data. MARS 2.0 trial version is used for analysis by MARS method and C&RT analysis was done by SPSS."nResults: MARS obtained six base functions of the model. As a common result of these six functions, regression equation of the model was found. Over the predicted variable, MARS showed that the predictors of daily Internet-use time on average, the purpose of Internet- use, grade of students and occupations of mothers had a significant effect (P< 0.05. In this compara­tive study, MARS obtained different findings from C&RT in dependency level prediction."nConclusion: The fact that MARS revealed extent to which the variable, which was considered significant, changes the charac­ter of the model was observed in this study.

  15. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Jaber Almedeij

    2012-01-01

    Full Text Available Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements of substantial continuity coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. Multiple linear regression technique is used with a procedure of variable selection for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns of the data by using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results that are in a reasonable agreement with observation values.

  16. Air Pollution Analysis using Ontologies and Regression Models

    Directory of Open Access Journals (Sweden)

    Parul Choudhary

    2016-07-01

    Full Text Available Rapidly throughout the world economy, "the expansive Web" in the "world" explosive growth, rapidly growing market characterized by short product cycles exists and the demand for increased flexibility as well as the extensive use of a new data vision managed data society. A new socio-economic system that relies more and more on movement and allocation results in data whose daily existence, refinement, economy and adjust the exchange industry. Cooperative Engineering Co -operation and multi -disciplinary installed on people's cooperation is a good example. Semantic Web is a new form of Web content that is meaningful to computers and additional approved another example. Communication, vision sharing and exchanging data Society's are new commercial bet. Urban air pollution modeling and data processing techniques need elevated Association. Artificial intelligence in countless ways and breakthrough technologies can solve environmental problems from uneven offers. A method for data to formal ontology means a true meaning and lack of ambiguity to allow us to portray memo. In this work we survey regression model for ontologies and air pollution.

  17. High dimensional linear regression models under long memory dependence and measurement error

    Science.gov (United States)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the

  18. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    Science.gov (United States)

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique.

  19. Regression model for tuning the PID controller with fractional order time delay system

    OpenAIRE

    S.P. Agnihotri; Laxman Madhavrao Waghmare

    2014-01-01

    In this paper a regression model based for tuning proportional integral derivative (PID) controller with fractional order time delay system is proposed. The novelty of this paper is that tuning parameters of the fractional order time delay system are optimally predicted using the regression model. In the proposed method, the output parameters of the fractional order system are used to derive the regression function. Here, the regression model depends on the weights of the exponential function...

  20. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    Science.gov (United States)

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  1. Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

    CERN Document Server

    Faraway, Julian J

    2005-01-01

    Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

  2. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its...

  3. Regression of retinopathy by squalamine in a mouse model.

    Science.gov (United States)

    Higgins, Rosemary D; Yan, Yun; Geng, Yixun; Zasloff, Michael; Williams, Jon I

    2004-07-01

    The goal of this study was to determine whether an antiangiogenic agent, squalamine, given late during the evolution of oxygen-induced retinopathy (OIR) in the mouse, could improve retinal neovascularization. OIR was induced in neonatal C57BL6 mice and the neonates were treated s.c. with squalamine doses begun at various times after OIR induction. A system of retinal whole mounts and assessment of neovascular nuclei extending beyond the inner limiting membrane from animals reared under room air or OIR conditions and killed periodically from d 12 to 21 were used to assess retinopathy in squalamine-treated and untreated animals. OIR evolved after 75% oxygen exposure in neonatal mice with florid retinal neovascularization developing by d 14. Squalamine (single dose, 25 mg/kg s.c.) given on d 15 or 16, but not d 17, substantially improved retinal neovascularization in the mouse model of OIR. There was improvement seen in the degree of blood vessel tuft formation, blood vessel tortuosity, and central vasoconstriction with squalamine treatment at d 15 or 16. Single-dose squalamine at d 12 was effective at reducing subsequent development of retinal neovascularization at doses as low as 1 mg/kg. Squalamine is a very active inhibitor of OIR in mouse neonates at doses as low as 1 mg/kg given once. Further, squalamine given late in the course of OIR improves retinopathy by inducing regression of retinal neovessels and abrogating invasion of new vessels beyond the inner-limiting membrane of the retina.

  4. USING MULTIVARIATE ADAPTIVE REGRESSION SPLINE AND ARTIFICIAL NEURAL NETWORK TO SIMULATE URBANIZATION IN MUMBAI, INDIA

    OpenAIRE

    M. Ahmadlou; M. R. Delavar; Tayyebi, A.; H. Shafizadeh-Moghadam

    2015-01-01

    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the m...

  5. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    Science.gov (United States)

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  6. VARIABLE SELECTION BY PSEUDO WAVELETS IN HETEROSCEDASTIC REGRESSION MODELS INVOLVING TIME SERIES

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A simple but efficient method has been proposed to select variables in heteroscedastic regression models. It is shown that the pseudo empirical wavelet coefficients corresponding to the significant explanatory variables in the regression models are clearly larger than those nonsignificant ones, on the basis of which a procedure is developed to select variables in regression models. The coefficients of the models are also estimated. All estimators are proved to be consistent.

  7. A generalized additive regression model for survival times

    DEFF Research Database (Denmark)

    Scheike, Thomas H.

    2001-01-01

    Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...

  8. First Look at Photometric Reduction via Mixed-Model Regression (Poster abstract)

    Science.gov (United States)

    Dose, E.

    2016-12-01

    (Abstract only) Mixed-model regression is proposed as a new approach to photometric reduction, especially for variable-star photometry in several filters. Mixed-model regression adds to normal multivariate regression certain "random effects": categorical-variable terms that model and extract specific systematic errors such as image-to-image zero-point fluctuations (cirrus effect) or even errors in comp-star catalog magnitudes.

  9. Data to support "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations & Biological Condition"

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...

  10. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    Science.gov (United States)

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we…

  11. A note on the maximum likelihood estimator in the gamma regression model

    Directory of Open Access Journals (Sweden)

    Jerzy P. Rydlewski

    2009-01-01

    Full Text Available This paper considers a nonlinear regression model, in which the dependent variable has the gamma distribution. A model is considered in which the shape parameter of the random variable is the sum of continuous and algebraically independent functions. The paper proves that there is exactly one maximum likelihood estimator for the gamma regression model.

  12. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    Science.gov (United States)

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  13. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    Science.gov (United States)

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…

  14. A nonparametric dynamic additive regression model for longitudinal data

    DEFF Research Database (Denmark)

    Martinussen, Torben; Scheike, Thomas H.

    2000-01-01

    dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models......dynamic linear models, estimating equations, least squares, longitudinal data, nonparametric methods, partly conditional mean models, time-varying-coefficient models...

  15. 基于多元自适应回归样条的青藏高原温泉区域的冻土分布制图%Modeling Permafrost Distribution in Wenquan Area over Qinghai-Tibet Plateau by Using Multivariate Adaptive Regression Splines

    Institute of Scientific and Technical Information of China (English)

    张秀敏; 南卓铜; 吴吉春; 杜二计; 王通; 游艳辉

    2011-01-01

    以探地雷达、电磁测深、钻探等技术方法获得野外数据及数字高程(DEM)遥感数据为基础,通过聚类分析和相关性分析对高程、坡度、坡向等因素对多年冻土分布的影响进行了定量化研究.利用非线性的多元自适应回归样条(MARS)方法建立了基于高程、太阳辐射的多年冻土分布模型,通过自身的交叉验证及对比年平均地温模型和逻辑回归模型的总体分类精度,说明MARS模型具有较好的分类精度.运用MARS模型模拟了整个温泉区域冻土的空间分布特征.结果表明:MARS模型分类精度较高,验证了此模型模拟温泉区域冻土分布的可行性;此模型除了考虑高程对对多年冻土分布的控制作用外,还体现了太阳辐射这一局地综合因素对多年冻土分布的调整作用,较好地模拟了高程相对较低的低山区多年冻土的存在.%In order to understand the distribution patterns of permafrost in the Wenquan area on the Qinghai-Tibet Plateau,the effects of altitude,slope and aspect and other topo-climatic factors on the distribution of permafrost were studied,using the correlation analysis with digital elevation(DEM) data,borehole observations and measures from ground penetrating radar(GPR) and the electromagnetic sounding method.A permafrost distribution model based on the nonlinear multiple adaptive regression splines(MARS) method was developed,taking elevation and direct solar radiation as variables.Five-fold cross validation shows that this model has a good simulation capability in describing the permafrost distribution spatial pattern in the study area.Applying the model to the study area indicates that in the Wenquan area there is 1 881 km2 of permafrost area,accounting for 76% of the total Wenquan area.The MARS model is better than the mean annual ground temperature model and the logistic model,because the MARS model takes into account not only elevation,the predominantly

  16. Noise Reduction and Gap Filling of fAPAR Time Series Using an Adapted Local Regression Filter

    Directory of Open Access Journals (Sweden)

    Álvaro Moreno

    2014-08-01

    Full Text Available Time series of remotely sensed data are an important source of information for understanding land cover dynamics. In particular, the fraction of absorbed photosynthetic active radiation (fAPAR is a key variable in the assessment of vegetation primary production over time. However, the fAPAR series derived from polar orbit satellites are not continuous and consistent in space and time. Filtering methods are thus required to fill in gaps and produce high-quality time series. This study proposes an adapted (iteratively reweighted local regression filter (LOESS and performs a benchmarking intercomparison with four popular and generally applicable smoothing methods: Double Logistic (DLOG, smoothing spline (SSP, Interpolation for Data Reconstruction (IDR and adaptive Savitzky-Golay (ASG. This paper evaluates the main advantages and drawbacks of the considered techniques. The results have shown that ASG and the adapted LOESS perform better in recovering fAPAR time series over multiple controlled noisy scenarios. Both methods can robustly reconstruct the fAPAR trajectories, reducing the noise up to 80% in the worst simulation scenario, which might be attributed to the quality control (QC MODIS information incorporated into these filtering algorithms, their flexibility and adaptation to the upper envelope. The adapted LOESS is particularly resistant to outliers. This method clearly outperforms the other considered methods to deal with the high presence of gaps and noise in satellite data records. The low RMSE and biases obtained with the LOESS method (|rMBE| < 8%; rRMSE < 20% reveals an optimal reconstruction even in most extreme situations with long seasonal gaps. An example of application of the LOESS method to fill in invalid values in real MODIS images presenting persistent cloud and snow coverage is also shown. The LOESS approach is recommended in most remote sensing applications, such as gap-filling, cloud-replacement, and observing temporal

  17. Quantile regression

    CERN Document Server

    Hao, Lingxin

    2007-01-01

    Quantile Regression, the first book of Hao and Naiman's two-book series, establishes the seldom recognized link between inequality studies and quantile regression models. Though separate methodological literature exists for each subject, the authors seek to explore the natural connections between this increasingly sought-after tool and research topics in the social sciences. Quantile regression as a method does not rely on assumptions as restrictive as those for the classical linear regression; though more traditional models such as least squares linear regression are more widely utilized, Hao

  18. An assessment of coefficient accuracy in linear regression models with spatially varying coefficients

    Science.gov (United States)

    Wheeler, David C.; Calder, Catherine A.

    2007-06-01

    The realization in the statistical and geographical sciences that a relationship between an explanatory variable and a response variable in a linear regression model is not always constant across a study area has led to the development of regression models that allow for spatially varying coefficients. Two competing models of this type are geographically weighted regression (GWR) and Bayesian regression models with spatially varying coefficient processes (SVCP). In the application of these spatially varying coefficient models, marginal inference on the regression coefficient spatial processes is typically of primary interest. In light of this fact, there is a need to assess the validity of such marginal inferences, since these inferences may be misleading in the presence of explanatory variable collinearity. In this paper, we present the results of a simulation study designed to evaluate the sensitivity of the spatially varying coefficients in the competing models to various levels of collinearity. The simulation study results show that the Bayesian regression model produces more accurate inferences on the regression coefficients than does GWR. In addition, the Bayesian regression model is overall fairly robust in terms of marginal coefficient inference to moderate levels of collinearity, and degrades less substantially than GWR with strong collinearity.

  19. Moment-bases estimation of smooth transition regression models with endogenous variables

    NARCIS (Netherlands)

    W.D. Areosa (Waldyr Dutra); M.J. McAleer (Michael); M.C. Medeiros (Marcelo)

    2008-01-01

    textabstractNonlinear regression models have been widely used in practice for a variety of time series and cross-section datasets. For purposes of analyzing univariate and multivariate time series data, in particular, Smooth Transition Regression (STR) models have been shown to be very useful for re

  20. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    Science.gov (United States)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  1. Local asymptotic behavior of regression splines for marginal semiparametric models with longitudinal data

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    In this paper, we study the local asymptotic behavior of the regression spline estimator in the framework of marginal semiparametric model. Similarly to Zhu, Fung and He (2008), we give explicit expression for the asymptotic bias of regression spline estimator for nonparametric function f. Our results also show that the asymptotic bias of the regression spline estimator does not depend on the working covariance matrix, which distinguishes the regression splines from the smoothing splines and the seemingly unrelated kernel. To understand the local bias result of the regression spline estimator, we show that the regression spline estimator can be obtained iteratively by applying the standard weighted least squares regression spline estimator to pseudo-observations. At each iteration, the bias of the estimator is unchanged and only the variance is updated.

  2. A diversified portfolio model of adaptability.

    Science.gov (United States)

    Chandra, Siddharth; Leong, Frederick T L

    2016-12-01

    A new model of adaptability, the diversified portfolio model (DPM) of adaptability, is introduced. In the 1950s, Markowitz developed the financial portfolio model by demonstrating that investors could optimize the ratio of risk and return on their portfolios through risk diversification. The DPM integrates attractive features of a variety of models of adaptability, including Linville's self-complexity model, the risk and resilience model, and Bandura's social cognitive theory. The DPM draws on the concept of portfolio diversification, positing that diversified investment in multiple life experiences, life roles, and relationships promotes positive adaptation to life's challenges. The DPM provides a new integrative model of adaptability across the biopsychosocial levels of functioning. More importantly, the DPM addresses a gap in the literature by illuminating the antecedents of adaptive processes studied in a broad array of psychological models. The DPM is described in relation to the biopsychosocial model and propositions are offered regarding its utility in increasing adaptiveness. Recommendations for future research are also offered. (PsycINFO Database Record

  3. Introduction to mixed modelling beyond regression and analysis of variance

    CERN Document Server

    Galwey, N W

    2007-01-01

    Mixed modelling is one of the most promising and exciting areas of statistical analysis, enabling more powerful interpretation of data through the recognition of random effects. However, many perceive mixed modelling as an intimidating and specialized technique.

  4. Inference of gene regulatory networks from genetic perturbations with linear regression model.

    Directory of Open Access Journals (Sweden)

    Zijian Dong

    Full Text Available It is an effective strategy to use both genetic perturbation data and gene expression data to infer regulatory networks that aims to improve the detection accuracy of the regulatory relationships among genes. Based on both types of data, the genetic regulatory networks can be accurately modeled by Structural Equation Modeling (SEM. In this paper, a linear regression (LR model is formulated based on the SEM, and a novel iterative scheme using Bayesian inference is proposed to estimate the parameters of the LR model (LRBI. Comparative evaluations of LRBI with other two algorithms, the Adaptive Lasso (AL-Based and the Sparsity-aware Maximum Likelihood (SML, are also presented. Simulations show that LRBI has significantly better performance than AL-Based, and overperforms SML in terms of power of detection. Applying the LRBI algorithm to experimental data, we inferred the interactions in a network of 35 yeast genes. An open-source program of the LRBI algorithm is freely available upon request.

  5. Accounting for exhaust gas transport dynamics in instantaneous emission models via smooth transition regression.

    Science.gov (United States)

    Kamarianakis, Yiannis; Gao, H Oliver

    2010-02-15

    Collecting and analyzing high frequency emission measurements has become very usual during the past decade as significantly more information with respect to formation conditions can be collected than from regulated bag measurements. A challenging issue for researchers is the accurate time-alignment between tailpipe measurements and engine operating variables. An alignment procedure should take into account both the reaction time of the analyzers and the dynamics of gas transport in the exhaust and measurement systems. This paper discusses a statistical modeling framework that compensates for variable exhaust transport delay while relating tailpipe measurements with engine operating covariates. Specifically it is shown that some variants of the smooth transition regression model allow for transport delays that vary smoothly as functions of the exhaust flow rate. These functions are characterized by a pair of coefficients that can be estimated via a least-squares procedure. The proposed models can be adapted to encompass inherent nonlinearities that were implicit in previous instantaneous emissions modeling efforts. This article describes the methodology and presents an illustrative application which uses data collected from a diesel bus under real-world driving conditions.

  6. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

    Directory of Open Access Journals (Sweden)

    Ivanka Jerić

    2011-11-01

    Full Text Available Predicting antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, up to some extent, overfitting problems caused by a small training data, we propose to use consensus of six regression models for prediction of biological activity of virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met with known antitumor activity were used to train regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, the linear and nonlinear (with polynomial and Gaussian kernel support vector machine. Regression models were applied on a virtual library of 429 compounds that resulted in six lists with candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for an antiproliferative activity. Some of prepared peptides showed more pronounced activity compared with the native OGF; however, they were less active than highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM regression model. The ill-posedness of the related inverse problem causes unstable behavior of trained regression models on test data. These results point to high complexity of prediction based on the regression models trained on a small data sample.

  7. Asymptotic Normality of LS Estimate in Simple Linear EV Regression Model

    Institute of Scientific and Technical Information of China (English)

    Jixue LIU

    2006-01-01

    Though EV model is theoretically more appropriate for applications in which measurement errors exist, people are still more inclined to use the ordinary regression models and the traditional LS method owing to the difficulties of statistical inference and computation. So it is meaningful to study the performance of LS estimate in EV model.In this article we obtain general conditions guaranteeing the asymptotic normality of the estimates of regression coefficients in the linear EV model. It is noticeable that the result is in some way different from the corresponding result in the ordinary regression model.

  8. Logistic Regression Models to Forecast Travelling Behaviour in Tripoli City

    Directory of Open Access Journals (Sweden)

    Amiruddin Ismail

    2011-01-01

    Full Text Available Transport modes are very important to Libyan’s Tripoli residents for their daily trips. However, the total number of own car and private transport namely taxi and micro buses on the road increases and causes many problems such as traffic congestion, accidents, air and noise pollution. These problems then causes other related phenomena to the travel activities such as delay in trips, stress and frustration to motorists which may affect their productivity and efficiency to both workers and students. Delay may also increase travel cost as well inefficiency in trips making if compare to other public transport users in some Arabs cities. Switching to public transport (PT modes alternatives such as buses, light rail transit and underground train could improve travel time and travel costs. A transport study has been carried out at Tripoli City Authority areas among own car users who live in areas with inadequate of private transport and poor public transportation services. Analyses about relation between factors such as travel time, travel cost, trip purpose and parking cost have been made to answer research questions. Logistic regression technique has been used to analyse these factors that influence users to switch their trips mode to public transport alternatives.

  9. Teacher training through the Regression Model in foreign language education

    Directory of Open Access Journals (Sweden)

    Jesús García Laborda

    2011-01-01

    Full Text Available In the last few years, Spain has seen dramatic changes in its educational system. Many of them have been rejected by most teachers after their implementation (LOGSE while others have found potential drawbacks even before starting operating (LOCE, LOE. To face these changes, schools need well qualified instructors. Given this need, and also considering that, although all the schools want the best teachers but, as teachers’ salaries are regulated by the state, few schools can actually offer incentives to their teachers and consequently schools never have the instructors they wish. Apart from this, state schools have a fixed salary for their teachers and private institutions offer no additional bonuses for things like additional training or diplomas (for example, masters or post-degree courses and, therefore, teachers are rarely interested in pursuing any further studies in methodology or any other related fields such as education or applied linguistics. Although many teachers acknowledge their love to teaching, the current situation in schools (school violence, bad salaries, depression, social desprestige, legal changes and so has made the teaching job one of the most complicated and undevoted in Spain. It is not unusual to have a couple of instructors ill due to depression and other psychological sicknesses. This paper deals with the development and implementation of a training program based on regressive visualizations of one’s experience both as a teacher as well as a learner.

  10. An alternative approach to the ground motion prediction problem by a non-parametric adaptive regression method

    Science.gov (United States)

    Yerlikaya-Özkurt, Fatma; Askan, Aysegul; Weber, Gerhard-Wilhelm

    2014-12-01

    Ground Motion Prediction Equations (GMPEs) are empirical relationships which are used for determining the peak ground response at a particular distance from an earthquake source. They relate the peak ground responses as a function of earthquake source type, distance from the source, local site conditions where the data are recorded and finally the depth and magnitude of the earthquake. In this article, a new prediction algorithm, called Conic Multivariate Adaptive Regression Splines (CMARS), is employed on an available dataset for deriving a new GMPE. CMARS is based on a special continuous optimization technique, conic quadratic programming. These convex optimization problems are very well-structured, resembling linear programs and, hence, permitting the use of interior point methods. The CMARS method is performed on the strong ground motion database of Turkey. Results are compared with three other GMPEs. CMARS is found to be effective for ground motion prediction purposes.

  11. Unobtrusive user modeling for adaptive hypermedia

    NARCIS (Netherlands)

    Holz, H.J.; Hofmann, K.; Reed, C.; Uchyigit, G.; Ma, M.Y.

    2008-01-01

    We propose a technique for user modeling in Adaptive Hypermedia (AH) that is unobtrusive at both the level of observable behavior and that of cognition. Unobtrusive user modeling is complementary to transparent user modeling. Unobtrusive user modeling induces user models appropriate for Educational

  12. Modeling by regression for laser cutting of quartz crystal

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Presents the theoretical models built by analysis of the mechanism of laser cutting of quartz crystal and re gression of test results for the laser cutting of quartz crystal, and comparative analysis of calculation errors for these models, and concludes with test results that these models comprehensively reflect the physical features of laser cutting of quartz crystal and satisfy the industrial production requirements, and they can be used to select right parameters for improvement of productivity and quality and saving of energy.

  13. Crime Trend Prediction Using Regression Models for Salinas, California

    Science.gov (United States)

    2012-06-01

    supervision  Lack of positive adult male role models  Multi-generational gang families  Lack of effective teacher/ student attachments with at-risk...to many police departments adopting crime analysis techniques, finally concluding in the creation of the International Association of Crime Analysts...Model Goodness-of-fit for a Poisson model is measured using the residual deviance instead of R2 or the residual standard error used in linear

  14. Reduction of the curvature of a class of nonlinear regression models

    Institute of Scientific and Technical Information of China (English)

    吴翊; 易东云

    2000-01-01

    It is proved that the curvature of nonlinear model can be reduced to zero by increasing measured data for a class of nonlinear regression models. The result is important to actual problem and has obtained satisfying effect on data fusing.

  15. Additive Intensity Regression Models in Corporate Default Analysis

    DEFF Research Database (Denmark)

    Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo

    2013-01-01

    We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...

  16. Misspecified poisson regression models for large-scale registry data

    DEFF Research Database (Denmark)

    Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.

    2016-01-01

    working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...

  17. A Noncentral "t" Regression Model for Meta-Analysis

    Science.gov (United States)

    Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi

    2010-01-01

    In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…

  18. CONSISTENCY OF LS ESTIMATOR IN SIMPLE LINEAR EV REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    Liu Jixue; Chen Xiru

    2005-01-01

    Consistency of LS estimate of simple linear EV model is studied. It is shown that under some common assumptions of the model, both weak and strong consistency of the estimate are equivalent but it is not so for quadratic-mean consistency.

  19. A Negative Binomial Regression Model for Accuracy Tests

    Science.gov (United States)

    Hung, Lai-Fa

    2012-01-01

    Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…

  20. Comparative analysis of regression and artificial neural network models for wind speed prediction

    Science.gov (United States)

    Bilgili, Mehmet; Sahin, Besir

    2010-11-01

    In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of ANNs. To get a successful simulation, firstly, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were calculated taking two variables in turn for each calculation. All independent variables were added to the simple regression model. Then, the method of stepwise multiple regression was applied for the selection of the “best” regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and also used in the input layer of the ANN. The results obtained by all methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.

  1. A generalized exponential time series regression model for electricity prices

    DEFF Research Database (Denmark)

    Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso

    We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort...... the estimation of the dynamic structure of the series we consider an iterative estimation strategy which, conditional on a set of parameter estimates, clears the spikes using a data cleaning algorithm, and reestimates the parameters using the cleaned data so as to robustify the estimates. Conditional...... on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...

  2. Multivariable Linear Regression Model for Promotional Forecasting:The Coca Cola - Morrisons Case

    OpenAIRE

    Zheng, Yiwei/Y

    2009-01-01

    This paper describes a promotional forecasting model, built by linear regression module in Microsoft Excel. It intends to provide quick and reliable forecasts with a moderate credit and to assist the CPFR between the Coca Cola Enterprises (CCE) and the Morrisons. The model is derived from previous researches and literature review on CPFR, promotion, forecasting and modelling. It is designed as a multivariable linear regression model, which involves several promotional mix as variables includi...

  3. Multiscale regression model to infer historical temperatures in a central Mediterranean sub-regional area

    Directory of Open Access Journals (Sweden)

    N. Diodato

    2010-12-01

    Full Text Available To reconstruct sub-regional European climate over the past centuries, several efforts have been made using historical datasets. However, only scattered information at low spatial and temporal resolution have been produced to date for the Mediterranean area. This paper has exploited, for Southern and Central Italy (Mediterranean Sub-Regional Area, an unprecedented historical dataset as an attempt to model seasonal (winter and summer air temperatures in pre-instrumental time (back to 1500. Combining information derived from proxy documentary data and large-scale simulation, a statistical methodology in the form of multiscale-temperature regression (MTR-model was developed to adapt larger-scale estimations to the sub-regional temperature pattern. The modelled response lacks essentially of autocorrelations among the residuals (marginal or any significance in the Durbin-Watson statistic, and agrees well with the independent data from the validation sample (Nash-Sutcliffe efficiency coefficient >0.60. The advantage of the approach is not merely increased accuracy in estimation. Rather, it relies on the ability to extract (and exploit the right information to replicate coherent temperature series in historical times.

  4. Multiple models adaptive feedforward decoupling controller

    Institute of Scientific and Technical Information of China (English)

    Wang Xin; Li Shaoyuan; Wang Zhongjie

    2005-01-01

    When the parameters of the system change abruptly, a new multivariable adaptive feedforward decoupling controller using multiple models is presented to improve the transient response. The system models are composed of multiple fixed models, one free-running adaptive model and one re-initialized adaptive model. The fixed models are used to provide initial control to the process. The re-initialized adaptive model can be reinitialized as the selected model to improve the adaptation speed. The free-running adaptive controller is added to guarantee the overall system stability. At each instant, the best system model is selected according to the switching index and the corresponding controller is designed. During the controller design, the interaction is viewed as the measurable disturbance and eliminated by the choice of the weighting polynomial matrix. It not only eliminates the steady-state error but also decouples the system dynamically. The global convergence is obtained and several simulation examples are presented to illustrate the effectiveness of the proposed controller.

  5. Logistic regression models for polymorphic and antagonistic pleiotropic gene action on human aging and longevity

    DEFF Research Database (Denmark)

    Tan, Qihua; Bathum, L; Christiansen, L

    2003-01-01

    In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...

  6. STATISTICAL INFERENCES FOR VARYING-COEFFICINT MODELS BASED ON LOCALLY WEIGHTED REGRESSION TECHNIQUE

    Institute of Scientific and Technical Information of China (English)

    梅长林; 张文修; 梁怡

    2001-01-01

    Some fundamental issues on statistical inferences relating to varying-coefficient regression models are addressed and studied. An exact testing procedure is proposed for checking the goodness of fit of a varying-coefficient model fired by the locally weighted regression technique versus an ordinary linear regression model. Also, an appropriate statistic for testing variation of model parameters over the locations where the observations are collected is constructed and a formal testing approach which is essential to exploring spatial non-stationarity in geography science is suggested.

  7. Genetic parameters for tunisian holsteins using a test-day random regression model.

    Science.gov (United States)

    Hammami, H; Rekik, B; Soyeurt, H; Ben Gara, A; Gengler, N

    2008-05-01

    Genetic parameters of milk, fat, and protein yields were estimated in the first 3 lactations for registered Tunisian Holsteins. Data included 140,187; 97,404; and 62,221 test-day production records collected on 22,538; 15,257; and 9,722 first-, second-, and third-parity cows, respectively. Records were of cows calving from 1992 to 2004 in 96 herds. (Co)variance components were estimated by Bayesian methods and a 3-trait-3-lactation random regression model. Gibbs sampling was used to obtain posterior distributions. The model included herd x test date, age x season of calving x stage of lactation [classes of 25 days in milk (DIM)], production sector x stage of lactation (classes of 5 DIM) as fixed effects, and random regression coefficients for additive genetic, permanent environmental, and herd-year of calving effects, which were defined as modified constant, linear, and quadratic Legendre coefficients. Heritability estimates for 305-d milk, fat and protein yields were moderate (0.12 to 0.18) and in the same range of parameters estimated in management systems with low to medium production levels. Heritabilities of test-day milk and protein yields for selected DIM were higher in the middle than at the beginning or the end of lactation. Inversely, heritabilities of fat yield were high at the peripheries of lactation. Genetic correlations among 305-d yield traits ranged from 0.50 to 0.86. The largest genetic correlation was observed between the first and second lactation, potentially due to the limited expression of genetic potential of superior cows in later lactations. Results suggested a lack of adaptation under the local management and climatic conditions. Results should be useful to implement a BLUP evaluation for the Tunisian cow population; however, results also indicated that further research focused on data quality might be needed.

  8. Prediction of the result in race walking using regularized regression models

    Directory of Open Access Journals (Sweden)

    Krzysztof Przednowek

    2013-04-01

    Full Text Available The following paper presents the use of regularized linear models as tools to optimize training process. The models were calculated by using data collected from race-walkers' training events. The models used predict the outcomes over a 3 km race and following a prescribed training plan. The material included a total of 122 training patterns made by 21 players. The methods of analysis include: classical model of OLS regression, ridge regression, LASSO regression and elastic net regression. In order to compare and choose the best method a cross-validation of the extit{leave-one-out} was used. All models were calculated using R language with additional packages. The best model was determined by the LASSO method which generates an error of about 26 seconds. The method has simplified the structure of the model by eliminating 5 out of 18 predictors.

  9. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru....... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers......This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its......, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well...

  10. Mechanisms of Developmental Regression in Autism and the Broader Phenotype: A Neural Network Modeling Approach

    Science.gov (United States)

    Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette

    2011-01-01

    Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…

  11. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    Science.gov (United States)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  12. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan

    2016-11-01

    In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat ( L.) and maize ( L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that accounts for small, more complex marker main effects and marker-specific interaction effects.

  13. Comparing uncertainty resulting from two-step and global regression procedures applied to microbial growth models.

    Science.gov (United States)

    Martino, K G; Marks, B P

    2007-12-01

    Two different microbial modeling procedures were compared and validated against independent data for Listeria monocytogenes growth. The most generally used method is two consecutive regressions: growth parameters are estimated from a primary regression of microbial counts, and a secondary regression relates the growth parameters to experimental conditions. A global regression is an alternative method in which the primary and secondary models are combined, giving a direct relationship between experimental factors and microbial counts. The Gompertz equation was the primary model, and a response surface model was the secondary model. Independent data from meat and poultry products were used to validate the modeling procedures. The global regression yielded the lower standard errors of calibration, 0.95 log CFU/ml for aerobic and 1.21 log CFU/ml for anaerobic conditions. The two-step procedure yielded errors of 1.35 log CFU/ml for aerobic and 1.62 log CFU/ ml for anaerobic conditions. For food products, the global regression was more robust than the two-step procedure for 65% of the cases studied. The robustness index for the global regression ranged from 0.27 (performed better than expected) to 2.60. For the two-step method, the robustness index ranged from 0.42 to 3.88. The predictions were overestimated (fail safe) in more than 50% of the cases using the global regression and in more than 70% of the cases using the two-step regression. Overall, the global regression performed better than the two-step procedure for this specific application.

  14. Maximum likelihood polynomial regression for robust speech recognition

    Institute of Scientific and Technical Information of China (English)

    LU Yong; WU Zhenyang

    2011-01-01

    The linear hypothesis is the main disadvantage of maximum likelihood linear re- gression (MLLR). This paper applies the polynomial regression method to model adaptation and establishes a nonlinear model adaptation algorithm using maximum likelihood polyno

  15. CONFIDENCE REGIONS IN TERMS OF STATISTICAL CURVATURE FOR AR(q) NONLINEAR REGRESSION MODELS

    Institute of Scientific and Technical Information of China (English)

    刘应安; 韦博成

    2004-01-01

    This paper constructs a set of confidence regions of parameters in terms of statistical curvatures for AR(q) nonlinear regression models. The geometric frameworks are proposed for the model. Then several confidence regions for parameters and parameter subsets in terms of statistical curvatures are given based on the likelihood ratio statistics and score statistics. Several previous results, such as [1] and [2] are extended to AR(q)nonlinear regression models.

  16. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    Science.gov (United States)

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  17. Aboveground biomass and carbon stocks modelling using non-linear regression model

    Science.gov (United States)

    Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd

    2016-06-01

    Aboveground biomass (AGB) is an important source of uncertainty in the carbon estimation for the tropical forest due to the variation biodiversity of species and the complex structure of tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forest in the world with the vast diversity of tree with layered canopies. With the usage of optical sensor integrate with empirical models is a common way to assess the AGB. Using the regression, the linkage between remote sensing and a biophysical parameter of the forest may be made. Therefore, this paper exemplifies the accuracy of non-linear regression equation of quadratic function to estimate the AGB and carbon stocks for the tropical lowland Dipterocarp forest of Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameter field plots with the remotely-sensed data using nonlinear regression model. The result showed that there is a good relationship between crown projection area (CPA) and carbon stocks (CS) with Pearson Correlation (p < 0.01), the coefficient of correlation (r) is 0.671. The study concluded that the integration of Worldview-3 imagery with the canopy height model (CHM) raster based LiDAR were useful in order to quantify the AGB and carbon stocks for a larger sample area of the lowland Dipterocarp forest.

  18. Combining an additive and tree-based regression model simultaneously: STIMA

    NARCIS (Netherlands)

    Dusseldorp, E.; Conversano, C.; Os, B.J. van

    2010-01-01

    Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher order interaction effects, whereas some tree-based models, such as

  19. Procedures for adjusting regional regression models of urban-runoff quality using local data

    Science.gov (United States)

    Hoos, A.B.; Sisolak, J.K.

    1993-01-01

    Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for

  20. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    Science.gov (United States)

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  1. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples

    Science.gov (United States)

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...

  2. Rank Set Sampling in Improving the Estimates of Simple Regression Model

    Directory of Open Access Journals (Sweden)

    M Iqbal Jeelani

    2015-04-01

    Full Text Available In this paper Rank set sampling (RSS is introduced with a view of increasing the efficiency of estimates of Simple regression model. Regression model is considered with respect to samples taken from sampling techniques like Simple random sampling (SRS, Systematic sampling (SYS and Rank set sampling (RSS. It is found that R2 and Adj R2 obtained from regression model based on Rank set sample is higher than rest of two sampling schemes. Similarly Root mean square error, p-values, coefficient of variation are much lower in Rank set based regression model, also under validation technique (Jackknifing there is consistency in the measure of R2, Adj R2 and RMSE in case of RSS as compared to SRS and SYS. Results are supported with an empirical study involving a real data set generated of Pinus Wallichiana taken from block Langate of district Kupwara. 

  3. Parameter-elevation Regressions on Independent Slopes Model Monthly Climate Data for the Continental United States.

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...

  4. Efficient Estimation for Semiparametric Varying Coefficient Partially Linear Regression Models with Current Status Data

    Institute of Scientific and Technical Information of China (English)

    Tao Hu; Heng-jian Cui; Xing-wei Tong

    2009-01-01

    This article considers a semiparametric varying-coefficient partially linear regression model with current status data. The semiparametric varying-coefficient partially linear regression model which is a gen-eralization of the partially linear regression model and varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A Sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estima-tor for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies are conducted to examine the small-sample properties of the proposed estimates and a real dataset is used to illustrate our approach.

  5. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    Science.gov (United States)

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data.

  6. Prediction Model of Cutting Parameters for Turning High Strength Steel Grade-H: Comparative Study of Regression Model versus ANFIS

    Directory of Open Access Journals (Sweden)

    Adel T. Abbas

    2017-01-01

    Full Text Available The Grade-H high strength steel is used in the manufacturing of many civilian and military products. The procedures of manufacturing these parts have several turning operations. The key factors for the manufacturing of these parts are the accuracy, surface roughness (Ra, and material removal rate (MRR. The production line of these parts contains many CNC turning machines to get good accuracy and repeatability. The manufacturing engineer should fulfill the required surface roughness value according to the design drawing from first trail (otherwise these parts will be rejected as well as keeping his eye on maximum metal removal rate. The rejection of these parts at any processing stage will represent huge problems to any factory because the processing and raw material of these parts are very expensive. In this paper the artificial neural network was used for predicting the surface roughness for different cutting parameters in CNC turning operations. These parameters were investigated to get the minimum surface roughness. In addition, a mathematical model for surface roughness was obtained from the experimental data using a regression analysis method. The experimental data are then compared with both the regression analysis results and ANFIS (Adaptive Network-based Fuzzy Inference System estimations.

  7. The empirical likelihood goodness-of-fit test for regression model

    Institute of Scientific and Technical Information of China (English)

    Li-xing ZHU; Yong-song QIN; Wang-li XU

    2007-01-01

    Goodness-of-fit test for regression modes has received much attention in literature. In this paper, empirical likelihood (EL) goodness-of-fit tests for regression models including classical parametric and autoregressive (AR) time series models are proposed. Unlike the existing locally smoothing and globally smoothing methodologies, the new method has the advantage that the tests are self-scale invariant and that the asymptotic null distribution is chi-squared. Simulations are carried out to illustrate the methodology.

  8. On asymptotics of t-type regression estimation in multiple linear model

    Institute of Scientific and Technical Information of China (English)

    2004-01-01

    We consider a robust estimator (t-type regression estimator) of multiple linear regression model by maximizing marginal likelihood of a scaled t-type error t-distribution.The marginal likelihood can also be applied to the de-correlated response when the withinsubject correlation can be consistently estimated from an initial estimate of the model based on the independent working assumption. This paper shows that such a t-type estimator is consistent.

  9. Comparison of land-use regression models between Great Britain and the Netherlands.

    NARCIS (Netherlands)

    Vienneau, D.; de Hoogh, K.; Beelen, R.M.J.; Fischer, P.; Hoek, G.; Briggs, D.

    2010-01-01

    Land-use regression models have increasingly been applied for air pollution mapping at typically the city level. Though models generally predict spatial variability well, the structure of models differs widely between studies. The observed differences in the models may be due to artefacts of data an

  10. Developing and testing a global-scale regression model to quantify mean annual streamflow

    Science.gov (United States)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.

  11. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    Science.gov (United States)

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  12. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    Science.gov (United States)

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study.

  13. A hybrid model using logistic regression and wavelet transformation to detect traffic incidents

    Directory of Open Access Journals (Sweden)

    Shaurya Agarwal

    2016-07-01

    Full Text Available This research paper investigates a hybrid model using logistic regression with a wavelet-based feature extraction for detecting traffic incidents. A logistic regression model is suitable when the outcome can take only a limited number of values. For traffic incident detection, the outcome is limited to only two values, the presence or absence of an incident. The logistic regression model used in this study is a generalized linear model (GLM with a binomial response and a logit link function. This paper presents a framework to use logistic regression and wavelet-based feature extraction for traffic incident detection. It investigates the effect of preprocessing data on the performance of incident detection models. Results of this study indicate that logistic regression along with wavelet based feature extraction can be used effectively for incident detection by balancing the incident detection rate and the false alarm rate according to need. Logistic regression on raw data resulted in a maximum detection rate of 95.4% at the cost of 14.5% false alarm rate. Whereas the hybrid model achieved a maximum detection rate of 98.78% at the expense of 6.5% false alarm rate. Results indicate that the proposed approach is practical and efficient; with future improvements in the proposed technique, it will make an effective tool for traffic incident detection.

  14. Regression-based air temperature spatial prediction models: an example from Poland

    Directory of Open Access Journals (Sweden)

    Mariusz Szymanowski

    2013-10-01

    Full Text Available A Geographically Weighted Regression ? Kriging (GWRK algorithm, based on the local Geographically Weighted Regression (GWR, is applied for spatial prediction of air temperature in Poland. Hengl's decision tree for selecting a suitable prediction model is extended for varying spatial relationships between the air temperature and environmental predictors with an assumption of existing environmental dependence of analyzed temperature variables. The procedure includes the potential choice of a local GWR instead of the global Multiple Linear Regression (MLR method for modeling the deterministic part of spatial variation, which is usual in the standard regression (residual kriging model (MLRK. The analysis encompassed: testing for environmental correlation, selecting an appropriate regression model, testing for spatial autocorrelation of the residual component, and validating the prediction accuracy. The proposed approach was performed for 69 air temperature cases, with time aggregation ranging from daily to annual average air temperatures. The results show that, irrespective of the level of data aggregation, the spatial distribution of temperature is better fitted by local models, and hence is the reason for choosing a GWR instead of the MLR for all variables analyzed. Additionally, in most cases (78% there is spatial autocorrelation in the residuals of the deterministic part, which suggests that the GWR model should be extended by ordinary kriging of residuals to the GWRK form. The decision tree used in this paper can be considered as universal as it encompasses either spatially varying relationships of modeled and explanatory variables or random process that can be modeled by a stochastic extension of the regression model (residual kriging. Moreover, for all cases analyzed, the selection of a method based on the local regression model (GWRK or GWR does not depend on the data aggregation level, showing the potential versatility of the technique.

  15. Exploring nonlinear relations: models of clinical decision making by regression with optimal scaling.

    Science.gov (United States)

    Hartmann, Armin; Van Der Kooij, Anita J; Zeeck, Almut

    2009-07-01

    In explorative regression studies, linear models are often applied without questioning the linearity of the relations between the predictor variables and the dependent variable, or linear relations are taken as an approximation. In this study, the method of regression with optimal scaling transformations is demonstrated. This method does not require predefined nonlinear functions and results in easy-to-interpret transformations that will show the form of the relations. The method is illustrated using data from a German multicenter project on the indication criteria for inpatient or day clinic psychotherapy treatment. The indication criteria to include in the regression model were selected with the Lasso, which is a tool for predictor selection that overcomes the disadvantages of stepwise regression methods. The resulting prediction model indicates that treatment status is (approximately) linearly related to some criteria and nonlinearly related to others.

  16. Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.

    Science.gov (United States)

    Kumar, Gaurav; Bajaj, Rakesh Kumar

    2014-01-01

    In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept on triangular intuitionistic fuzzy number for its one-to-one correspondence with its position and entropies. Using the concept of fuzzy entropy the estimators of the intuitionistic fuzzy regression coefficients have been estimated in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.

  17. Modeling personalized head-related impulse response using support vector regression

    Institute of Scientific and Technical Information of China (English)

    HUANG Qing-hua; FANG Yong

    2009-01-01

    A new customization approach based on support vector regression (SVR) is proposed to obtain individual headrelated impulse response (HRIR) without complex measurement and special equipment. Principal component analysis (PCA) is first applied to obtain a few principal components and corresponding weight vectors correlated with individual anthropometric parameters. Then the weight vectors act as output of the nonlinear regression model. Some measured anthropometric parameters are selected as input of the model according to the correlation coefficients between the parameters and the weight vectors. After the regression model is learned from the training data, the individual HRIR can be predicted based on the measured anthropometric parameters. Compared with a back-propagation neural network (BPNN) for nonlinear regression,better generalization and prediction performance for small training samples can be obtained using the proposed PCA-SVR algorithm.

  18. Multivariable Regression and Adaptive Neurofuzzy Inference System Predictions of Ash Fusion Temperatures Using Ash Chemical Composition of US Coals

    Directory of Open Access Journals (Sweden)

    Shahab Karimi

    2014-01-01

    Full Text Available In this study, the effects of ratios of dolomite, base/acid, silica, SiO2/Al2O3, and Fe2O3/CaO, base and acid oxides, and 11 oxides (SiO2, Al2O3, CaO, MgO, MnO, Na2O, K2O, Fe2O3, TiO2, P2O5, and SO3 on ash fusion temperatures for 1040 US coal samples from 12 states were evaluated using regression and adaptive neurofuzzy inference system (ANFIS methods. Different combinations of independent variables were examined to predict ash fusion temperatures in the multivariable procedure. The combination of the “11 oxides + (Base/Acid + Silica ratio” was the best predictor. Correlation coefficients (R2 of 0.891, 0.917, and 0.94 were achieved using nonlinear equations for the prediction of initial deformation temperature (IDT, softening temperature (ST, and fluid temperature (FT, respectively. The mentioned “best predictor” was used as input to the ANFIS system as well, and the correlation coefficients (R2 of the prediction were enhanced to 0.97, 0.98, and 0.99 for IDT, ST, and FT, respectively. The prediction precision that was achieved in this work exceeded that reported in previously published works.

  19. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data.

    Science.gov (United States)

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2014-05-01

    Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting the dissolved oxygen levels. Initial features were selected using nonlinear approach. Nonlinearity in the data was tested using BDS statistics, which revealed the data with nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using the cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods used here were comparable both in terms of predictive and generalization abilities. Values of the performance criteria parameters suggested for the adequacy of the constructed models to fit the nonlinear data and their good predictive capabilities.

  20. Modelling QTL effect on BTA06 using random regression test day models.

    Science.gov (United States)

    Suchocki, T; Szyda, J; Zhang, Q

    2013-02-01

    In statistical models, a quantitative trait locus (QTL) effect has been incorporated either as a fixed or as a random term, but, up to now, it has been mainly considered as a time-independent variable. However, for traits recorded repeatedly, it is very interesting to investigate the variation of QTL over time. The major goal of this study was to estimate the position and effect of QTL for milk, fat, protein yields and for somatic cell score based on test day records, while testing whether the effects are constant or variable throughout lactation. The analysed data consisted of 23 paternal half-sib families (716 daughters of 23 sires) of Chinese Holstein-Friesian cattle genotyped at 14 microsatellites located in the area of the casein loci on BTA6. A sequence of three models was used: (i) a lactation model, (ii) a random regression model with a QTL constant in time and (iii) a random regression model with a QTL variable in time. The results showed that, for each production trait, at least one significant QTL exists. For milk and protein yields, the QTL effect was variable in time, while for fat yield, each of the three models resulted in a significant QTL effect. When a QTL is incorporated into a model as a constant over time, its effect is averaged over lactation stages and may, thereby, be difficult or even impossible to be detected. Our results showed that, in such a situation, only a longitudinal model is able to identify loci significantly influencing trait variation.

  1. RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS

    Directory of Open Access Journals (Sweden)

    J. Behmanesh

    2015-01-01

    Full Text Available Modeling rainfall-runoff relationships in a watershed have an important role in water resources engineering. Researchers have used numerical models for modeling rainfall-runoff process in the watershed because of non-linear nature of rainfall-runoff relationship, vast data requirement and physical models hardness. The main object of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models including ANN and ANFIS were used to model the rainfall-runoff process and the best model was chosen. Also, by using SPSS software, the regression equations were developed and then the best equation was selected from regression analysis. The obtained results from the numerical and regression modeling were compared each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also stated that the Turkey river flow rate had a logical relationship with one and two days ago flow rate and one, two and three days ago rainfall values.

  2. RAINFALL-RUNOFF MODELING IN THE TURKEY RIVER USING NUMERICAL AND REGRESSION METHODS

    Directory of Open Access Journals (Sweden)

    J. Behmanesh

    2015-03-01

    Full Text Available Modeling rainfall-runoff relationships in a watershed have an important role in water resources engineering. Researchers have used numerical models for modeling rainfall-runoff process in the watershed because of non-linear nature of rainfall-runoff relationship, vast data requirement and physical models hardness. The main object of this research was to model the rainfall-runoff relationship at the Turkey River in Mississippi. In this research, two numerical models including ANN and ANFIS were used to model the rainfall-runoff process and the best model was chosen. Also, by using SPSS software, the regression equations were developed and then the best equation was selected from regression analysis. The obtained results from the numerical and regression modeling were compared each other. The comparison showed that the model obtained from ANFIS modeling was better than the model obtained from regression modeling. The results also stated that the Turkey river flow rate had a logical relationship with one and two days ago flow rate and one, two and three days ago rainfall values.

  3. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    Science.gov (United States)

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.

  4. Graphical Models and Computerized Adaptive Testing.

    Science.gov (United States)

    Mislevy, Robert J.; Almond, Russell G.

    This paper synthesizes ideas from the fields of graphical modeling and education testing, particularly item response theory (IRT) applied to computerized adaptive testing (CAT). Graphical modeling can offer IRT a language for describing multifaceted skills and knowledge, and disentangling evidence from complex performances. IRT-CAT can offer…

  5. MULTIPLE LOGISTIC REGRESSION MODEL TO PREDICT RISK FACTORS OF ORAL HEALTH DISEASES

    Directory of Open Access Journals (Sweden)

    Parameshwar V. Pandit

    2012-06-01

    Full Text Available Purpose: To analysis the dependence of oral health diseases i.e. dental caries and periodontal disease on considering the number of risk factors through the applications of logistic regression model. Method: The cross sectional study involves a systematic random sample of 1760 permanent dentition aged between 18-40 years in Dharwad, Karnataka, India. Dharwad is situated in North Karnataka. The mean age was 34.26±7.28. The risk factors of dental caries and periodontal disease were established by multiple logistic regression model using SPSS statistical software. Results: The factors like frequency of brushing, timings of cleaning teeth and type of toothpastes are significant persistent predictors of dental caries and periodontal disease. The log likelihood value of full model is –1013.1364 and Akaike’s Information Criterion (AIC is 1.1752 as compared to reduced regression model are -1019.8106 and 1.1748 respectively for dental caries. But, the log likelihood value of full model is –1085.7876 and AIC is 1.2577 followed by reduced regression model are -1019.8106 and 1.1748 respectively for periodontal disease. The area under Receiver Operating Characteristic (ROC curve for the dental caries is 0.7509 (full model and 0.7447 (reduced model; the ROC for the periodontal disease is 0.6128 (full model and 0.5821 (reduced model. Conclusions: The frequency of brushing, timings of cleaning teeth and type of toothpastes are main signifi cant risk factors of dental caries and periodontal disease. The fitting performance of reduced logistic regression model is slightly a better fit as compared to full logistic regression model in identifying the these risk factors for both dichotomous dental caries and periodontal disease.

  6. A primer for biomedical scientists on how to execute model II linear regression analysis.

    Science.gov (United States)

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions.

  7. Hybrid Surface Mesh Adaptation for Climate Modeling

    Institute of Scientific and Technical Information of China (English)

    Ahmed Khamayseh; Valmor de Almeida; Glen Hansen

    2008-01-01

    Solution-driven mesh adaptation is becoming quite popular for spatial error control in the numerical simulation of complex computational physics applications, such as climate modeling. Typically, spatial adaptation is achieved by element subdivision (h adaptation) with a primary goal of resolving the local length scales of interest. A second, lesspopular method of spatial adaptivity is called "mesh motion" (r adaptation); the smooth repositioning of mesh node points aimed at resizing existing elements to capture the local length scales. This paper proposes an adaptation method based on a combination of both element subdivision and node point repositioning (rh adaptation). By combining these two methods using the notion of a mobility function, the proposed approach seeks to increase the flexibility and extensibility of mesh motion algorithms while providing a somewhat smoother transition between refined regions than is pro-duced by element subdivision alone. Further, in an attempt to support the requirements of a very general class of climate simulation applications, the proposed method is de-signed to accommodate unstructured, polygonal mesh topologies in addition to the most popular mesh types.

  8. Penalized regression techniques for modeling relationships between metabolites and tomato taste attributes

    NARCIS (Netherlands)

    Menendez, P.; Eilers, P.; Tikunov, Y.M.; Bovy, A.G.; Eeuwijk, van F.

    2012-01-01

    The search for models which link tomato taste attributes to their metabolic profiling, is a main challenge within the breeding programs that aim to enhance tomato flavor. In this paper, we compared such models calculated by the traditional statistical approach, stepwise regression, with models obtai

  9. Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

    Science.gov (United States)

    Ulbrich, Norbert; Volden, Thomas R.

    2012-01-01

    An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.

  10. A Stochastic Restricted Principal Components Regression Estimator in the Linear Model

    Directory of Open Access Journals (Sweden)

    Daojiang He

    2014-01-01

    Full Text Available We propose a new estimator to combat the multicollinearity in the linear model when there are stochastic linear restrictions on the regression coefficients. The new estimator is constructed by combining the ordinary mixed estimator (OME and the principal components regression (PCR estimator, which is called the stochastic restricted principal components (SRPC regression estimator. Necessary and sufficient conditions for the superiority of the SRPC estimator over the OME and the PCR estimator are derived in the sense of the mean squared error matrix criterion. Finally, we give a numerical example and a Monte Carlo study to illustrate the performance of the proposed estimator.

  11. Regression analysis understanding and building business and economic models using Excel

    CERN Document Server

    Wilson, J Holton

    2012-01-01

    The technique of regression analysis is used so often in business and economics today that an understanding of its use is necessary for almost everyone engaged in the field. This book will teach you the essential elements of building and understanding regression models in a business/economic context in an intuitive manner. The authors take a non-theoretical treatment that is accessible even if you have a limited statistical background. It is specifically designed to teach the correct use of regression, while advising you of its limitations and teaching about common pitfalls. This book describe

  12. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  13. Regression models for interval censored survival data: Application to HIV infection in Danish homosexual men

    DEFF Research Database (Denmark)

    Carstensen, Bendix

    1996-01-01

    This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men.......This paper shows how to fit excess and relative risk regression models to interval censored survival data, and how to implement the models in standard statistical software. The methods developed are used for the analysis of HIV infection rates in a cohort of Danish homosexual men....

  14. The Relationship between Economic Growth and Money Laundering – a Linear Regression Model

    Directory of Open Access Journals (Sweden)

    Daniel Rece

    2009-09-01

    Full Text Available This study provides an overview of the relationship between economic growth and money laundering modeled by a least squares function. The report analyzes statistically data collected from USA, Russia, Romania and other eleven European countries, rendering a linear regression model. The study illustrates that 23.7% of the total variance in the regressand (level of money laundering is “explained” by the linear regression model. In our opinion, this model will provide critical auxiliary judgment and decision support for anti-money laundering service systems.

  15. LINEAR LAYER AND GENERALIZED REGRESSION COMPUTATIONAL INTELLIGENCE MODELS FOR PREDICTING SHELF LIFE OF PROCESSED CHEESE

    Directory of Open Access Journals (Sweden)

    S. Goyal

    2012-03-01

    Full Text Available This paper highlights the significance of computational intelligence models for predicting shelf life of processed cheese stored at 7-8 g.C. Linear Layer and Generalized Regression models were developed with input parameters: Soluble nitrogen, pH, Standard plate count, Yeast & mould count, Spores, and sensory score as output parameter. Mean Square Error, Root Mean Square Error, Coefficient of Determination and Nash - Sutcliffo Coefficient were used in order to compare the prediction ability of the models. The study revealed that Generalized Regression computational intelligence models are quite effective in predicting the shelf life of processed cheese stored at 7-8 g.C.

  16. Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis

    Science.gov (United States)

    Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia

    2015-03-01

    The citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method and the magneto-optical retardance property is measured by a Stokes polarimeter. Optimization and multiple regression of retardance in ferrofluids are executed by combining Taguchi method and Excel. From the nine tests for four parameters, including pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature, influence sequence and excellent program are found. Multiple regression analysis and F-test on the significance of regression equation are performed. It is found that the model F value is much larger than Fcritical and significance level P <0.0001. So it can be concluded that the regression model has statistically significant predictive ability. Substituting excellent program into equation, retardance is obtained as 32.703°, higher than the highest value in tests by 11.4%.

  17. Structured Additive Regression Models: An R Interface to BayesX

    Directory of Open Access Journals (Sweden)

    Nikolaus Umlauf

    2015-02-01

    Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.

  18. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    Science.gov (United States)

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.

  19. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    OpenAIRE

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable sele...

  20. Regression Basics

    CERN Document Server

    Kahane, Leo H

    2007-01-01

    Using a friendly, nontechnical approach, the Second Edition of Regression Basics introduces readers to the fundamentals of regression. Accessible to anyone with an introductory statistics background, this book builds from a simple two-variable model to a model of greater complexity. Author Leo H. Kahane weaves four engaging examples throughout the text to illustrate not only the techniques of regression but also how this empirical tool can be applied in creative ways to consider a broad array of topics. New to the Second Edition Offers greater coverage of simple panel-data estimation:

  1. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    Science.gov (United States)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.

  2. Intelligent CAD Methodology Research of Adaptive Modeling

    Institute of Scientific and Technical Information of China (English)

    ZHANG Weibo; LI Jun; YAN Jianrong

    2006-01-01

    The key to carry out ICAD technology is to establish the knowledge-based and wide rang of domains-covered product model. This paper put out a knowledge-based methodology of adaptive modeling. It is under the Ontology mind, using the Object-Oriented technology and being a knowledge-based model framework. It involves the diverse domains in product design and realizes the multi-domain modeling, embedding the relative information including standards, regulars and expert experience. To test the feasibility of the methodology, the research bonds of the automotive diaphragm spring clutch design and an adaptive clutch design model is established, using the knowledge-based modeling language-AML.

  3. truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Maria Karlsson

    2014-05-01

    Full Text Available Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and ?nite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.

  4. Computer and Modernization%Low-dose CT Image Reconstruction Based on Adaptive Kernel Regression Method and Algebraic Reconstruction Technique

    Institute of Scientific and Technical Information of China (English)

    钟志威

    2016-01-01

    针对稀疏角度投影数据CT图像重建问题,TV-ART算法将图像的梯度稀疏先验知识引入代数重建法( ART)中,对分段平滑的图像具有较好的重建效果。但是,该算法在边界重建时会产生阶梯效应,影响重建质量。因此,本文提出自适应核回归函数结合代数重建法的重建算法( LAKR-ART),不仅在边界重建时不会产生阶梯效应,而且对细节纹理重建具有更好的重建效果。最后对shepp-logan标准CT图像和实际CT头颅图像进行仿真实验,并与ART、TV-ART算法进行比较,实验结果表明本文算法有效。%To the problem of sparse angular projection data of CT image reconstruction, TV-ART algorithm introduces the gradient sparse prior knowledge of image to algebraic reconstruction, and the local smooth image gets a better reconstruction effect. How-ever, the algorithm generates step effect when the borders are reconstructed, affecting the quality of the reconstruction. Therefore, this paper proposes an adaptive kernel regression function combined with Algebraic Reconstruction Technique reconstruction algo-rithm ( LAKR-ART) , it does not produce the step effect on the border reconstruction, and has a better effect to detail reconstruc-tion. Finally we use the shepp-logan CT image and the actual CT image to make the simulation experiment, and compare with ART and TV-ART algorithm. The experimental results show the algorithm is of effectiveness.

  5. Proximate analysis based multiple regression models for higher heating value estimation of low rank coals

    Energy Technology Data Exchange (ETDEWEB)

    Akkaya, Ali Volkan [Department of Mechanical Engineering, Yildiz Technical University, 34349 Besiktas, Istanbul (Turkey)

    2009-02-15

    In this paper, multiple nonlinear regression models for estimation of higher heating value of coals are developed using proximate analysis data obtained generally from the low rank coal samples as-received basis. In this modeling study, three main model structures depended on the number of proximate analysis parameters, which are named the independent variables, such as moisture, ash, volatile matter and fixed carbon, are firstly categorized. Secondly, sub-model structures with different arrangements of the independent variables are considered. Each sub-model structure is analyzed with a number of model equations in order to find the best fitting model using multiple nonlinear regression method. Based on the results of nonlinear regression analysis, the best model for each sub-structure is determined. Among them, the models giving highest correlation for three main structures are selected. Although the selected all three models predicts HHV rather accurately, the model involving four independent variables provides the most accurate estimation of HHV. Additionally, when the chosen model with four independent variables and a literature model are tested with extra proximate analysis data, it is seen that that the developed model in this study can give more accurate prediction of HHV of coals. It can be concluded that the developed model is effective tool for HHV estimation of low rank coals. (author)

  6. Efficient Quantile Estimation for Functional-Coefficient Partially Linear Regression Models

    Institute of Scientific and Technical Information of China (English)

    Zhangong ZHOU; Rong JIANG; Weimin QIAN

    2011-01-01

    The quantile estimation methods are proposed for functional-coefficient partially linear regression (FCPLR) model by combining nonparametric and functional-coefficient regression (FCR) model.The local linear scheme and the integrated method are used to obtain local quantile estimators of all unknown functions in the FCPLR model.These resulting estimators are asymptotically normal,but each of them has big variance.To reduce variances of these quantile estimators,the one-step backfitting technique is used to obtain the efficient quantile estimators of all unknown functions,and their asymptotic normalities are derived.Two simulated examples are carried out to illustrate the proposed estimation methodology.

  7. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    Science.gov (United States)

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature.

  8. Restricted spatial regression in practice: Geostatistical models, confounding, and robustness under model misspecification

    Science.gov (United States)

    Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.

    2015-01-01

    In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.

  9. Hybrid adaptive control of a dragonfly model

    Science.gov (United States)

    Couceiro, Micael S.; Ferreira, Nuno M. F.; Machado, J. A. Tenreiro

    2012-02-01

    Dragonflies show unique and superior flight performances than most of other insect species and birds. They are equipped with two pairs of independently controlled wings granting an unmatchable flying performance and robustness. In this paper, it is presented an adaptive scheme controlling a nonlinear model inspired in a dragonfly-like robot. It is proposed a hybrid adaptive ( HA) law for adjusting the parameters analyzing the tracking error. At the current stage of the project it is considered essential the development of computational simulation models based in the dynamics to test whether strategies or algorithms of control, parts of the system (such as different wing configurations, tail) as well as the complete system. The performance analysis proves the superiority of the HA law over the direct adaptive ( DA) method in terms of faster and improved tracking and parameter convergence.

  10. Testing and Modeling Fuel Regression Rate in a Miniature Hybrid Burner

    Directory of Open Access Journals (Sweden)

    Luciano Fanton

    2012-01-01

    Full Text Available Ballistic characterization of an extended group of innovative HTPB-based solid fuel formulations for hybrid rocket propulsion was performed in a lab-scale burner. An optical time-resolved technique was used to assess the quasisteady regression history of single perforation, cylindrical samples. The effects of metalized additives and radiant heat transfer on the regression rate of such formulations were assessed. Under the investigated operating conditions and based on phenomenological models from the literature, analyses of the collected experimental data show an appreciable influence of the radiant heat flux from burnt gases and soot for both unloaded and loaded fuel formulations. Pure HTPB regression rate data are satisfactorily reproduced, while the impressive initial regression rates of metalized formulations require further assessment.

  11. A Robbins-Monro procedure for estimation in semiparametric regression models

    CERN Document Server

    Bercu, Bernard

    2011-01-01

    This paper is devoted to the parametric estimation of a shift together with the nonparametric estimation of a regression function in a semiparametric regression model. We implement a Robbins-Monro procedure very efficient and easy to handle. On the one hand, we propose a stochastic algorithm similar to that of Robbins-Monro in order to estimate the shift parameter. A preliminary evaluation of the regression function is not necessary for estimating the shift parameter. On the other hand, we make use of a recursive Nadaraya-Watson estimator for the estimation of the regression function. This kernel estimator takes in account the previous estimation of the shift parameter. We establish the almost sure convergence for both Robbins-Monro and Nadaraya-Watson estimators. The asymptotic normality of our estimates is also provided.

  12. A Percentile Regression Model for the Number of Errors in Group Conversation Tests.

    Science.gov (United States)

    Liski, Erkki P.; Puntanen, Simo

    A statistical model is presented for analyzing the results of group conversation tests in English, developed in a Finnish university study from 1977 to 1981. The model is illustrated with the findings from the study. In this study, estimates of percentile curves for the number of errors are of greater interest than the mean regression line. It was…

  13. Sensitivity analysis and optimization of system dynamics models : Regression analysis and statistical design of experiments

    NARCIS (Netherlands)

    Kleijnen, J.P.C.

    1995-01-01

    This tutorial discusses what-if analysis and optimization of System Dynamics models. These problems are solved, using the statistical techniques of regression analysis and design of experiments (DOE). These issues are illustrated by applying the statistical techniques to a System Dynamics model for

  14. Sample Size Determination for Regression Models Using Monte Carlo Methods in R

    Science.gov (United States)

    Beaujean, A. Alexander

    2014-01-01

    A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…

  15. Random regression models in the evaluation of the growth curve of Simbrasil beef cattle

    NARCIS (Netherlands)

    Mota, M.; Marques, F.A.; Lopes, P.S.; Hidalgo, A.M.

    2013-01-01

    Random regression models were used to estimate the types and orders of random effects of (co)variance functions in the description of the growth trajectory of the Simbrasil cattle breed. Records for 7049 animals totaling 18,677 individual weighings were submitted to 15 models from the third to the f

  16. Fitting multistate transition models with autoregressive logistic regression : Supervised exercise in intermittent claudication

    NARCIS (Netherlands)

    de Vries, S O; Fidler, Vaclav; Kuipers, Wietze D; Hunink, Maria G M

    1998-01-01

    The purpose of this study was to develop a model that predicts the outcome of supervised exercise for intermittent claudication. The authors present an example of the use of autoregressive logistic regression for modeling observed longitudinal data. Data were collected from 329 participants in a six

  17. DESIGNING A FORECAST MODEL FOR ECONOMIC GROWTH OF JAPAN USING COMPETITIVE (HYBRID ANN VS MULTIPLE REGRESSION MODELS

    Directory of Open Access Journals (Sweden)

    Ahmet DEMIR

    2015-07-01

    Full Text Available Artificial neural network models have been already used on many different fields successfully. However, many researches show that ANN models provide better optimum results than other competitive models in most of the researches. But does it provide optimum solutions in case ANN is proposed as hybrid model? The answer of this question is given in this research by using these models on modelling a forecast for GDP growth of Japan. Multiple regression models utilized as competitive models versus hybrid ANN (ANN + multiple regression models. Results have shown that hybrid model gives better responds than multiple regression models. However, variables, which were significantly affecting GDP growth, were determined and some of the variables, which were assumed to be affecting GDP growth of Japan, were eliminated statistically.

  18. An explanatory model of underwater adaptation

    Directory of Open Access Journals (Sweden)

    Joaquín Colodro

    Full Text Available The underwater environment is an extreme environment that requires a process of human adaptation with specific psychophysiological demands to ensure survival and productive activity. From the standpoint of existing models of intelligence, personality and performance, in this explanatory study we have analyzed the contribution of individual differences in explaining the adaptation of military personnel in a stressful environment. Structural equation analysis was employed to verify a model representing the direct effects of psychological variables on individual adaptation to an adverse environment, and we have been able to confirm, during basic military diving courses, the structural relationships among these variables and their ability to predict a third of the variance of a criterion that has been studied very little to date. In this way, we have confirmed in a sample of professionals (N = 575 the direct relationship of emotional adjustment, conscientiousness and general mental ability with underwater adaptation, as well as the inverse relationship of emotional reactivity. These constructs are the psychological basis for working under water, contributing to an improved adaptation to this environment and promoting risk prevention and safety in diving activities.

  19. Higher precision estimates of regional polar warming by ensemble regression of climate model projections

    Energy Technology Data Exchange (ETDEWEB)

    Bracegirdle, Thomas J. [British Antarctic Survey, Cambridge (United Kingdom); Stephenson, David B. [University of Exeter, Mathematics Research Institute, Exeter (United Kingdom); NCAS-Climate, Reading (United Kingdom)

    2012-12-15

    This study presents projections of twenty-first century wintertime surface temperature changes over the high-latitude regions based on the third Coupled Model Inter-comparison Project (CMIP3) multi-model ensemble. The state-dependence of the climate change response on the present day mean state is captured using a simple yet robust ensemble linear regression model. The ensemble regression approach gives different and more precise estimated mean responses compared to the ensemble mean approach. Over the Arctic in January, ensemble regression gives less warming than the ensemble mean along the boundary between sea ice and open ocean (sea ice edge). Most notably, the results show 3 C less warming over the Barents Sea ({proportional_to} 7 C compared to {proportional_to} 10 C). In addition, the ensemble regression method gives projections that are 30 % more precise over the Sea of Okhostk, Bering Sea and Labrador Sea. For the Antarctic in winter (July) the ensemble regression method gives 2 C more warming over the Southern Ocean close to the Greenwich Meridian ({proportional_to} 7 C compared to {proportional_to} 5 C). Projection uncertainty was almost half that of the ensemble mean uncertainty over the Southern Ocean between 30 W to 90 E and 30 % less over the northern Antarctic Peninsula. The ensemble regression model avoids the need for explicit ad hoc weighting of models and exploits the whole ensemble to objectively identify overly influential outlier models. Bootstrap resampling shows that maximum precision over the Southern Ocean can be obtained with ensembles having as few as only six climate models. (orig.)

  20. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Science.gov (United States)

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, Plinear regression modeling, R(2) = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.

  1. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    Directory of Open Access Journals (Sweden)

    Makoto Suzuki

    Full Text Available Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2 = 0.676, P<0.0001; linear regression modeling, R(2 = 0.598, P<0.0001. Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.

  2. Prediction of soil temperature using regression and artificial neural network models

    Science.gov (United States)

    Bilgili, Mehmet

    2010-12-01

    In this study, monthly soil temperature was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. The soil temperature and other meteorological parameters, which have been taken from Adana meteorological station, were observed between the years of 2000 and 2007 by the Turkish State Meteorological Service (TSMS). The soil temperatures were measured at depths of 5, 10, 20, 50 and 100 cm below the ground level. A three-layer feed-forward ANN structure was constructed and a back-propagation algorithm was used for the training of ANNs. In order to get a successful simulation, the correlation coefficients between all of the meteorological variables (soil temperature, atmospheric temperature, atmospheric pressure, relative humidity, wind speed, rainfall, global solar radiation and sunshine duration) were calculated taking them two by two. First, all independent variables were split into two time periods such as cold and warm seasons. They were added to the enter regression model. Then, the method of stepwise multiple regression was applied for the selection of the "best" regression equation (model). Thus, the best independent variables were selected for the LR and NLR models and they were also used in the input layer of the ANN method. Results of these methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.

  3. SPSS macros to compare any two fitted values from a regression model.

    Science.gov (United States)

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.

  4. Longitudinal beta regression models for analyzing health-related quality of life scores over time

    Directory of Open Access Journals (Sweden)

    Hunger Matthias

    2012-09-01

    Full Text Available Abstract Background Health-related quality of life (HRQL has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in cross-sectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice. Methods We used SF-6D utility data from a German older age cohort study and stroke-specific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy. Results At any measurement time, the beta distribution fitted the SF-6D utility data and stroke-specific HRQL data better than the normal distribution. The mixed beta model showed better likelihood-based fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors. Conclusions Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.

  5. Semantic models for adaptive interactive systems

    CERN Document Server

    Hussein, Tim; Lukosch, Stephan; Ziegler, Jürgen; Calvary, Gaëlle

    2013-01-01

    Providing insights into methodologies for designing adaptive systems based on semantic data, and introducing semantic models that can be used for building interactive systems, this book showcases many of the applications made possible by the use of semantic models.Ontologies may enhance the functional coverage of an interactive system as well as its visualization and interaction capabilities in various ways. Semantic models can also contribute to bridging gaps; for example, between user models, context-aware interfaces, and model-driven UI generation. There is considerable potential for using

  6. Error estimation and adaptive chemical transport modeling

    Directory of Open Access Journals (Sweden)

    Malte Braack

    2014-09-01

    Full Text Available We present a numerical method to use several chemical transport models of increasing accuracy and complexity in an adaptive way. In largest parts of the domain, a simplified chemical model may be used, whereas in certain regions a more complex model is needed for accuracy reasons. A mathematically derived error estimator measures the modeling error and provides information where to use more accurate models. The error is measured in terms of output functionals. Therefore, one has to consider adjoint problems which carry sensitivity information. This concept is demonstrated by means of ozone formation and pollution emission.

  7. Modelling and (adaptive) control of greenhouse climates

    NARCIS (Netherlands)

    Udink ten Cate, A.J.

    1983-01-01

    The material presented in this thesis can be grouped around four themes, system concepts, modeling, control and adaptive control. In this summary these themes will be treated separately.System conceptsIn Chapters 1 and 2 an overview of the problem formulation is presented. It is suggested that there

  8. Time-varying parameter auto-regressive models for autocovariance nonstationary time series

    Institute of Scientific and Technical Information of China (English)

    FEI WanChun; BAI Lun

    2009-01-01

    In this paper,autocovariance nonstationary time series is clearly defined on a family of time series.We propose three types of TVPAR (time-varying parameter auto-regressive) models:the full order TVPAR model,the time-unvarying order TVPAR model and the time-varying order TVPAR model for autocovariance nonstationary time series.Related minimum AIC (Akaike information criterion) estimations are carried out.

  9. Time-varying parameter auto-regressive models for autocovariance nonstationary time series

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    In this paper, autocovariance nonstationary time series is clearly defined on a family of time series. We propose three types of TVPAR (time-varying parameter auto-regressive) models: the full order TVPAR model, the time-unvarying order TVPAR model and the time-varying order TV-PAR model for autocovariance nonstationary time series. Related minimum AIC (Akaike information criterion) estimations are carried out.

  10. A general framework for the use of logistic regression models in meta-analysis.

    Science.gov (United States)

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.

  11. Construction of risk prediction model of type 2 diabetes mellitus based on logistic regression

    Directory of Open Access Journals (Sweden)

    Li Jian

    2017-01-01

    Full Text Available Objective: to construct multi factor prediction model for the individual risk of T2DM, and to explore new ideas for early warning, prevention and personalized health services for T2DM. Methods: using logistic regression techniques to screen the risk factors for T2DM and construct the risk prediction model of T2DM. Results: Male’s risk prediction model logistic regression equation: logit(P=BMI × 0.735+ vegetables × (−0.671 + age × 0.838+ diastolic pressure × 0.296+ physical activity× (−2.287 + sleep ×(−0.009 +smoking ×0.214; Female’s risk prediction model logistic regression equation: logit(P=BMI ×1.979+ vegetables× (−0.292 + age × 1.355+ diastolic pressure× 0.522+ physical activity × (−2.287 + sleep × (−0.010.The area under the ROC curve of male was 0.83, the sensitivity was 0.72, the specificity was 0.86, the area under the ROC curve of female was 0.84, the sensitivity was 0.75, the specificity was 0.90. Conclusion: This study model data is from a compared study of nested case, the risk prediction model has been established by using the more mature logistic regression techniques, and the model is higher predictive sensitivity, specificity and stability.

  12. Risk stratification for prognosis in intracerebral hemorrhage: A decision tree model and logistic regression

    Directory of Open Access Journals (Sweden)

    Gang WU

    2016-01-01

    Full Text Available Objective  To analyze the risk factors for prognosis in intracerebral hemorrhage using decision tree (classification and regression tree, CART model and logistic regression model. Methods  CART model and logistic regression model were established according to the risk factors for prognosis of patients with cerebral hemorrhage. The differences in the results were compared between the two methods. Results  Logistic regression analyses showed that hematoma volume (OR-value 0.953, initial Glasgow Coma Scale (GCS score (OR-value 1.210, pulmonary infection (OR-value 0.295, and basal ganglia hemorrhage (OR-value 0.336 were the risk factors for the prognosis of cerebral hemorrhage. The results of CART analysis showed that volume of hematoma and initial GCS score were the main factors affecting the prognosis of cerebral hemorrhage. The effects of two models on the prognosis of cerebral hemorrhage were similar (Z-value 0.402, P=0.688. Conclusions  CART model has a similar value to that of logistic model in judging the prognosis of cerebral hemorrhage, and it is characterized by using transactional analysis between the risk factors, and it is more intuitive. DOI: 10.11855/j.issn.0577-7402.2015.12.13

  13. Linear regression models of floor surface parameters on friction between Neolite and quarry tiles.

    Science.gov (United States)

    Chang, Wen-Ruey; Matz, Simon; Grönqvist, Raoul; Hirvonen, Mikko

    2010-01-01

    For slips and falls, friction is widely used as an indicator of surface slipperiness. Surface parameters, including surface roughness and waviness, were shown to influence friction by correlating individual surface parameters with the measured friction. A collective input from multiple surface parameters as a predictor of friction, however, could provide a broader perspective on the contributions from all the surface parameters evaluated. The objective of this study was to develop regression models between the surface parameters and measured friction. The dynamic friction was measured using three different mixtures of glycerol and water as contaminants. Various surface roughness and waviness parameters were measured using three different cut-off lengths. The regression models indicate that the selected surface parameters can predict the measured friction coefficient reliably in most of the glycerol concentrations and cut-off lengths evaluated. The results of the regression models were, in general, consistent with those obtained from the correlation between individual surface parameters and the measured friction in eight out of nine conditions evaluated in this experiment. A hierarchical regression model was further developed to evaluate the cumulative contributions of the surface parameters in the final iteration by adding these parameters to the regression model one at a time from the easiest to measure to the most difficult to measure and evaluating their impacts on the adjusted R(2) values. For practical purposes, the surface parameter R(a) alone would account for the majority of the measured friction even if it did not reach a statistically significant level in some of the regression models.

  14. Adaptive Modeling for Security Infrastructure Fault Response

    Institute of Scientific and Technical Information of China (English)

    CUI Zhong-jie; YAO Shu-ping; HU Chang-zhen

    2008-01-01

    Based on the analysis of inherent limitations in existing security response decision-making systems, a dynamic adaptive model of fault response is presented. Several security fault levels were founded, which comprise the basic level, equipment level and mechanism level. Fault damage cost is calculated using the analytic hierarchy process. Meanwhile, the model evaluates the impact of different responses upon fault repair and normal operation. Response operation cost and response negative cost are introduced through quantitative calculation. This model adopts a comprehensive response decision of security fault in three principles-the maximum and minimum principle, timeliness principle, acquiescence principle, which assure optimal response countermeasure is selected for different situations. Experimental results show that the proposed model has good self-adaptation ability, timeliness and cost-sensitiveness.

  15. Blind identification of threshold auto-regressive model for machine fault diagnosis

    Institute of Scientific and Technical Information of China (English)

    LI Zhinong; HE Yongyong; CHU Fulei; WU Zhaotong

    2007-01-01

    A blind identification method was developed for the threshold auto-regressive (TAR) model. The method had good identification accuracy and rapid convergence, especially for higher order systems. The proposed method was then combined with the hidden Markov model (HMM) to determine the auto-regressive (AR) coefficients for each interval used for feature extraction, with the HMM as a classifier. The fault diagnoses during the speed-up and speed- down processes for rotating machinery have been success- fully completed. The result of the experiment shows that the proposed method is practical and effective.

  16. Methods and applications of linear models regression and the analysis of variance

    CERN Document Server

    Hocking, Ronald R

    2013-01-01

    Praise for the Second Edition"An essential desktop reference book . . . it should definitely be on your bookshelf." -Technometrics A thoroughly updated book, Methods and Applications of Linear Models: Regression and the Analysis of Variance, Third Edition features innovative approaches to understanding and working with models and theory of linear regression. The Third Edition provides readers with the necessary theoretical concepts, which are presented using intuitive ideas rather than complicated proofs, to describe the inference that is appropriate for the methods being discussed. The book

  17. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    Science.gov (United States)

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.

  18. Stahel-Donoho kernel estimation for fixed design nonparametric regression models

    Institute of Scientific and Technical Information of China (English)

    LIN; Lu

    2006-01-01

    This paper reports a robust kernel estimation for fixed design nonparametric regression models.A Stahel-Donoho kernel estimation is introduced,in which the weight functions depend on both the depths of data and the distances between the design points and the estimation points.Based on a local approximation,a computational technique is given to approximate to the incomputable depths of the errors.As a result the new estimator is computationally efficient.The proposed estimator attains a high breakdown point and has perfect asymptotic behaviors such as the asymptotic normality and convergence in the mean squared error.Unlike the depth-weighted estimator for parametric regression models,this depth-weighted nonparametric estimator has a simple variance structure and then we can compare its efficiency with the original one.Some simulations show that the new method can smooth the regression estimation and achieve some desirable balances between robustness and efficiency.

  19. Bayesian Bandwidth Selection for a Nonparametric Regression Model with Mixed Types of Regressors

    Directory of Open Access Journals (Sweden)

    Xibin Zhang

    2016-04-01

    Full Text Available This paper develops a sampling algorithm for bandwidth estimation in a nonparametric regression model with continuous and discrete regressors under an unknown error density. The error density is approximated by the kernel density estimator of the unobserved errors, while the regression function is estimated using the Nadaraya-Watson estimator admitting continuous and discrete regressors. We derive an approximate likelihood and posterior for bandwidth parameters, followed by a sampling algorithm. Simulation results show that the proposed approach typically leads to better accuracy of the resulting estimates than cross-validation, particularly for smaller sample sizes. This bandwidth estimation approach is applied to nonparametric regression model of the Australian All Ordinaries returns and the kernel density estimation of gross domestic product (GDP growth rates among the organisation for economic co-operation and development (OECD and non-OECD countries.

  20. Accounting for spatial effects in land use regression for urban air pollution modeling.

    Science.gov (United States)

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.

  1. Magnetic resonance imaging for assessment of parametrial tumour spread and regression patterns in adaptive cervix cancer radiotherapy

    Energy Technology Data Exchange (ETDEWEB)

    Schmid, Maximilian P.; Fidarova, Elena [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria)], e-mail: maximilian.schmid@akhwien.at; Poetter, Richard [Dept. of Radiotherapy, Comprehensive Cancer Center, Medical Univ. of Vienna, Vienna (Austria); Christian Doppler Lab. for Medical Radiation Research for Radiation Oncology, Medical Univ. of Vienna (Austria)] [and others

    2013-10-15

    Purpose: To investigate the impact of magnetic resonance imaging (MRI)-morphologic differences in parametrial infiltration on tumour response during primary radio chemotherapy in cervical cancer. Material and methods: Eighty-five consecutive cervical cancer patients with FIGO stages IIB (n = 59) and IIIB (n = 26), treated by external beam radiotherapy ({+-}chemotherapy) and image-guided adaptive brachytherapy, underwent T2-weighted MRI at the time of diagnosis and at the time of brachytherapy. MRI patterns of parametrial tumour infiltration at the time of diagnosis were assessed with regard to predominant morphology and maximum extent of parametrial tumour infiltration and were stratified into five tumour groups (TG): 1) expansive with spiculae; 2) expansive with spiculae and infiltrating parts; 3) infiltrative into the inner third of the parametrial space (PM); 4) infiltrative into the middle third of the PM; and 5) infiltrative into the outer third of the PM. MRI at the time of brachytherapy was used for identifying presence (residual vs. no residual disease) and signal intensity (high vs. intermediate) of residual disease within the PM. Left and right PM of each patient were evaluated separately at both time points. The impact of the TG on tumour remission status within the PM was analysed using {chi}2-test and logistic regression analysis. Results: In total, 170 PM were analysed. The TG 1, 2, 3, 4, 5 were present in 12%, 11%, 35%, 25% and 12% of the cases, respectively. Five percent of the PM were tumour-free. Residual tumour in the PM was identified in 19%, 68%, 88%, 90% and 85% of the PM for the TG 1, 2, 3, 4, and 5, respectively. The TG 3 - 5 had significantly higher rates of residual tumour in the PM in comparison to TG 1 + 2 (88% vs. 43%, p < 0.01). Conclusion: MRI-morphologic features of PM infiltration appear to allow for prediction of tumour response during external beam radiotherapy and chemotherapy. A predominantly infiltrative tumour spread at the

  2. Adapting virtual camera behaviour through player modelling

    DEFF Research Database (Denmark)

    Burelli, Paolo; Yannakakis, Georgios N.

    2015-01-01

    a novel approach to virtual camera control, which builds upon camera control and player modelling to provide the user with an adaptive point-of-view. To achieve this goal, we propose a methodology to model the player’s preferences on virtual camera movements and we employ the resulting models to tailor......Research in virtual camera control has focused primarily on finding methods to allow designers to place cameras effectively and efficiently in dynamic and unpredictable environments, and to generate complex and dynamic plans for cinematography in virtual environments. In this article, we propose...... the viewpoint movements to the player type and her game-play style. Ultimately, the methodology is applied to a 3D platform game and is evaluated through a controlled experiment; the results suggest that the resulting adaptive cinematographic experience is favoured by some player types and it can generate...

  3. Logistic回归模型及其应用%Logistic regression model and its application

    Institute of Scientific and Technical Information of China (English)

    常振海; 刘薇

    2012-01-01

    为了利用Logistic模型提高多分类定性因变量的预测准确率,在二分类Logistic回归模型的基础上,对实际统计数据建立三类别的Logistic模型.采用似然比检验法对自变量的显著性进行检验,剔除了不显著的变量;对每个类别的因变量都确定了1个线性回归函数,并进行了模型检验.分析结果表明,在处理因变量为定性变量的回归分析中,Logistic模型具有很好的预测准确度和实用推广性.%To improve the forecasting accuracy of the multinomial qualitative dependent variable by using logistic model,ternary logistic model is established for actual statistical data based on binary logistic regression model.The significance of independent variables is tested by using the likelihood ratio test method to remove the non-significant variable.A linear regression function is determined for each category dependent variable,and the models are tested.The analysis results show that logistic regression model has good predictive accuracy and practical promotional value in handling regression analysis of qualitative dependent variable.

  4. Analysis of the Influence of Quantile Regression Model on Mainland Tourists’ Service Satisfaction Performance

    Directory of Open Access Journals (Sweden)

    Wen-Cheng Wang

    2014-01-01

    Full Text Available It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  5. Modeling Approach of Regression Orthogonal Experiment Design for Thermal Error Compensation of CNC Turning Center

    Institute of Scientific and Technical Information of China (English)

    2002-01-01

    The thermal induced errors can account for as much as 70% of the dimensional errors on a workpiece. Accurate modeling of errors is an essential part of error compensation. Base on analyzing the existing approaches of the thermal error modeling for machine tools, a new approach of regression orthogonal design is proposed, which combines the statistic theory with machine structures, surrounding condition, engineering judgements, and experience in modeling. A whole computation and analysis procedure is given. ...

  6. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    Science.gov (United States)

    Aulenbach, Brent T.

    2013-10-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of at least 15-days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration-discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models, than increasing calibration period length.

  7. Significance tests to determine the direction of effects in linear regression models.

    Science.gov (United States)

    Wiedermann, Wolfgang; Hagmann, Michael; von Eye, Alexander

    2015-02-01

    Previous studies have discussed asymmetric interpretations of the Pearson correlation coefficient and have shown that higher moments can be used to decide on the direction of dependence in the bivariate linear regression setting. The current study extends this approach by illustrating that the third moment of regression residuals may also be used to derive conclusions concerning the direction of effects. Assuming non-normally distributed variables, it is shown that the distribution of residuals of the correctly specified regression model (e.g., Y is regressed on X) is more symmetric than the distribution of residuals of the competing model (i.e., X is regressed on Y). Based on this result, 4 one-sample tests are discussed which can be used to decide which variable is more likely to be the response and which one is more likely to be the explanatory variable. A fifth significance test is proposed based on the differences of skewness estimates, which leads to a more direct test of a hypothesis that is compatible with direction of dependence. A Monte Carlo simulation study was performed to examine the behaviour of the procedures under various degrees of associations, sample sizes, and distributional properties of the underlying population. An empirical example is given which illustrates the application of the tests in practice.

  8. APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING

    Directory of Open Access Journals (Sweden)

    A. L. Oleinik

    2015-09-01

    Full Text Available Subject of Research. The paper deals with the problem of lip region image reconstruction from speech signal by means of Partial Least Squares regression. Such problems arise in connection with development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities. Applications of audio-visual speech processing methods include joint modeling of voice and lips’ movement dynamics, synchronization of audio and video streams, emotion recognition, liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of initial data with high covariance. These components are used to build regression model. Advantage of this approach lies in the possibility of achieving two goals: identification of latent interrelations between initial data components (e.g. speech signal and lip region image and approximation of initial data component as a function of another one. Main Results. Experimental research on reconstruction of lip region images from speech signal was carried out on VidTIMIT audio-visual speech database. Results of the experiment showed that Partial Least Squares regression is capable of solving reconstruction problem. Practical Significance. Obtained findings give the possibility to assert that Partial Least Squares regression is successfully applicable for solution of vast variety of audio-visual speech processing problems: from synchronization of audio and video streams to liveness detection.

  9. Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land

    Science.gov (United States)

    Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho

    2008-07-01

    SummaryFactorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between response variable NO3- and explanatory variables Ca 2+, Cl -, SO42-, depth to water, aquifer media and land use. Substituting Cl - by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.

  10. Post-L1-Penalized Estimators in High-Dimensional Linear Regression Models

    CERN Document Server

    Belloni, Alexandre

    2010-01-01

    In this paper we study the post-penalized estimator which applies ordinary, unpenalized linear regression to the model selected by the first step penalized estimators, typically the LASSO. We show that post-LASSO can perform as well or nearly as well as the LASSO in terms of the rate of convergence. We show that this performance occurs even if the LASSO-based model selection "fails", in the sense of missing some components of the "true" regression model. Furthermore, post-LASSO can perform strictly better than LASSO, in the sense of a strictly faster rate of convergence, if the LASSO-based model selection correctly includes all components of the "true" model as a subset and enough sparsity is obtained. Of course, in the extreme case, when LASSO perfectly selects the true model, the past-LASSO estimator becomes the oracle estimator. We show that the results hold in both parametric and non-parametric models; and by the "true" model we mean the best $s$-dimensional approximation to the true regression model, whe...

  11. Modeling Zero – Inflated Regression of Road Accidents at Johor Federal Road F001

    Directory of Open Access Journals (Sweden)

    Prasetijo Joewono

    2016-01-01

    Full Text Available This study focused on the Poisson regression with excess zero outcomes on the response variable. A generalized linear modelling technique such as Poisson regression model and Negative Binomial model was found to be insignificant in explaining and handle over dispersion which due to high amount of zeros thus Zero Inflated model was introduced to overcome the problem. The application work on the number of road accidents on F001 Jalan Jb – Air Hitam. Data on road accident were collected for five-year period from 2010 through 2014. The result from analysis show that ZINB model performed best, in terms of the comparative criteria based on the P value less than 0.05.

  12. Profile-driven regression for modeling and runtime optimization of mobile networks

    DEFF Research Database (Denmark)

    McClary, Dan; Syrotiuk, Violet; Kulahci, Murat

    2010-01-01

    of throughput in a mobile ad hoc network, a self-organizing collection of mobile wireless nodes without any fixed infrastructure. The intermediate models generated in profile-driven regression are used to fit an overall model of throughput, and are also used to optimize controllable factors at runtime. Unlike......Computer networks often display nonlinear behavior when examined over a wide range of operating conditions. There are few strategies available for modeling such behavior and optimizing such systems as they run. Profile-driven regression is developed and applied to modeling and runtime optimization...... others, the throughput model accounts for node speed. The resulting optimization is very effective; locally optimizing the network factors at runtime results in throughput as much as six times higher than that achieved with the factors at their default levels....

  13. Specific features of modelling rules of monetary policy on the basis of hybrid regression models with a neural component

    Directory of Open Access Journals (Sweden)

    Lukianenko Iryna H.

    2014-01-01

    Full Text Available The article considers possibilities and specific features of modelling economic phenomena with the help of the category of models that unite elements of econometric regressions and artificial neural networks. This category of models contains auto-regression neural networks (AR-NN, regressions of smooth transition (STR/STAR, multi-mode regressions of smooth transition (MRSTR/MRSTAR and smooth transition regressions with neural coefficients (NCSTR/NCSTAR. Availability of the neural network component allows models of this category achievement of a high empirical authenticity, including reproduction of complex non-linear interrelations. On the other hand, the regression mechanism expands possibilities of interpretation of the obtained results. An example of multi-mode monetary rule is used to show one of the cases of specification and interpretation of this model. In particular, the article models and interprets principles of management of the UAH exchange rate that come into force when economy passes from a relatively stable into a crisis state.

  14. Adaptive evolution on a continuous lattice model

    Science.gov (United States)

    Claudino, Elder S.; Lyra, M. L.; Gleria, Iram; Campos, Paulo R. A.

    2013-03-01

    In the current work, we investigate the evolutionary dynamics of a spatially structured population model defined on a continuous lattice. In the model, individuals disperse at a constant rate v and competition is local and delimited by the competition radius R. Due to dispersal, the neighborhood size (number of individuals competing for reproduction) fluctuates over time. Here we address how these new variables affect the adaptive process. While the fixation probabilities of beneficial mutations are roughly the same as in a panmitic population for small fitness effects s, a dependence on v and R becomes more evident for large s. These quantities also strongly influence fixation times, but their dependencies on s are well approximated by s-1/2, which means that the speed of the genetic wave front is proportional to s. Most important is the observation that the model exhibits a dual behavior displaying a power-law growth for the fixation rate and speed of adaptation with the beneficial mutation rate, as observed in other spatially structured population models, while simultaneously showing a nonsaturating behavior for the speed of adaptation with the population size N, as in homogeneous populations.

  15. Multiple regression models for the prediction of the maximum obtainable thermal efficiency of organic Rankine cycles

    DEFF Research Database (Denmark)

    Larsen, Ulrik; Pierobon, Leonardo; Wronski, Jorrit;

    2014-01-01

    to power. In this study we propose four linear regression models to predict the maximum obtainable thermal efficiency for simple and recuperated ORCs. A previously derived methodology is able to determine the maximum thermal efficiency among many combinations of fluids and processes, given the boundary...

  16. The Performance of the Full Information Maximum Likelihood Estimator in Multiple Regression Models with Missing Data.

    Science.gov (United States)

    Enders, Craig K.

    2001-01-01

    Examined the performance of a recently available full information maximum likelihood (FIML) estimator in a multiple regression model with missing data using Monte Carlo simulation and considering the effects of four independent variables. Results indicate that FIML estimation was superior to that of three ad hoc techniques, with less bias and less…

  17. FRICTION MODELING OF Al-Mg ALLOY SHEETS BASED ON MULTIPLE REGRESSION ANALYSIS AND NEURAL NETWORKS

    Directory of Open Access Journals (Sweden)

    Hirpa G. Lemu

    2017-03-01

    Full Text Available This article reports a proposed approach to a frictional resistance description in sheet metal forming processes that enables determination of the friction coefficient value under a wide range of friction conditions without performing time-consuming experiments. The motivation for this proposal is the fact that there exists a considerable amount of factors affect the friction coefficient value and as a result building analytical friction model for specified process conditions is practically impossible. In this proposed approach, a mathematical model of friction behaviour is created using multiple regression analysis and artificial neural networks. The regression analysis was performed using a subroutine in MATLAB programming code and STATISTICA Neural Networks was utilized to build an artificial neural networks model. The effect of different training strategies on the quality of neural networks was studied. As input variables for regression model and training of radial basis function networks, generalized regression neural networks and multilayer networks the results of strip drawing friction test were utilized. Four kinds of Al-Mg alloy sheets were used as a test material.

  18. Sieve M-estimation for semiparametric varying-coefficient partially linear regression model

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    This article considers a semiparametric varying-coefficient partially linear regression model.The semiparametric varying-coefficient partially linear regression model which is a generalization of the partially linear regression model and varying-coefficient regression model that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable.A sieve M-estimation method is proposed and the asymptotic properties of the proposed estimators are discussed.Our main object is to estimate the nonparametric component and the unknown parameters simultaneously.It is easier to compute and the required computation burden is much less than the existing two-stage estimation method.Furthermore,the sieve M-estimation is robust in the presence of outliers if we choose appropriate ρ(·).Under some mild conditions,the estimators are shown to be strongly consistent;the convergence rate of the estimator for the unknown nonparametric component is obtained and the estimator for the unknown parameter is shown to be asymptotically normally distributed.Numerical experiments are carried out to investigate the performance of the proposed method.

  19. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    Science.gov (United States)

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  20. Proposal of a regressive model for the hourly diffuse solar radiation under all sky conditions

    Energy Technology Data Exchange (ETDEWEB)

    Ruiz-Arias, J.A.; Alsamamra, H.; Tovar-Pescador, J.; Pozo-Vazquez, D. [Department of Physics, Building A3-066, University of Jaen, 23071 Jaen (Spain)

    2010-05-15

    In this work, we propose a new regressive model for the estimation of the hourly diffuse solar irradiation under all sky conditions. This new model is based on the sigmoid function and uses the clearness index and the relative optical mass as predictors. The model performance was compared against other five regressive models using radiation data corresponding to 21 stations in the USA and Europe. In a first part, the 21 stations were grouped into seven subregions (corresponding to seven different climatic regions) and all the models were locally-fitted and evaluated using these seven datasets. Results showed that the new proposed model provides slightly better estimates. Particularly, this new model provides a relative root mean square error in the range 25-35% and a relative mean bias error in the range -15% to 15%, depending on the region. In a second part, the potential global character of the new model was evaluated. To this end, the model was fitted using the whole dataset. Results showed that the global fitting model provides overall better estimates that the locally-fitted models, with relative root mean square error values ranging 20-35% and a relative mean bias error ranging -5% to -12%. Additionally, the new proposed model showed some advantages compared to other evaluated models. Particularly, the sigmoid behaviour of this model is able to provide physically reliable estimates for extreme values of the clearness index even though using less parameter than other tested models. (author)

  1. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Directory of Open Access Journals (Sweden)

    Gu Mi

    Full Text Available This work is about assessing model adequacy for negative binomial (NB regression, particularly (1 assessing the adequacy of the NB assumption, and (2 assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  2. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    Science.gov (United States)

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  3. A brief introduction to regression designs and mixed-effects modelling by a recent convert

    DEFF Research Database (Denmark)

    Balling, Laura Winther

    2008-01-01

    This article discusses the advantages of multiple regression designs over the factorial designs traditionally used in many psycholinguistic experiments. It is shown that regression designs are typically more informative, statistically more powerful and better suited to the analysis of naturalistic...... tasks. The advantages of including both fixed and random effects are demonstrated with reference to linear mixed-effects models, and problems of collinearity, variable distribution and variable selection are discussed. The advantages of these techniques are exemplified in an analysis of a word...

  4. Deriving Genomic Breeding Values for Residual Feed Intake from Covariance Functions of Random Regression Models

    DEFF Research Database (Denmark)

    Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne;

    Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Ba......% of the genetic variance in feed intake, revealing that a minor component of feed intake was genetically independent of maintenance and growth. In conclusion, the approach derived herein led to a consistent definition of RFI, where genomic breeding values were easily obtained...

  5. Regressions by leaps and bounds and biased estimation techniques in yield modeling

    Science.gov (United States)

    Marquina, N. E. (Principal Investigator)

    1979-01-01

    The author has identified the following significant results. It was observed that OLS was not adequate as an estimation procedure when the independent or regressor variables were involved in multicollinearities. This was shown to cause the presence of small eigenvalues of the extended correlation matrix A'A. It was demonstrated that the biased estimation techniques and the all-possible subset regression could help in finding a suitable model for predicting yield. Latent root regression was an excellent tool that found how many predictive and nonpredictive multicollinearities there were.

  6. A Study of Wind Statistics Through Auto-Regressive and Moving-Average (ARMA) Modeling

    Institute of Scientific and Technical Information of China (English)

    尹彰; 周宗仁

    2001-01-01

    Statistical properties of winds near the Taichung Harbour are investigated. The 26 years′incomplete data of wind speeds, measured on an hourly basis, are used as reference. The possibility of imputation using simulated results of the Auto-Regressive (AR), Moving-Average (MA), and/or Auto-Regressive and Moving-Average (ARMA) models is studied. Predictions of the 25-year extreme wind speeds based upon the augmented data are compared with the original series. Based upon the results, predictions of the 50- and 100-year extreme wind speeds are then made.

  7. Wind adaptive modeling of transmission lines using minimum description length

    Science.gov (United States)

    Jaw, Yoonseok; Sohn, Gunho

    2017-03-01

    The transmission lines are moving objects, which positions are dynamically affected by wind-induced conductor motion while they are acquired by airborne laser scanners. This wind effect results in a noisy distribution of laser points, which often hinders accurate representation of transmission lines and thus, leads to various types of modeling errors. This paper presents a new method for complete 3D transmission line model reconstruction in the framework of inner and across span analysis. The highlighted fact is that the proposed method is capable of indirectly estimating noise scales, which corrupts the quality of laser observations affected by different wind speeds through a linear regression analysis. In the inner span analysis, individual transmission line models of each span are evaluated based on the Minimum Description Length theory and erroneous transmission line segments are subsequently replaced by precise transmission line models with wind-adaptive noise scale estimated. In the subsequent step of across span analysis, detecting the precise start and end positions of the transmission line models, known as the Point of Attachment, is the key issue for correcting partial modeling errors, as well as refining transmission line models. Finally, the geometric and topological completion of transmission line models are achieved over the entire network. A performance evaluation was conducted over 138.5 km long corridor data. In a modest wind condition, the results demonstrates that the proposed method can improve the accuracy of non-wind-adaptive initial models on an average of 48% success rate to produce complete transmission line models in the range between 85% and 99.5% with the positional accuracy of 9.55 cm transmission line models and 28 cm Point of Attachment in the root-mean-square error.

  8. Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine

    Science.gov (United States)

    Patnaik, Surya N.; Hopkins, Dale A.

    2002-01-01

    In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T(sub 4), T(sub vane), and T(sub metal)). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC), as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design of the experiment strategy. A neural network and a regression model with 45 weight factors were trained for the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T(sub 4), T(sub vane), and T(sub metal)) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.

  9. Model-Free Adaptive Heating Process Control

    OpenAIRE

    Ivana LUKÁČOVÁ; Piteľ, Ján

    2009-01-01

    The aim of this paper is to analyze the dynamic behaviour of a Model-Free Adaptive (MFA) heating process control. The MFA controller is designed as three layer neural network with proportional element. The method of backward propagation of errors was used for neural network training. Visualization and training of the artificial neural network was executed by Netlab in Matlab environment. Simulation of the MFA heating process control with outdoor temperature compensation has proved better resu...

  10. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    Science.gov (United States)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.

  11. Exploratory regression analysis: a tool for selecting models and determining predictor importance.

    Science.gov (United States)

    Braun, Michael T; Oswald, Frederick L

    2011-06-01

    Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1) . The program investigates all 2(p) - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.

  12. A note on constrained M-estimation and its recursive analog in multivariate linear regression models

    Institute of Scientific and Technical Information of China (English)

    RAO; Calyampudi; R

    2009-01-01

    In this paper,the constrained M-estimation of the regression coeffcients and scatter parameters in a general multivariate linear regression model is considered.Since the constrained M-estimation is not easy to compute,an up-dating recursion procedure is proposed to simplify the com-putation of the estimators when a new observation is obtained.We show that,under mild conditions,the recursion estimates are strongly consistent.In addition,the asymptotic normality of the recursive constrained M-estimators of regression coeffcients is established.A Monte Carlo simulation study of the recursion estimates is also provided.Besides,robustness and asymptotic behavior of constrained M-estimators are briefly discussed.

  13. A multivariate linear regression model for the Jordanian industrial electric energy consumption

    Energy Technology Data Exchange (ETDEWEB)

    Al-Ghandoor, A.; Nahleh, Y.A.; Sandouqa, Y.; Al-Salaymeh, M. [Hashemite Univ., Zarqa (Jordan). Dept. of Industrial Engineering

    2007-08-09

    The amount of electricity used by the industrial sector in Jordan is an important driver for determining the future energy needs of the country. This paper proposed a model to simulate electricity and energy consumption by industry. The general model approach was based on multivariate regression analysis to provide valuable information regarding energy demands and analysis, and to identify the various factors that influence Jordanian industrial electricity consumption. It was determined that industrial gross output and capacity utilization are the most important variables that drive electricity consumption. The results revealed that the multivariate linear regression model can be used to adequately model the Jordanian industrial electricity consumption with coefficient of determination (R2) and adjusted R2 values of 99.3 and 99.2 per cent, respectively. 19 refs., 4 tabs., 2 figs.

  14. Application of artificial neural engineering and regression models for forecasting shelf life of instant coffee drink

    Directory of Open Access Journals (Sweden)

    Sumit Goyal

    2011-07-01

    Full Text Available Coffee as beverage is prepared from the roasted seeds (beans of the coffee plant. Coffee is the second most important product in the international market in terms of volume trade and the most important in terms of value. Artificial neural engineering and regression models were developed to predict shelf life of instant coffee drink. Colour and appearance, flavour, viscosity and sediment were used as input parameters. Overall acceptability was used as output parameter. The dataset consisted of experimentally developed 50 observations. The dataset was divided into two disjoint subsets, namely, training set containing 40 observations (80% of total observations and test set comprising of 10 observations (20% of total observations. The network was trained with 500 epochs. Neural network toolbox under Matlab 7.0 software was used for training the models. From the investigation it was revealed that multiple linear regression model was superior over radial basis model for forecasting shelf life of instant coffee drink.

  15. The applicability of linear regression models in working environments' thermal evaluation.

    Directory of Open Access Journals (Sweden)

    Pablo Adamoglu de Oliveira

    2006-04-01

    Full Text Available The simultaneous analysis of thermal variables with normal distribution with the aim of checking if there is any significative correlation among them or if there is the possibility of making predictions of the values of some of them based on others’ values is considered a problem of great importance in statistics studies. The aim of this paper is to study the applicability of linear regression models in working environments’ thermal comfort studies, thus contributing for the comprehension of the possible environmental cooling, heating or winding needs. It starts with a bibliographical research, followed by a field research, data collection and and software statistical-mathematical data treatment. It was then performed data analysis and the construction of the regression linear models using the t and F tests for determining the consistency of the models and their parameters, as well as the building of conclusions based on the information obtained and on the significance of the mathematical models built.

  16. Floating Car Data Based Nonparametric Regression Model for Short-Term Travel Speed Prediction

    Institute of Scientific and Technical Information of China (English)

    WENG Jian-cheng; HU Zhong-wei; YU Quan; REN Fu-tian

    2007-01-01

    A K-nearest neighbor (K-NN) based nonparametric regression model was proposed to predict travel speed for Beijing expressway. By using the historical traffic data collected from the detectors in Beijing expressways, a specically designed database was developed via the processes including data filtering, wavelet analysis and clustering. The relativity based weighted Euclidean distance was used as the distance metric to identify the K groups of nearest data series. Then, a K-NN nonparametric regression model was built to predict the average travel speeds up to 6 min into the future. Several randomly selected travel speed data series,collected from the floating car data (FCD) system, were used to validate the model. The results indicate that using the FCD, the model can predict average travel speeds with an accuracy of above 90%, and hence is feasible and effective.

  17. An empirical approach to update multivariate regression models intended for routine industrial use

    Energy Technology Data Exchange (ETDEWEB)

    Garcia-Mencia, M.V.; Andrade, J.M.; Lopez-Mahia, P.; Prada, D. [University of La Coruna, La Coruna (Spain). Dept. of Analytical Chemistry

    2000-11-01

    Many problems currently tackled by analysts are highly complex and, accordingly, multivariate regression models need to be developed. Two intertwined topics are important when such models are to be applied within the industrial routines: (1) Did the model account for the 'natural' variance of the production samples? (2) Is the model stable on time? This paper focuses on the second topic and it presents an empirical approach where predictive models developed by using Mid-FTIR and PLS and PCR hold its utility during about nine months when used to predict the octane number of platforming naphthas in a petrochemical refinery. 41 refs., 10 figs., 1 tab.

  18. BOOTSTRAP WAVELET IN THE NONPARAMETRIC REGRESSION MODEL WITH WEAKLY DEPENDENT PROCESSES

    Institute of Scientific and Technical Information of China (English)

    林路; 张润楚

    2004-01-01

    This paper introduces a method of bootstrap wavelet estimation in a nonparametric regression model with weakly dependent processes for both fixed and random designs. The asymptotic bounds for the bias and variance of the bootstrap wavelet estimators are given in the fixed design model. The conditional normality for a modified version of the bootstrap wavelet estimators is obtained in the fixed model. The consistency for the bootstrap wavelet estimator is also proved in the random design model. These results show that the bootstrap wavelet method is valid for the model with weakly dependent processes.

  19. A quantile regression approach for modelling a Health-Related Quality of Life Measure

    Directory of Open Access Journals (Sweden)

    Giulia Cavrini

    2013-05-01

    Full Text Available Objective. The aim of this study is to propose a new approach for modeling the EQ-5D index and EQ-5D VAS in order to explain the lifestyle determinants effect using the quantile regression analysis. Methods. Data was collected within a cross-sectional study that involved a probabilistic sample of 1,622 adults randomly selected from the population register of two Health Authorities of Bologna in northern Italy. The perceived health status of people was measured using the EQ-5D questionnaire. The Visual Analogue Scale included in the EQ-5D Questionnaire, the EQ-VAS, and the EQ-5D index were used to obtain the synthetic measures of quality of life. To model EQ-VAS Score and EQ-5D index, a quantile regression analysis was employed. Quantile Regression is a way to estimate the conditional quantiles of the VAS Score distribution in a linear model, in order to have a more complete view of possible associations between a measure of Health Related Quality of Life (dependent variable and socio-demographic and determinants data. This methodological approach was preferred to an OLS regression because of the EQ-VAS Score and EQ-5D index typical distribution. Main Results. The analysis suggested that age, gender, and comorbidity can explain variability in perceived health status measured by the EQ-5D index and the VAS.

  20. Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy

    Directory of Open Access Journals (Sweden)

    Michel Ducher

    2013-01-01

    Full Text Available Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n=155 performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC curves. IgAN was found (on pathology in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67% and specificity (73% versus 95% using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  1. Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.

    Science.gov (United States)

    Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre

    2013-01-01

    Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristics (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached (100% versus 67%) and specificity (73% versus 95%) using the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  2. Comparison of regression models for estimation of isometric wrist joint torques using surface electromyography

    Directory of Open Access Journals (Sweden)

    Menon Carlo

    2011-09-01

    Full Text Available Abstract Background Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, in addition to wrist joint torque data were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on accuracy of models. Acquired SEMG signals were filtered, rectified, normalized and then fed to models for training. Results It was shown that mean adjusted coefficient of determination (Ra2 values decrease between 20%-35% for different models after one hour while altering arm posture decreased mean Ra2 values between 64% to 74% for different models. Conclusions Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, ordinary least squares linear regression model (OLS was shown to have high isometric torque estimation accuracy combined with very short training times.

  3. Regression by L1 regularization of smart contrasts and sums (ROSCAS) beats PLS and elastic net in latent variable model

    NARCIS (Netherlands)

    Braak, ter C.J.F.

    2009-01-01

    This paper proposes a regression method, ROSCAS, which regularizes smart contrasts and sums of regression coefficients by an L1 penalty. The contrasts and sums are based on the sample correlation matrix of the predictors and are suggested by a latent variable regression model. The contrasts express

  4. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating e...... eigenvalues and eigenvectors. We give a number of different applications to regression and time series analysis, and show how the reduced rank regression estimator can be derived as a Gaussian maximum likelihood estimator. We briefly mention asymptotic results...

  5. Adaptive Control and Synchronization of the Shallow Water Model

    Directory of Open Access Journals (Sweden)

    P. Sangapate

    2012-01-01

    Full Text Available The shallow water model is one of the important models in dynamical systems. This paper investigates the adaptive chaos control and synchronization of the shallow water model. First, adaptive control laws are designed to stabilize the shallow water model. Then adaptive control laws are derived to chaos synchronization of the shallow water model. The sufficient conditions for the adaptive control and synchronization have been analyzed theoretically, and the results are proved using a Barbalat's Lemma.

  6. A review of a priori regression models for warfarin maintenance dose prediction.

    Science.gov (United States)

    Francis, Ben; Lane, Steven; Pirmohamed, Munir; Jorgensen, Andrea

    2014-01-01

    A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.

  7. A review of a priori regression models for warfarin maintenance dose prediction.

    Directory of Open Access Journals (Sweden)

    Ben Francis

    Full Text Available A number of a priori warfarin dosing algorithms, derived using linear regression methods, have been proposed. Although these dosing algorithms may have been validated using patients derived from the same centre, rarely have they been validated using a patient cohort recruited from another centre. In order to undertake external validation, two cohorts were utilised. One cohort formed by patients from a prospective trial and the second formed by patients in the control arm of the EU-PACT trial. Of these, 641 patients were identified as having attained stable dosing and formed the dataset used for validation. Predicted maintenance doses from six criterion fulfilling regression models were then compared to individual patient stable warfarin dose. Predictive ability was assessed with reference to several statistics including the R-square and mean absolute error. The six regression models explained different amounts of variability in the stable maintenance warfarin dose requirements of the patients in the two validation cohorts; adjusted R-squared values ranged from 24.2% to 68.6%. An overview of the summary statistics demonstrated that no one dosing algorithm could be considered optimal. The larger validation cohort from the prospective trial produced more consistent statistics across the six dosing algorithms. The study found that all the regression models performed worse in the validation cohort when compared to the derivation cohort. Further, there was little difference between regression models that contained pharmacogenetic coefficients and algorithms containing just non-pharmacogenetic coefficients. The inconsistency of results between the validation cohorts suggests that unaccounted population specific factors cause variability in dosing algorithm performance. Better methods for dosing that take into account inter- and intra-individual variability, at the initiation and maintenance phases of warfarin treatment, are needed.

  8. APPLICATION OF REGRESSION MODELLING TECHNIQUES IN DESALINATION OF SEA WATER BY MEMBRANE DISTILLATION

    Directory of Open Access Journals (Sweden)

    SELVI S. R

    2015-08-01

    Full Text Available The objective of this work is to gain an idea about the statistical significance of experimental parameters on the performance of membrane distillation. In this work the raw sea water sample without pretreatment was collected from Puducherry and desalinated using direct contact membrane distillation method. Experimental data analysis was carried out using statistical methods. The experimental data involves the effects of feed temperature, feed flow rate and feed concentration on the permeate flux. In statistical methods, regression model was developed to correlate the significance of input parameters like feed temperature, feed concentration and feed flow rate with the output parameter like permeate flux in the process of membrane distillation. Since the performance of the membrane distillation in the desalination of water is characterised by permeate flux, regression model using simple linear method was carried out. Goodness of model fitting should always has to be validated. Regression model was validated using ANOVA. Estimates of ANOVA for the parameter study was given and the coefficient obtained by regression analysis was specified in the regression equation and concluded that the highest coefficient of input parameter is significant, highly influences the response. Feed flow rate and feed temperature has higher influence on permeate flux than that of feed concentration. The coefficient of feed concentration was found to be negative which indicates less significant factor on permeate flux. The chemical composition of sea water was given by water quality analysis . TDS of membrane distilled water was found to be 18ppm than the initial feed TDS of sea water 27,720 ppm. From the experimental work it was found, salt rejection as 99% and water analysis report confirms the quality of distillate obtained by this desalination process as potable water.

  9. Passenger Flow Prediction of Subway Transfer Stations Based on Nonparametric Regression Model

    Directory of Open Access Journals (Sweden)

    Yujuan Sun

    2014-01-01

    Full Text Available Passenger flow is increasing dramatically with accomplishment of subway network system in big cities of China. As convergence nodes of subway lines, transfer stations need to assume more passengers due to amount transfer demand among different lines. Then, transfer facilities have to face great pressure such as pedestrian congestion or other abnormal situations. In order to avoid pedestrian congestion or warn the management before it occurs, it is very necessary to predict the transfer passenger flow to forecast pedestrian congestions. Thus, based on nonparametric regression theory, a transfer passenger flow prediction model was proposed. In order to test and illustrate the prediction model, data of transfer passenger flow for one month in XIDAN transfer station were used to calibrate and validate the model. By comparing with Kalman filter model and support vector machine regression model, the results show that the nonparametric regression model has the advantages of high accuracy and strong transplant ability and could predict transfer passenger flow accurately for different intervals.

  10. A general binomial regression model to estimate standardized risk differences from binary response data.

    Science.gov (United States)

    Kovalchik, Stephanie A; Varadhan, Ravi; Fetterman, Barbara; Poitras, Nancy E; Wacholder, Sholom; Katki, Hormuzd A

    2013-02-28

    Estimates of absolute risks and risk differences are necessary for evaluating the clinical and population impact of biomedical research findings. We have developed a linear-expit regression model (LEXPIT) to incorporate linear and nonlinear risk effects to estimate absolute risk from studies of a binary outcome. The LEXPIT is a generalization of both the binomial linear and logistic regression models. The coefficients of the LEXPIT linear terms estimate adjusted risk differences, whereas the exponentiated nonlinear terms estimate residual odds ratios. The LEXPIT could be particularly useful for epidemiological studies of risk association, where adjustment for multiple confounding variables is common. We present a constrained maximum likelihood estimation algorithm that ensures the feasibility of risk estimates of the LEXPIT model and describe procedures for defining the feasible region of the parameter space, judging convergence, and evaluating boundary cases. Simulations demonstrate that the methodology is computationally robust and yields feasible, consistent estimators. We applied the LEXPIT model to estimate the absolute 5-year risk of cervical precancer or cancer associated with different Pap and human papillomavirus test results in 167,171 women undergoing screening at Kaiser Permanente Northern California. The LEXPIT model found an increased risk due to abnormal Pap test in human papillomavirus-negative that was not detected with logistic regression. Our R package blm provides free and easy-to-use software for fitting the LEXPIT model.

  11. Efficient Blind System Identification of Non-Gaussian Auto-Regressive Models with HMM Modeling of the Excitation

    DEFF Research Database (Denmark)

    Li, Chunjian; Andersen, Søren Vang

    2007-01-01

    We propose two blind system identification methods that exploit the underlying dynamics of non-Gaussian signals. The two signal models to be identified are: an Auto-Regressive (AR) model driven by a discrete-state Hidden Markov process, and the same model whose output is perturbed by white Gaussian...

  12. Identifying of risks in pricing using a regression model of demand on price dependence

    Directory of Open Access Journals (Sweden)

    O.I. Yashkina

    2016-09-01

    Full Text Available The aim of the article. The main purpose of the article is to describe scientific and methodological approaches of determining the price elasticity of demand as a regression model based on the price and risk assessment of price variations on the received model. The results of the analysis. The study is based on the assumption that the index of price elasticity of demand on high-tech innovation is not constant as it is commonly understood in the classical sense. On the stage of commodity market release and subsequent sales growth, the index of price elasticity of demand may vary within certain limits. Index value and thereafter market response are closely related to the current price. Achieving the stated purpose of the article is possible when having factual information about prices and corresponding volumes of sales of new high-tech products for a short period of time, on the basis of which types of demand and prices interrelation are modeled. Risk assessment of pricing and profit optimization by the regression of demand depending on price consists of three stages: a obtaining of a regression model of the demand on the price; b obtaining of function of demand price elasticity and risk assessment of pricing depending on behavior of the function; c determination of the price of company to receive a maximum operating profit based on the specific model of price to demand function. To receive the regression model of dependence of demand on price it is recommended to use specific reference models. The article includes linear, hyperbolic and parabolic models. The regression dependence of price elasticity of demand on price for each of the reference models of demand is obtained on the basis of the function elasticity concept in mathematical analysis. The concept of «function of price elasticity of demand» expresses this dependence. For the received functions of price elasticity of demand, the article provides intervals with the highest and lowest

  13. A Model for Climate Change Adaptation

    Science.gov (United States)

    Pasqualini, D.; Keating, G. N.

    2009-12-01

    Climate models predict serious impacts on the western U.S. in the next few decades, including increased temperatures and reduced precipitation. In combination, these changes are linked to profound impacts on fundamental systems, such as water and energy supplies, agriculture, population stability, and the economy. Global and national imperatives for climate change mitigation and adaptation are made actionable at the state level, for instance through greenhouse gas (GHG) emission regulations and incentives for renewable energy sources. However, adaptation occurs at the local level, where energy and water usage can be understood relative to local patterns of agriculture, industry, and culture. In response to the greenhouse gas emission reductions required by California’s Assembly Bill 32 (2006), Sonoma County has committed to sharp emissions reductions across several sectors, including water, energy, and transportation. To assist Sonoma County develop a renewable energy (RE) portfolio to achieve this goal we have developed an integrated assessment model, CLEAR (CLimate-Energy Assessment for Resiliency) model. Building on Sonoma County’s existing baseline studies of energy use, carbon emissions and potential RE sources, the CLEAR model simulates the complex interactions among technology deployment, economics and social behavior. This model enables assessment of these and other components with specific analysis of their coupling and feedbacks because, due to the complex nature of the problem, the interrelated sectors cannot be studied independently. The goal is an approach to climate change mitigation and adaptation that is replicable for use by other interested communities. The model user interfaces helps stakeholders and policymakers understand options for technology implementation.

  14. The study on Sanmenxia annual flow forecasting in the Yellow River with mix regression model

    Institute of Scientific and Technical Information of China (English)

    JIANG Xiaohui; LIU Changming; WANG Yu; WANG Hongrui

    2004-01-01

    This paper established mix regression model for simulating annual flow, in which annual runoff is auto-regression factor, precipitation, air temperature and water consumption are regression factors; we adopted 9 hypothesis climate change schemes to forecast the change of annual flow of Sanmenxia Station. The results show: (1) When temperature is steady, the average annual runoff will increase by 8.3% if precipitation increases by 10%; when precipitation decreases by 10%, the average annual runoff will decrease by 8.2%; when precipitation is steady, the average annual runoff will decrease by 2.4% if temperature increases 1 ℃; if temperature decreases 1 ℃, runoff will increase by 1.2%. The mix regression model can well simulate annual runoff. (2) As to 9 different temperature and precipitation scenarios, scenario 9 is the most adverse to the runoff of Sanmenxia Station of Yellow River; i.e. temperature increases 1℃and precipitation decreases by 10%. Under this condition, the simulated average annual runoff decreases by 10.8%. On the contrary, scenario 1 is the best to the enhancement of runoff; i.e. when temperature decreases 1 ℃ precipitation will increase by 10%, which will make the annual runoff of Sanmenxia increase by 10.6%.

  15. A regression-kriging model for estimation of rainfall in the Laohahe basin

    Science.gov (United States)

    Wang, Hong; Ren, Li L.; Liu, Gao H.

    2009-10-01

    This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors of latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges from the Laohahe basis in northeast China during 1986-2005 . Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). For correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.

  16. Probing turbulence intermittency via Auto-Regressive Moving-Average models

    CERN Document Server

    Faranda, Davide; Dubrulle, Berengere; Daviaud, Francois

    2014-01-01

    We suggest a new approach to probing intermittency corrections to the Kolmogorov law in turbulent flows based on the Auto-Regressive Moving-Average modeling of turbulent time series. We introduce a new index $\\Upsilon$ that measures the distance from a Kolmogorov-Obukhov model in the Auto-Regressive Moving-Average models space. Applying our analysis to Particle Image Velocimetry and Laser Doppler Velocimetry measurements in a von K\\'arm\\'an swirling flow, we show that $\\Upsilon$ is proportional to the traditional intermittency correction computed from the structure function. Therefore it provides the same information, using much shorter time series. We conclude that $\\Upsilon$ is a suitable index to reconstruct the spatial intermittency of the dissipation in both numerical and experimental turbulent fields.

  17. Ordinal regression models to describe tourist satisfaction with Sintra's world heritage

    Science.gov (United States)

    Mouriño, Helena

    2013-10-01

    In Tourism Research, ordinal regression models are becoming a very powerful tool in modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering Tourist Survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focus on two main points: tourists' perception of the entrance fees; overall level of satisfaction with this heritage site. For attaining these goals, ordinal regression models were developed. We concluded that tourist's nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.

  18. Random regression models for daily feed intake in Danish Duroc pigs

    DEFF Research Database (Denmark)

    Strathe, Anders Bjerring; Mark, Thomas; Jensen, Just;

    The objective of this study was to develop random regression models and estimate covariance functions for daily feed intake (DFI) in Danish Duroc pigs. A total of 476201 DFI records were available on 6542 Duroc boars between 70 to 160 days of age. The data originated from the National test station......-year-season, permanent, and animal genetic effects. The functional form was based on Legendre polynomials. A total of 64 models for random regressions were initially ranked by BIC to identify the approximate order for the Legendre polynomials using AI-REML. The parsimonious model included Legendre polynomials of 2nd....... Eigenvalues of the genetic covariance function showed that 33% of genetic variability was explained by the individual genetic curve of the pigs. This proportion was covered by linear (27%) and quadratic (6%) coefficients. Genetic eigenfunctions revealed that altering the shape of the feed intake curve...

  19. Partial Least Squares Regression Model to Predict Water Quality in Urban Water Distribution Systems

    Institute of Scientific and Technical Information of China (English)

    LUO Bijun; ZHAO Yuan; CHEN Kai; ZHAO Xinhua

    2009-01-01

    The water distribution system of one residential district in Tianjin is taken as an example to analyze the changes of water quality. Partial least squares (PLS) regression model, in which the turbidity and Fe are regarded as con-trol objectives, is used to establish the statistical model. The experimental results indicate that the PLS regression model has good predicted results of water quality compared with the monitored data. The percentages of absolute relative error (below 15%, 20%, 30%) are 44.4%, 66.7%, 100% (turbidity) and 33.3%, 44.4%, 77.8% (Fe) on the 4th sampling point; 77.8%, 88.9%, 88.9% (turbidity) and 44.4%, 55.6%, 66.7% (Fe) on the 5th sampling point.

  20. The limiting behavior of the estimated parameters in a misspecified random field regression model

    DEFF Research Database (Denmark)

    Dahl, Christian Møller; Qin, Yu

    , as a consequence the random field model specification introduces non-stationarity and non-ergodicity in the misspecified model and it becomes non-trivial, relative to the existing literature, to establish the limiting behavior of the estimated parameters. The asymptotic results are obtained by applying some...... convenient new uniform convergence results that we propose. This theory may have applications beyond those presented here. Our results indicate that classical statistical inference techniques, in general, works very well for random field regression models in finite samples and that these models succesfully...

  1. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features.

    Directory of Open Access Journals (Sweden)

    Gregor Stiglic

    Full Text Available Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755-0.771 to 0.769 (95% CI: 0.761-0.777. Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.

  2. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    Science.gov (United States)

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.

  3. A Stepwise Time Series Regression Procedure for Water Demand Model Identification

    Science.gov (United States)

    Miaou, Shaw-Pin

    1990-09-01

    Annual time series water demand has traditionally been studied through multiple linear regression analysis. Four associated model specification problems have long been recognized: (1) the length of the available time series data is relatively short, (2) a large set of candidate explanatory or "input" variables needs to be considered, (3) input variables can be highly correlated with each other (multicollinearity problem), and (4) model error series are often highly autocorrelated or even nonstationary. A step wise time series regression identification procedure is proposed to alleviate these problems. The proposed procedure adopts the sequential input variable selection concept of stepwise regression and the "three-step" time series model building strategy of Box and Jenkins. Autocorrelated model error is assumed to follow an autoregressive integrated moving average (ARIMA) process. The stepwise selection procedure begins with a univariate time series demand model with no input variables. Subsequently, input variables are selected and inserted into the equation one at a time until the last entered variable is found to be statistically insignificant. The order of insertion is determined by a statistical measure called between-variable partial correlation. This correlation measure is free from the contamination of serial autocorrelation. Three data sets from previous studies are employed to illustrate the proposed procedure. The results are then compared with those from their original studies.

  4. A Linear Regression Model for Global Solar Radiation on Horizontal Surfaces at Warri, Nigeria

    Directory of Open Access Journals (Sweden)

    Michael S. Okundamiya

    2013-10-01

    Full Text Available The growing anxiety on the negative effects of fossil fuels on the environment and the global emission reduction targets call for a more extensive use of renewable energy alternatives. Efficient solar energy utilization is an essential solution to the high atmospheric pollution caused by fossil fuel combustion. Global solar radiation (GSR data, which are useful for the design and evaluation of solar energy conversion system, are not measured at the forty-five meteorological stations in Nigeria. The dearth of the measured solar radiation data calls for accurate estimation. This study proposed a temperature-based linear regression, for predicting the monthly average daily GSR on horizontal surfaces, at Warri (latitude 5.020N and longitude 7.880E an oil city located in the south-south geopolitical zone, in Nigeria. The proposed model is analyzed based on five statistical indicators (coefficient of correlation, coefficient of determination, mean bias error, root mean square error, and t-statistic, and compared with the existing sunshine-based model for the same study. The results indicate that the proposed temperature-based linear regression model could replace the existing sunshine-based model for generating global solar radiation data. Keywords: air temperature; empirical model; global solar radiation; regression analysis; renewable energy; Warri

  5. Multivariate Multiple Regression Models for a Big Data-Empowered SON Framework in Mobile Wireless Networks

    Directory of Open Access Journals (Sweden)

    Yoonsu Shin

    2016-01-01

    Full Text Available In the 5G era, the operational cost of mobile wireless networks will significantly increase. Further, massive network capacity and zero latency will be needed because everything will be connected to mobile networks. Thus, self-organizing networks (SON are needed, which expedite automatic operation of mobile wireless networks, but have challenges to satisfy the 5G requirements. Therefore, researchers have proposed a framework to empower SON using big data. The recent framework of a big data-empowered SON analyzes the relationship between key performance indicators (KPIs and related network parameters (NPs using machine-learning tools, and it develops regression models using a Gaussian process with those parameters. The problem, however, is that the methods of finding the NPs related to the KPIs differ individually. Moreover, the Gaussian process regression model cannot determine the relationship between a KPI and its various related NPs. In this paper, to solve these problems, we proposed multivariate multiple regression models to determine the relationship between various KPIs and NPs. If we assume one KPI and multiple NPs as one set, the proposed models help us process multiple sets at one time. Also, we can find out whether some KPIs are conflicting or not. We implement the proposed models using MapReduce.

  6. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

    Science.gov (United States)

    Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

    2013-12-11

    The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.

  7. A poisson regression approach for modelling spatial autocorrelation between geographically referenced observations

    Directory of Open Access Journals (Sweden)

    Jolley Damien

    2011-10-01

    Full Text Available Abstract Background Analytic methods commonly used in epidemiology do not account for spatial correlation between observations. In regression analyses, omission of that autocorrelation can bias parameter estimates and yield incorrect standard error estimates. Methods We used age standardised incidence ratios (SIRs of esophageal cancer (EC from the Babol cancer registry from 2001 to 2005, and extracted socioeconomic indices from the Statistical Centre of Iran. The following models for SIR were used: (1 Poisson regression with agglomeration-specific nonspatial random effects; (2 Poisson regression with agglomeration-specific spatial random effects. Distance-based and neighbourhood-based autocorrelation structures were used for defining the spatial random effects and a pseudolikelihood approach was applied to estimate model parameters. The Bayesian information criterion (BIC, Akaike's information criterion (AIC and adjusted pseudo R2, were used for model comparison. Results A Gaussian semivariogram with an effective range of 225 km best fit spatial autocorrelation in agglomeration-level EC incidence. The Moran's I index was greater than its expected value indicating systematic geographical clustering of EC. The distance-based and neighbourhood-based Poisson regression estimates were generally similar. When residual spatial dependence was modelled, point and interval estimates of covariate effects were different to those obtained from the nonspatial Poisson model. Conclusions The spatial pattern evident in the EC SIR and the observation that point estimates and standard errors differed depending on the modelling approach indicate the importance of accounting for residual spatial correlation in analyses of EC incidence in the Caspian region of Iran. Our results also illustrate that spatial smoothing must be applied with care.

  8. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease

    Directory of Open Access Journals (Sweden)

    Javali Shivalingappa

    2010-01-01

    Full Text Available Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and ′questionnaire with interview′ method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodontal disease examination was conducted by using Community Periodontal Index for Treatment Needs (CPITN. Statistical Analysis Used: Regression models for ordinal data with different built-in link functions were used in determination of factors associated with periodontal disease. Results: The study findings indicated that, the ordinal regression models with four built-in link functions (logit, probit, Clog-log and nlog-log displayed similar results with negligible differences in significant factors associated with periodontal disease. The factors such as religion, caste, sources of drinking water, Timings for sweet consumption, Timings for cleaning or brushing the teeth and materials used for brushing teeth were significantly associated with periodontal disease in all ordinal models. Conclusions: The ordinal regression model with Clog-log is a better fit in determination of significant factors associated with periodontal disease as compared to models with logit, probit and nlog-log built-in link functions. The factors such as caste and time for sweet consumption are negatively associated with periodontal disease. But religion, sources of drinking water, Timings for cleaning or brushing the teeth and materials used for brushing teeth are significantly and positively associated with periodontal disease.

  9. A class of additive-accelerated means regression models for recurrent event data

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    In this article, we propose a class of additive-accelerated means regression models for analyzing recurrent event data. The class includes the proportional means model, the additive rates model, the accelerated failure time model, the accelerated rates model and the additive-accelerated rate model as special cases. The new model offers great flexibility in formulating the effects of covariates on the mean functions of counting processes while leaving the stochastic structure completely unspecified. For the inference on the model parameters, estimating equation approaches are derived and asymptotic properties of the proposed estimators are established. In addition, a technique is provided for model checking. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is illustrated.

  10. On adaptive refinements in discrete probabilistic fracture models

    Directory of Open Access Journals (Sweden)

    J. Eliáš

    2017-01-01

    Full Text Available The possibility to adaptively change discretization density is a well acknowledged and used feature of many continuum models. It is employed to save computational time and increase solution accuracy. Recently, adaptivity has been introduced also for discrete particle models. This contribution applies adaptive technique in probabilistic discrete modelling where material properties are varying in space according to a random field. The random field discretization is adaptively refined hand in hand with the model geometry.

  11. An adaptive contextual quantum language model

    Science.gov (United States)

    Li, Jingfei; Zhang, Peng; Song, Dawei; Hou, Yuexian

    2016-08-01

    User interactions in search system represent a rich source of implicit knowledge about the user's cognitive state and information need that continuously evolves over time. Despite massive efforts that have been made to exploiting and incorporating this implicit knowledge in information retrieval, it is still a challenge to effectively capture the term dependencies and the user's dynamic information need (reflected by query modifications) in the context of user interaction. To tackle these issues, motivated by the recent Quantum Language Model (QLM), we develop a QLM based retrieval model for session search, which naturally incorporates the complex term dependencies occurring in user's historical queries and clicked documents with density matrices. In order to capture the dynamic information within users' search session, we propose a density matrix transformation framework and further develop an adaptive QLM ranking model. Extensive comparative experiments show the effectiveness of our session quantum language models.

  12. Adaptive Lattice Boltzmann Model for Compressible Flows

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    A new lattice Boltzmann model for compressible flows is presented. The main difference from the standard lattice Boltzmann model is that the particle velocities are no longer constant, but vary with the mean velocity and internal energy. The adaptive nature of the particle velocities permits the mean flow to have a high Mach number. The introduction of a particle potential energy makes the model suitable for a perfect gas with arbitrary specific heat ratio. The Navier-Stokes (N-S) equations are derived by the Chapman-Enskog method from the BGK Boltzmann equation. Two kinds of simulations have been carried out on the hexagonal lattice to test the proposed model. One is the Sod shock-tube simulation. The other is a strong shock of Mach number 5.09 diffracting around a corner.

  13. Model Driven Mutation Applied to Adaptative Systems Testing

    CERN Document Server

    Bartel, Alexandre; Munoz, Freddy; Klein, Jacques; Mouelhi, Tejeddine; Traon, Yves Le

    2012-01-01

    Dynamically Adaptive Systems modify their behav- ior and structure in response to changes in their surrounding environment and according to an adaptation logic. Critical sys- tems increasingly incorporate dynamic adaptation capabilities; examples include disaster relief and space exploration systems. In this paper, we focus on mutation testing of the adaptation logic. We propose a fault model for adaptation logics that classifies faults into environmental completeness and adaptation correct- ness. Since there are several adaptation logic languages relying on the same underlying concepts, the fault model is expressed independently from specific adaptation languages. Taking benefit from model-driven engineering technology, we express these common concepts in a metamodel and define the operational semantics of mutation operators at this level. Mutation is applied on model elements and model transformations are used to propagate these changes to a given adaptation policy in the chosen formalism. Preliminary resul...

  14. A methodology for the design of experiments in computational intelligence with multiple regression models.

    Science.gov (United States)

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence and especially on a correct comparison between the different results provided for different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant and relevant. It is of relevance to use a statistical approach to indicate whether the differences are statistically significant using this kind of algorithms. Furthermore, our results with three real complex datasets report different best models than with the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare the results obtained in Computational Intelligence problems, as well as from other fields, such as for bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  15. Regression models for near-infrared measurement of subcutaneous adipose tissue thickness.

    Science.gov (United States)

    Wang, Yu; Hao, Dongmei; Shi, Jingbin; Yang, Zeqiang; Jin, Liu; Zhang, Song; Yang, Yimin; Bin, Guangyu; Zeng, Yanjun; Zheng, Dingchang

    2016-07-01

    Obesity is often associated with the risks of diabetes and cardiovascular disease, and there is a need to measure subcutaneous adipose tissue (SAT) thickness for acquiring the distribution of body fat. The present study aimed to develop and evaluate different model-based methods for SAT thickness measurement using an SATmeter developed in our laboratory. Near-infrared signals backscattered from the body surfaces from 40 subjects at 20 body sites each were recorded. Linear regression (LR) and support vector regression (SVR) models were established to predict SAT thickness on different body sites. The measurement accuracy was evaluated by ultrasound, and compared with results from a mechanical skinfold caliper (MSC) and a body composition balance monitor (BCBM). The results showed that both LR- and SVR-based measurement produced better accuracy than MSC and BCBM. It was also concluded that by using regression models specifically designed for certain parts of human body, higher measurement accuracy could be achieved than using a general model for the whole body. Our results demonstrated that the SATmeter is a feasible method, which can be applied at home and in the community due to its portability and convenience.

  16. The Combination Forecasting Model of Grain Production Based on Stepwise Regression Method and RBF Neural Network

    Directory of Open Access Journals (Sweden)

    Lihua Yang

    2015-04-01

    Full Text Available In order to improve the accuracy of grain production forecasting, this study proposed a new combination forecasting model, the model combined stepwise regression method with RBF neural network by assigning proper weights using inverse variance method. By comparing different criteria, the result indicates that the combination forecasting model is superior to other models. The performance of the models is measured using three types of error measurement, which are Mean Absolute Percentage Error (MAPE, Theil Inequality Coefficient (Theil IC and Root Mean Squared Error (RMSE. The model with smallest value of MAPE, Theil IC and RMSE stands out to be the best model in predicting the grain production. Based on the MAPE, Theil IC and RMSE evaluation criteria, the combination model can reduce the forecasting error and has high prediction accuracy in grain production forecasting, making the decision more scientific and rational.

  17. Intermittent reservoir daily-inflow prediction using lumped and distributed data multi-linear regression models

    Indian Academy of Sciences (India)

    R B Magar; V Jothiprakash

    2011-12-01

    In this study, multi-linear regression (MLR) approach is used to construct intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in MLR approach, Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also compared with autoregressive integrated moving average (ARIMA) models. MLR attempts to model the relationship between two or more independent variables over a dependent variable by fitting a linear regression equation. The main aim of the present study is to see the consequences of development and applicability of simple models, when sufficient data length is available. Out of 47 years of daily historical rainfall and reservoir inflow data, 33 years of data is used for building the model and 14 years of data is used for validating the model. Based on the observed daily rainfall and reservoir inflow, various types of time-series, cause-effect and combined models are developed using lumped and distributed input data. Model performance was evaluated using various performance criteria and it was found that as in the present case, of well correlated input data, both lumped and distributed MLR models perform equally well. For the present case study considered, both MLR and ARIMA models performed equally sound due to availability of large dataset.

  18. Selection of higher order regression models in the analysis of multi-factorial transcription data.

    Directory of Open Access Journals (Sweden)

    Olivia Prazeres da Costa

    Full Text Available INTRODUCTION: Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control, and treatment/non-treatment with interferon-γ. RESULTS: We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction, alleviating (co-occurring effects are weaker than expected from the single effects, or aggravating (stronger than expected. We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. CONCLUSIONS: We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.

  19. Support vector regression model for predicting the sorption capacity of lead (II

    Directory of Open Access Journals (Sweden)

    Nusrat Parveen

    2016-09-01

    Full Text Available Biosorption is supposed to be an economical process for the treatment of wastewater containing heavy metals like lead (II. In this research paper, the support vector regression (SVR has been used to predict the sorption capacity of lead (II ions with the independent input parameters being: initial lead ion concentration, pH, temperature and contact time. Tree fern, an agricultural by-product, has been employed as a low cost biosorbent. Comparison between multiple linear regression (MLR and SVR-based models has been made using statistical parameters. It has been found that the SVR model is more accurate and generalized for prediction of the sorption capacity of lead (II ions.

  20. Optimization of biomass torrefaction conditions by the gain and loss method and regression model analysis.

    Science.gov (United States)

    Lee, Soo Min; Lee, Jae-Won

    2014-11-01

    In this study, the optimal conditions for biomass torrefaction were determined by comparing the gain of energy content to the weight loss of biomass from the final products. Torrefaction experiments were performed at temperatures ranging from 220 to 280°C using 20-80min reaction times. Polynomial regression models ranging from the 1st to the 3rd order were used to determine a relationship between the severity factor (SF) and calorific value or weight loss. The intersection of two regression models for calorific value and weight loss was determined and assumed to be the optimized SF. The optimized SFs on each biomass ranged from 6.056 to 6.372. Optimized torrefaction conditions were determined at various reaction times of 15, 30, and 60min. The average optimized temperature was 248.55°C in the studied biomass when torrefaction was performed for 60min.

  1. Small-sample likelihood inference in extreme-value regression models

    CERN Document Server

    Ferrari, Silvia L P

    2012-01-01

    We deal with a general class of extreme-value regression models introduced by Barreto- Souza and Vasconcellos (2011). Our goal is to derive an adjusted likelihood ratio statistic that is approximately distributed as \\c{hi}2 with a high degree of accuracy. Although the adjusted statistic requires more computational effort than its unadjusted counterpart, it is shown that the adjustment term has a simple compact form that can be easily implemented in standard statistical software. Further, we compare the finite sample performance of the three classical tests (likelihood ratio, Wald, and score), the gradient test that has been recently proposed by Terrell (2002), and the adjusted likelihood ratio test obtained in this paper. Our simulations favor the latter. Applications of our results are presented. Key words: Extreme-value regression; Gradient test; Gumbel distribution; Likelihood ratio test; Nonlinear models; Score test; Small-sample adjustments; Wald test.

  2. Alpins and thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

    Directory of Open Access Journals (Sweden)

    Giuliano de Oliveira Freitas

    2013-10-01

    Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.

  3. Observer-based and Regression Model-based Detection of Emerging Faults in Coal Mills

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Lin, Bao; Jørgensen, Sten Bay

    2006-01-01

    In order to improve the reliability of power plants it is important to detect fault as fast as possible. Doing this it is interesting to find the most efficient method. Since modeling of large scale systems is time consuming it is interesting to compare a model-based method with data driven ones....... In this paper three different fault detection approaches are compared using a example of a coal mill, where a fault emerges. The compared methods are based on: an optimal unknown input observer, static and dynamic regression model-based detections. The conclusion on the comparison is that observer-based scheme...

  4. Estimating regional conductivity in 2D disc head model by multidemensional support vector regression.

    Science.gov (United States)

    Shen, Xueqin; Yan, Hui; Yan, Weili; Guo, Lei

    2007-01-01

    In this paper, we introduce multidimensional support vector regression (MSVR) with iterative re-weight least square (IRWLS) based procedure to estimating the regional conductivity in 2D disc head model. The results show that the method is capable of determining for the regional location of the disturbed conductivity in the 2D disc head model with single tissue and estimating for the tissue conductivities in the 2D disc head model with four kinds of tissue. The estimation errors are all within a few percent.

  5. CODEVECTOR MODELING USING LOCAL POLYNOMIAL REGRESSION FOR VECTOR QUANTIZATION BASED IMAGE COMPRESSION

    OpenAIRE

    P. Arockia Jansi Rani; V. Sadasivam

    2010-01-01

    Image compression is very important in reducing the costs of data storage and transmission in relatively slow channels. In this paper, a still image compression scheme driven by Self-Organizing Map with polynomial regression modeling and entropy coding, employed within the wavelet framework is presented. The image compressibility and interpretability are improved by incorporating noise reduction into the compression scheme. The implementation begins with the classical wavelet decomposition, q...

  6. USE OF THE SIMPLE LINEAR REGRESSION MODEL IN MACRO-ECONOMICAL ANALYSES

    Directory of Open Access Journals (Sweden)

    Constantin ANGHELACHE

    2011-10-01

    Full Text Available The article presents the fundamental aspects of the linear regression, as a toolbox which can be used in macroeconomic analyses. The article describes the estimation of the parameters, the statistical tests used, the homoscesasticity and heteroskedasticity. The use of econometrics instrument in macroeconomics is an important factor that guarantees the quality of the models, analyses, results and possible interpretation that can be drawn at this level.

  7. Asymptotic Properties in Semiparametric Partially Linear Regression Models for Functional Data

    Institute of Scientific and Technical Information of China (English)

    Tao ZHANG

    2013-01-01

    We consider the semiparametric partially linear regression models with mean function xTβ+g(z),where X and z are functional data.The new estimators of β and g(z) are presented and some asymptotic results are given.The strong convergence rates of the proposed estimators are obtained.In our estimation,the observation number of each subject will be completely flexible.Some simulation study is conducted to investigate the finite sample performance of the proposed estimators.

  8. Investigating nonlinear speculation in cattle, corn, and hog futures markets using logistic smooth transition regression models

    OpenAIRE

    Röthig, Andreas; Chiarella, Carl

    2006-01-01

    This article explores nonlinearities in the response of speculators' trading activity to price changes in live cattle, corn, and lean hog futures markets. Analyzing weekly data from March 4, 1997 to December 27, 2005, we reject linearity in all of these markets. Using smooth transition regression models, we find a similar structure of nonlinearities with regard to the number of different regimes, the choice of the transition variable, and the value at which the transition occurs.

  9. A generalized regression model of arsenic variations in the shallow groundwater of Bangladesh

    OpenAIRE

    Shamsudduha, M.; Taylor, R. G.; Chandler, R. E.

    2015-01-01

    Abstract Localized studies of arsenic (As) in Bangladesh have reached disparate conclusions regarding the impact of irrigation‐induced recharge on As concentrations in shallow (≤50 m below ground level) groundwater. We construct generalized regression models (GRMs) to describe observed spatial variations in As concentrations in shallow groundwater both (i) nationally, and (ii) regionally within Holocene deposits where As concentrations in groundwater are generally high (>10 μg L−1). At these ...

  10. A comparison of ordinal regression models in an analysis of factors associated with periodontal disease

    OpenAIRE

    Javali Shivalingappa; Pandit Parameshwar

    2010-01-01

    Aim: The study aimed to determine the factors associated with periodontal disease (different levels of severity) by using different regression models for ordinal data. Design: A cross-sectional design was employed using clinical examination and ′questionnaire with interview′ method. Materials and Methods: The study was conducted during June 2008 to October 2008 in Dharwad, Karnataka, India. It involved a systematic random sample of 1760 individuals aged 18-40 years. The periodon...

  11. Urban Growth Modelling with Artificial Neural Network and Logistic Regression. Case Study: Sanandaj City, Iran

    Directory of Open Access Journals (Sweden)

    SASSAN MOHAMMADY

    2013-01-01

    Full Text Available Cities have shown remarkable growth due to attraction, economic, social and facilities centralization in the past few decades. Population and urban expansion especially in developing countries, led to lack of resources, land use change from appropriate agricultural land to urban land use and marginalization. Under these circumstances, land use activity is a major issue and challenge for town and country planners. Different approaches have been attempted in urban expansion modelling. Artificial Neural network (ANN models are among knowledge-based models which have been used for urban growth modelling. ANNs are powerful tools that use a machine learning approach to quantify and model complex behaviour and patterns. In this research, ANN and logistic regression have been employed for interpreting urban growth modelling. Our case study is Sanandaj city and we used Landsat TM and ETM+ imageries acquired at 2000 and 2006. The dataset used includes distance to main roads, distance to the residence region, elevation, slope, and distance to green space. Percent Area Match (PAM obtained from modelling of these changes with ANN is equal to 90.47% and the accuracy achieved for urban growth modelling with Logistic Regression (LR is equal to 88.91%. Percent Correct Match (PCM and Figure of Merit for ANN method were 91.33% and 59.07% and then for LR were 90.84% and 57.07%, respectively.

  12. Adaptive Genetic Algorithm Model for Intrusion Detection

    Directory of Open Access Journals (Sweden)

    K. S. Anil Kumar

    2012-09-01

    Full Text Available Intrusion detection systems are intelligent systems designed to identify and prevent the misuse of computer networks and systems. Various approaches to Intrusion Detection are currently being used, but they are relatively ineffective. Thus the emerging network security systems need be part of the life system and this ispossible only by embedding knowledge into the network. The Adaptive Genetic Algorithm Model - IDS comprising of K-Means clustering Algorithm, Genetic Algorithm and Neural Network techniques. Thetechnique is tested using multitude of background knowledge sets in DARPA network traffic datasets.

  13. A Model for Dynamic Adaptive Coscheduling

    Institute of Scientific and Technical Information of China (English)

    LU Sanglu; ZHOU Xiaobo; XIE Li

    1999-01-01

    This paper proposes a dynamic adaptive coscheduling modelDASIC to take advantage of excess available resources in anetwork of workstations (NOW). Besides coscheduling related subtasksdynamically, DASIC can scale up or down the process space dependingupon the number of available processors on an NOW. Based on thedynamic idle processor group (IPG), DASIC employs three modules: thecoscheduling module, the scalable scheduling module and the loadbalancing module, and uses six algorithms to achieve scalability. Asimplified DASIC was also implemented, and experimental results arepresented in this paper, which show that it can maximize systemutilization, and achieve task parallelism as much as possible.

  14. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    Science.gov (United States)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.

  15. Growth regression models at two generations of selected populations Alabio ducks

    Directory of Open Access Journals (Sweden)

    L Hardi Prasetyo

    2007-12-01

    Full Text Available A selection process to increase egg production of Alabio ducks was conducted in Balai Penelitian Ternak, Ciawi-Bogor. The selection aimed at increasing production, however observation on growth of the selected ducks was necessary since early growth stage (0-8 wks determines the performance during laying period. This paper presents the growth models and the coefficient of determination of two generations of selected Alabio ducks. Body weight were observed weekly on 363 ducks from F1 and 356 ducks from F2, between 0-8 weeks and then fortinghly until 16 weeks. Growth curves were analysed using regression models between age and bodyweight of each population. The selection of model with the best fit was based on the large value of determination coefficient (R2, small value of MSE, and sinificant level of regression coefficient. Result showed that cubic polynomial regression was the best fit for the two populations, Y = 56.31-1.44X+0.64X2-0.005X3 for F1 and Y = 43.05 + 0.96X + 0.69X2 - 0.0056X3 for F2. The values of R2 were 0.9466 for F1 and 0.9243 for F2, and the values of MSE were 11.586 for F1 and 19.978 for F2. The growth of F1 is better during starter period, but F2 is better during grower period.

  16. Comparison of various texture classification methods using multiresolution analysis and linear regression modelling.

    Science.gov (United States)

    Dhanya, S; Kumari Roshni, V S

    2016-01-01

    Textures play an important role in image classification. This paper proposes a high performance texture classification method using a combination of multiresolution analysis tool and linear regression modelling by channel elimination. The correlation between different frequency regions has been validated as a sort of effective texture characteristic. This method is motivated by the observation that there exists a distinctive correlation between the image samples belonging to the same kind of texture, at different frequency regions obtained by a wavelet transform. Experimentally, it is observed that this correlation differs across textures. The linear regression modelling is employed to analyze this correlation and extract texture features that characterize the samples. Our method considers not only the frequency regions but also the correlation between these regions. This paper primarily focuses on applying the Dual Tree Complex Wavelet Packet Transform and the Linear Regression model for classification of the obtained texture features. Additionally the paper also presents a comparative assessment of the classification results obtained from the above method with two more types of wavelet transform methods namely the Discrete Wavelet Transform and the Discrete Wavelet Packet Transform.

  17. Multisite and multivariable statistical downscaling using a Gaussian copula quantile regression model

    Science.gov (United States)

    Ben Alaya, M. A.; Chebana, F.; Ouarda, T. B. M. J.

    2016-09-01

    Statistical downscaling techniques are required to refine atmosphere-ocean global climate data and provide reliable meteorological information such as a realistic temporal variability and relationships between sites and variables in a changing climate. To this end, the present paper introduces a modular structure combining two statistical tools of increasing interest during the last years: (1) Gaussian copula and (2) quantile regression. The quantile regression tool is employed to specify the entire conditional distribution of downscaled variables and to address the limitations of traditional regression-based approaches whereas the Gaussian copula is performed to describe and preserve the dependence between both variables and sites. A case study based on precipitation and maximum and minimum temperatures from the province of Quebec, Canada, is used to evaluate the performance of the proposed model. Obtained results suggest that this approach is capable of generating series with realistic correlation structures and temporal variability. Furthermore, the proposed model performed better than a classical multisite multivariate statistical downscaling model for most evaluation criteria.

  18. Evaluation for Long Term PM10 Concentration Forecasting using Multi Linear Regression (MLR and Principal Component Regression (PCR Models

    Directory of Open Access Journals (Sweden)

    Samsuri Abdullah

    2016-07-01

    Full Text Available Air pollution in Peninsular Malaysia is dominated by particulate matter which is demonstrated by having the highest Air Pollution Index (API value compared to the other pollutants at most part of the country. Particulate Matter (PM10 forecasting models development is crucial because it allows the authority and citizens of a community to take necessary actions to limit their exposure to harmful levels of particulates pollution and implement protection measures to significantly improve air quality on designated locations. This study aims in improving the ability of MLR using PCs inputs for PM10 concentrations forecasting. Daily observations for PM10 in Kuala Terengganu, Malaysia from January 2003 till December 2011 were utilized to forecast PM10 concentration levels. MLR and PCR (using PCs input models were developed and the performance was evaluated using RMSE, NAE and IA. Results revealed that PCR performed better than MLR due to the implementation of PCA which reduce intricacy and eliminate data multi-collinearity.

  19. Evaluation for Long Term PM10 Concentration Forecasting using Multi Linear Regression (MLR) and Principal Component Regression (PCR) Models

    OpenAIRE

    Samsuri Abdullah; Marzuki Ismail; Si Yuen Fong; Al Mahfoodh Ali Najah Ahmed

    2016-01-01

    Air pollution in Peninsular Malaysia is dominated by particulate matter which is demonstrated by having the highest Air Pollution Index (API) value compared to the other pollutants at most part of the country. Particulate Matter (PM10) forecasting models development is crucial because it allows the authority and citizens of a community to take necessary actions to limit their exposure to harmful levels of particulates pollution and implement protection measures to significantly improve air qu...

  20. Data Quality in Linear Regression Models: Effect of Errors in Test Data and Errors in Training Data on Predictive Accuracy

    Directory of Open Access Journals (Sweden)

    Barbara D. Klein

    1999-01-01

    Full Text Available Although databases used in many organizations have been found to contain errors, little is known about the effect of these errors on predictions made by linear regression models. The paper uses a real-world example, the prediction of the net asset values of mutual funds, to investigate the effect of data quality on linear regression models. The results of two experiments are reported. The first experiment shows that the error rate and magnitude of error in data used in model prediction negatively affect the predictive accuracy of linear regression models. The second experiment shows that the error rate and the magnitude of error in data used to build the model positively affect the predictive accuracy of linear regression models. All findings are statistically significant. The findings have managerial implications for users and builders of linear regression models.

  1. Minimax lower bound for kink location estimators in a nonparametric regression model with long-range dependence

    CERN Document Server

    Wishart, Justin Rory

    2011-01-01

    In this paper, a lower bound is determined in the minimax sense for change point estimators of the first derivative of a regression function in the fractional white noise model. Similar minimax results presented previously in the area focus on change points in the derivatives of a regression function in the white noise model or consider estimation of the regression function in the presence of correlated errors.

  2. Partitioning of Multivariate Phenotypes using Regression Trees Reveals Complex Patterns of Adaptation to Climate across the Range of Black Cottonwood (Populus trichocarpa

    Directory of Open Access Journals (Sweden)

    Regis Wendpouire Oubida

    2015-03-01

    Full Text Available Local adaptation to climate in temperate forest trees involves the integration of multiple physiological, morphological, and phenological traits. Latitudinal clines are frequently observed for these traits, but environmental constraints also track longitude and altitude. We combined extensive phenotyping of 12 candidate adaptive traits, multivariate regression trees, quantitative genetics, and a genome-wide panel of SNP markers to better understand the interplay among geography, climate, and adaptation to abiotic factors in Populus trichocarpa. Heritabilities were low to moderate (0.13 to 0.32 and population differentiation for many traits exceeded the 99th percentile of the genome-wide distribution of FST, suggesting local adaptation. When climate variables were taken as predictors and the 12 traits as response variables in a multivariate regression tree analysis, evapotranspiration (Eref explained the most variation, with subsequent splits related to mean temperature of the warmest month, frost-free period (FFP, and mean annual precipitation (MAP. These grouping matched relatively well the splits using geographic variables as predictors: the northernmost groups (short FFP and low Eref had the lowest growth, and lowest cold injury index; the southern British Columbia group (low Eref and intermediate temperatures had average growth and cold injury index; the group from the coast of California and Oregon (high Eref and FFP had the highest growth performance and the highest cold injury index; and the southernmost, high-altitude group (with high Eref and low FFP performed poorly, had high cold injury index, and lower water use efficiency. Taken together, these results suggest variation in both temperature and water availability across the range shape multivariate adaptive traits in poplar.

  3. Plant adaptive behaviour in hydrological models (Invited)

    Science.gov (United States)

    van der Ploeg, M. J.; Teuling, R.

    2013-12-01

    Models that will be able to cope with future precipitation and evaporation regimes need a solid base that describes the essence of the processes involved [1]. Micro-behaviour in the soil-vegetation-atmosphere system may have a large impact on patterns emerging at larger scales. A complicating factor in the micro-behaviour is the constant interaction between vegetation and geology in which water plays a key role. The resilience of the coupled vegetation-soil system critically depends on its sensitivity to environmental changes. As a result of environmental changes vegetation may wither and die, but such environmental changes may also trigger gene adaptation. Constant exposure to environmental stresses, biotic or abiotic, influences plant physiology, gene adaptations, and flexibility in gene adaptation [2-6]. Gene expression as a result of different environmental conditions may profoundly impact drought responses across the same plant species. Differences in response to an environmental stress, has consequences for the way species are currently being treated in models (single plant to global scale). In particular, model parameters that control root water uptake and plant transpiration are generally assumed to be a property of the plant functional type. Assigning plant functional types does not allow for local plant adaptation to be reflected in the model parameters, nor does it allow for correlations that might exist between root parameters and soil type. Models potentially provide a means to link root water uptake and transport to large scale processes (e.g. Rosnay and Polcher 1998, Feddes et al. 2001, Jung 2010), especially when powered with an integrated hydrological, ecological and physiological base. We explore the experimental evidence from natural vegetation to formulate possible alternative modeling concepts. [1] Seibert, J. 2000. Multi-criteria calibration of a conceptual runoff model using a genetic algorithm. Hydrology and Earth System Sciences 4(2): 215

  4. Auto-Regressive Models of Non-Stationary Time Series with Finite Length

    Institute of Scientific and Technical Information of China (English)

    FEI Wanchun; BAI Lun

    2005-01-01

    To analyze and simulate non-stationary time series with finite length, the statistical characteristics and auto-regressive (AR) models of non-stationary time series with finite length are discussed and studied. A new AR model called the time varying parameter AR model is proposed for solution of non-stationary time series with finite length. The auto-covariances of time series simulated by means of several AR models are analyzed. The result shows that the new AR model can be used to simulate and generate a new time series with the auto-covariance same as the original time series. The size curves of cocoon filaments regarded as non-stationary time series with finite length are experimentally simulated. The simulation results are significantly better than those obtained so far, and illustrate the availability of the time varying parameter AR model. The results are useful for analyzing and simulating non-stationary time series with finite length.

  5. Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach

    Science.gov (United States)

    Balshi, M. S.; McGuire, A.D.; Duffy, P.; Flannigan, M.; Walsh, J.; Melillo, J.

    2009-01-01

    Fire is a common disturbance in the North American boreal forest that influences ecosystem structure and function. The temporal and spatial dynamics of fire are likely to be altered as climate continues to change. In this study, we ask the question: how will area burned in boreal North America by wildfire respond to future changes in climate? To evaluate this question, we developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5?? (latitude ?? longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was substantially more predictable in the western portion of boreal North America than in eastern Canada. Burned area was also not very predictable in areas of substantial topographic relief and in areas along the transition between boreal forest and tundra. At the scale of Alaska and western Canada, the empirical fire models explain on the order of 82% of the variation in annual area burned for the period 1960-2002. July temperature was the most frequently occurring predictor across all models, but the fuel moisture codes for the months June through August (as a group) entered the models as the most important predictors of annual area burned. To predict changes in the temporal and spatial dynamics of fire under future climate, the empirical fire models used output from the Canadian Climate Center CGCM2 global climate model to predict annual area burned through the year 2100 across Alaska and western Canada. Relative to 1991-2000, the results suggest that average area burned per decade will double by 2041-2050 and will increase on the order of 3.5-5.5 times by the last decade of the 21st century. To improve the ability to better predict wildfire across Alaska and Canada, future research should focus on incorporating additional effects of long-term and successional

  6. Infinite mixture-of-experts model for sparse survival regression with application to breast cancer

    Directory of Open Access Journals (Sweden)

    Dahl Edgar

    2010-10-01

    Full Text Available Abstract Background We present an infinite mixture-of-experts model to find an unknown number of sub-groups within a given patient cohort based on survival analysis. The effect of patient features on survival is modeled using the Cox’s proportionality hazards model which yields a non-standard regression component. The model is able to find key explanatory factors (chosen from main effects and higher-order interactions for each sub-group by enforcing sparsity on the regression coefficients via the Bayesian Group-Lasso. Results Simulated examples justify the need of such an elaborate framework for identifying sub-groups along with their key characteristics versus other simpler models. When applied to a breast-cancer dataset consisting of survival times and protein expression levels of patients, it results in identifying two distinct sub-groups with different survival patterns (low-risk and high-risk along with the respective sets of compound markers. Conclusions The unified framework presented here, combining elements of cluster and feature detection for survival analysis, is clearly a powerful tool for analyzing survival patterns within a patient group. The model also demonstrates the feasibility of analyzing complex interactions which can contribute to definition of novel prognostic compound markers.

  7. Advantages of geographically weighted regression for modeling benthic substrate in two Greater Yellowstone Ecosystem streams

    Science.gov (United States)

    Sheehan, Kenneth R.; Strager, Michael P.; Welsh, Stuart

    2013-01-01

    Stream habitat assessments are commonplace in fish management, and often involve nonspatial analysis methods for quantifying or predicting habitat, such as ordinary least squares regression (OLS). Spatial relationships, however, often exist among stream habitat variables. For example, water depth, water velocity, and benthic substrate sizes within streams are often spatially correlated and may exhibit spatial nonstationarity or inconsistency in geographic space. Thus, analysis methods should address spatial relationships within habitat datasets. In this study, OLS and a recently developed method, geographically weighted regression (GWR), were used to model benthic substrate from water depth and water velocity data at two stream sites within the Greater Yellowstone Ecosystem. For data collection, each site was represented by a grid of 0.1 m2 cells, where actual values of water depth, water velocity, and benthic substrate class were measured for each cell. Accuracies of regressed substrate class data by OLS and GWR methods were calculated by comparing maps, parameter estimates, and determination coefficient r 2. For analysis of data from both sites, Akaike’s Information Criterion corrected for sample size indicated the best approximating model for the data resulted from GWR and not from OLS. Adjusted r 2 values also supported GWR as a better approach than OLS for prediction of substrate. This study supports GWR (a spatial analysis approach) over nonspatial OLS methods for prediction of habitat for stream habitat assessments.

  8. Notes on power of normality tests of error terms in regression models

    Energy Technology Data Exchange (ETDEWEB)

    Střelec, Luboš [Department of Statistics and Operation Analysis, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, Brno, 61300 (Czech Republic)

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.

  9. Inferring cellular regulatory networks with Bayesian model averaging for linear regression (BMALR).

    Science.gov (United States)

    Huang, Xun; Zi, Zhike

    2014-08-01

    Bayesian network and linear regression methods have been widely applied to reconstruct cellular regulatory networks. In this work, we propose a Bayesian model averaging for linear regression (BMALR) method to infer molecular interactions in biological systems. This method uses a new closed form solution to compute the posterior probabilities of the edges from regulators to the target gene within a hybrid framework of Bayesian model averaging and linear regression methods. We have assessed the performance of BMALR by benchmarking on both in silico DREAM datasets and real experimental datasets. The results show that BMALR achieves both high prediction accuracy and high computational efficiency across different benchmarks. A pre-processing of the datasets with the log transformation can further improve the performance of BMALR, leading to a new top overall performance. In addition, BMALR can achieve robust high performance in community predictions when it is combined with other competing methods. The proposed method BMALR is competitive compared to the existing network inference methods. Therefore, BMALR will be useful to infer regulatory interactions in biological networks. A free open source software tool for the BMALR algorithm is available at https://sites.google.com/site/bmalr4netinfer/.

  10. EFFICIENT ESTIMATION OF FUNCTIONAL-COEFFICIENT REGRESSION MODELS WITH DIFFERENT SMOOTHING VARIABLES

    Institute of Scientific and Technical Information of China (English)

    Zhang Riquan; Li Guoying

    2008-01-01

    In this article, a procedure for estimating the coefficient functions on the functional-coefficient regression models with different smoothing variables in different co-efficient functions is defined. First step, by the local linear technique and the averaged method, the initial estimates of the coefficient functions are given. Second step, based on the initial estimates, the efficient estimates of the coefficient functions are proposed by a one-step back-fitting procedure. The efficient estimators share the same asymptotic normalities as the local linear estimators for the functional-coefficient models with a single smoothing variable in different functions. Two simulated examples show that the procedure is effective.

  11. A self-organizing power system stabilizer using Fuzzy Auto-Regressive Moving Average (FARMA) model

    Energy Technology Data Exchange (ETDEWEB)

    Park, Y.M.; Moon, U.C. [Seoul National Univ. (Korea, Republic of). Electrical Engineering Dept.; Lee, K.Y. [Pennsylvania State Univ., University Park, PA (United States). Electrical Engineering Dept.

    1996-06-01

    This paper presents a self-organizing power system stabilizer (SOPSS) which use the Fuzzy Auto-Regressive Moving Average (FARMA) model. The control rules and the membership functions of the proposed logic controller are generated automatically without using any plant model. The generated rules are stored in the fuzzy rule space and updated on-line by a self-organizing procedure. To show the effectiveness of the proposed controller, comparison with a conventional controller for one-machine infinite-bus system is presented.

  12. Generalized Empirical Likelihood Inference in Semiparametric Regression Model for Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    Gao Rong LI; Ping TIAN; Liu Gen XUE

    2008-01-01

    In this paper, we consider the semiparametric regression model for longitudinal data. Due to the correlation within groups, a generalized empirical log-likelihood ratio statistic for the unknown parameters in the model is suggested by introducing the working covariance matrix. It is proved that the proposed statistic is asymptotically standard chi-squared under some suitable conditions, and hence it can be used to construct the confidence regions of the parameters. A simulation study is conducted to compare the proposed method with the generalized least squares method in terms of coverage accuracy and average lengths of the confidence intervals.

  13. Asymptotic properties for the semiparametric regression model with randomly censored data

    Institute of Scientific and Technical Information of China (English)

    王启华; 郑忠国

    1997-01-01

    Suppose that the patients’ survival times,Y,are random variables following the semiparametric regression model Y=Xβ+g(T)+ε,where (X,T) is a radom vector taking values in R×[0,1],β is an unknown parameter,g(·) is an unknown smooth regression function and εis the random error with zero mean and variance σ2.It is assumed that (X,T) is independent of ε.The estimators βn and gm(·) ofβ and g(·) are defined,respectively,when the observations are randomly censored on the right and the censoring distribution is unknown.Moreover,it isshown that βm is asymptotically normal and gm(·) is weak consistence with rate Op(n-1/3).

  14. CODEVECTOR MODELING USING LOCAL POLYNOMIAL REGRESSION FOR VECTOR QUANTIZATION BASED IMAGE COMPRESSION

    Directory of Open Access Journals (Sweden)

    P. Arockia Jansi Rani

    2010-08-01

    Full Text Available Image compression is very important in reducing the costs of data storage and transmission in relatively slow channels. In this paper, a still image compression scheme driven by Self-Organizing Map with polynomial regression modeling and entropy coding, employed within the wavelet framework is presented. The image compressibility and interpretability are improved by incorporating noise reduction into the compression scheme. The implementation begins with the classical wavelet decomposition, quantization followed by Huffman encoder. The codebook for the quantization process is designed using an unsupervised learning algorithm and further modified using polynomial regression to control the amount of noise reduction. Simulation results show that the proposed method reduces bit rate significantly and provides better perceptual quality than earlier methods.

  15. A hybrid model of kernel density estimation and quantile regression for GEFCom2014 probabilistic load forecasting

    CERN Document Server

    Haben, Stephen

    2016-01-01

    We present a model for generating probabilistic forecasts by combining kernel density estimation (KDE) and quantile regression techniques, as part of the probabilistic load forecasting track of the Global Energy Forecasting Competition 2014. The KDE method is initially implemented with a time-decay parameter. We later improve this method by conditioning on the temperature or the period of the week variables to provide more accurate forecasts. Secondly, we develop a simple but effective quantile regression forecast. The novel aspects of our methodology are two-fold. First, we introduce symmetry into the time-decay parameter of the kernel density estimation based forecast. Secondly we combine three probabilistic forecasts with different weights for different periods of the month.

  16. strucchange: An R Package for Testing for Structural Change in Linear Regression Models

    Directory of Open Access Journals (Sweden)

    Achim Zeileis

    2002-01-01

    Full Text Available This paper reviews tests for structural change in linear regression models from the generalized fluctuation test framework as well as from the F test (Chow test framework. It introduces a unified approach for implementing these tests and presents how these ideas have been realized in an R package called strucchange. Enhancing the standard significance test approach the package contains methods to fit, plot and test empirical fluctuation processes (like CUSUM, MOSUM and estimates-based processes and to compute, plot and test sequences of F statistics with the supF , aveF and expF test. Thus, it makes powerful tools available to display information about structural changes in regression relationships and to assess their significance. Furthermore, it is described how incoming data can be monitored.

  17. Introduction to statistical modelling 2: categorical variables and interactions in linear regression.

    Science.gov (United States)

    Lunt, Mark

    2015-07-01

    In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing for testing the hypothesis that the outcome differs between groups. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing the difference between groups of the effect of a given predictor, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading.

  18. elrm: Software Implementing Exact-Like Inference for Logistic Regression Models

    Directory of Open Access Journals (Sweden)

    David Zamar

    2007-09-01

    Full Text Available Exact inference is based on the conditional distribution of the sufficient statistics for the parameters of interest given the observed values for the remaining sufficient statistics. Exact inference for logistic regression can be problematic when data sets are large and the support of the conditional distribution cannot be represented in memory. Additionally, these methods are not widely implemented except in commercial software packages such as LogXact and SAS. Therefore, we have developed elrm, software for R implementing (approximate exact inference for binomial regression models from large data sets. We provide a description of the underlying statistical methods and illustrate the use of elrm with examples. We also evaluate elrm by comparing results with those obtained using other methods.

  19. The Effects of Agricultural Informatization on Agricultural Economic Growth: An Empirical Analysis Based on Regression Model

    Institute of Scientific and Technical Information of China (English)

    Lingling; TAN

    2013-01-01

    This article selects some major factors influencing the agricultural economic growth are selected,such as labor,capital input,farmland area,fertilizer input and information input.And it selects some factors to explain information input,such as the number of website ownership,types of books,magazines and newspapers published,the number of telephone ownership per 100 households,the number of home computers ownership per 100 households,farmers’ spending on transportation and communication,culture,education,entertainment and services, and the total number of agricultural science and technology service personnel.Using regression model,this article conducts regression analysis of the cross-section data on 31 provinces,autonomous regions and municipalities in 2010.The results show that the building of information infrastructure,the use of means of information,the popularization and promotion of knowledge of agricultural science and technology,play an important role in promoting agricultural economic growth.

  20. Application of multi regressive linear model and neural network for wear prediction of grinding mill liners

    Directory of Open Access Journals (Sweden)

    Farzaneh Ahmadzadeh

    2013-06-01

    Full Text Available The liner of an ore grinding mill is a critical component in the grinding process, necessary for both high metal recovery and shell protection. From an economic point of view, it is important to keep mill liners in operation as long as possible, minimising the downtime for maintenance or repair. Therefore, predicting their wear is crucial. This paper tests different methods of predicting wear in the context of remaining height and remaining life of the liners. The key concern is to make decisions on replacement and maintenance without stopping the mill for extra inspection as this leads to financial savings. The paper applies linear multiple regression and artificial neural networks (ANN techniques to determine the most suitable methodology for predicting wear. The advantages of the ANN model over the traditional approach of multiple regression analysis include its high accuracy.

  1. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    Science.gov (United States)

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.

  2. Adaptation dynamics of the quasispecies model

    Science.gov (United States)

    Jain, Kavita

    2009-02-01

    We study the adaptation dynamics of an initially maladapted population evolving via the elementary processes of mutation and selection. The evolution occurs on rugged fitness landscapes which are defined on the multi-dimensional genotypic space and have many local peaks separated by low fitness valleys. We mainly focus on the Eigen's model that describes the deterministic dynamics of an infinite number of self-replicating molecules. In the stationary state, for small mutation rates such a population forms a {\\it quasispecies} which consists of the fittest genotype and its closely related mutants. The quasispecies dynamics on rugged fitness landscape follow a punctuated (or step-like) pattern in which a population jumps from a low fitness peak to a higher one, stays there for a considerable time before shifting the peak again and eventually reaches the global maximum of the fitness landscape. We calculate exactly several properties of this dynamical process within a simplified version of the quasispecies model.

  3. Adaptation dynamics of the quasispecies model

    Indian Academy of Sciences (India)

    Kavita Jain

    2008-08-01

    We study the adaptation dynamics of an initially maladapted population evolving via the elementary processes of mutation and selection. The evolution occurs on rugged fitness landscapes which are defined on the multi-dimensional genotypic space and have many local peaks separated by low fitness valleys. We mainly focus on the Eigen’s model that describes the deterministic dynamics of an infinite number of self-replicating molecules. In the stationary state, for small mutation rates such a population forms a quasispecies which consists of the fittest genotype and its closely related mutants. The quasispecies dynamics on rugged fitness landscape follow a punctuated (or step-like) pattern in which a population jumps from a low fitness peak to a higher one, stays there for a considerable time before shifting the peak again and eventually reaches the global maximum of the fitness landscape. We calculate exactly several properties of this dynamical process within a simplified version of the quasispecies model.

  4. European upper mantle tomography: adaptively parameterized models

    Science.gov (United States)

    Schäfer, J.; Boschi, L.

    2009-04-01

    We have devised a new algorithm for upper-mantle surface-wave tomography based on adaptive parameterization: i.e. the size of each parameterization pixel depends on the local density of seismic data coverage. The advantage in using this kind of parameterization is that a high resolution can be achieved in regions with dense data coverage while a lower (and cheaper) resolution is kept in regions with low coverage. This way, parameterization is everywhere optimal, both in terms of its computational cost, and of model resolution. This is especially important for data sets with inhomogenous data coverage, as it is usually the case for global seismic databases. The data set we use has an especially good coverage around Switzerland and over central Europe. We focus on periods from 35s to 150s. The final goal of the project is to determine a new model of seismic velocities for the upper mantle underlying Europe and the Mediterranean Basin, of resolution higher than what is currently found in the literature. Our inversions involve regularization via norm and roughness minimization, and this in turn requires that discrete norm and roughness operators associated with our adaptive grid be precisely defined. The discretization of the roughness damping operator in the case of adaptive parameterizations is not as trivial as it is for the uniform ones; important complications arise from the significant lateral variations in the size of pixels. We chose to first define the roughness operator in a spherical harmonic framework, and subsequently translate it to discrete pixels via a linear transformation. Since the smallest pixels we allow in our parameterization have a size of 0.625 °, the spherical-harmonic roughness operator has to be defined up to harmonic degree 899, corresponding to 810.000 harmonic coefficients. This results in considerable computational costs: we conduct the harmonic-pixel transformations on a small Beowulf cluster. We validate our implementation of adaptive

  5. Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media

    Science.gov (United States)

    Cooley, R.L.; Christensen, S.

    2006-01-01

    Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector ?? that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation ????*, where ?? is an interpolation matrix and ??* is a stochastic vector of parameters. Vector ??* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(????*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of ??, ??,f(??), which is assumed to be nearly exact. The difference f(??) - f(????*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate ??* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(??) and f(????*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is

  6. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    Energy Technology Data Exchange (ETDEWEB)

    Harlim, John, E-mail: jharlim@psu.edu [Department of Mathematics and Department of Meteorology, the Pennsylvania State University, University Park, PA 16802, Unites States (United States); Mahdi, Adam, E-mail: amahdi@ncsu.edu [Department of Mathematics, North Carolina State University, Raleigh, NC 27695 (United States); Majda, Andrew J., E-mail: jonjon@cims.nyu.edu [Department of Mathematics and Center for Atmosphere and Ocean Science, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 (United States)

    2014-01-15

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.

  7. Regional Integrated Meteorological Forecasting and Warning Model for Geological Hazards Based on Logistic Regression

    Institute of Scientific and Technical Information of China (English)

    XU Jing; YANG Chi; ZHANG Guoping

    2007-01-01

    Information model is adopted to integrate factors of various geosciences to estimate the susceptibility of geological hazards. Further combining the dynamic rainfall observations, Logistic regression is used for modeling the probabilities of geological hazard occurrences, upon which hierarchical warnings for rainfall-induced geological hazards are produced. The forecasting and warning model takes numerical precipitation forecasts on grid points as its dynamic input, forecasts the probabilities of geological hazard occurrences on the same grid, and translates the results into likelihoods in the form of a 5-level hierarchy. Validation of the model with observational data for the year 2004 shows that 80% of the geological hazards of the year have been identified as "likely enough to release warning messages". The model can satisfy the requirements of an operational warning system, thus is an effective way to improve the meteorological warnings for geological hazards.

  8. Sensitivity Analysis to Select the Most Influential Risk Factors in a Logistic Regression Model

    Directory of Open Access Journals (Sweden)

    Jassim N. Hussain

    2008-01-01

    Full Text Available The traditional variable selection methods for survival data depend on iteration procedures, and control of this process assumes tuning parameters that are problematic and time consuming, especially if the models are complex and have a large number of risk factors. In this paper, we propose a new method based on the global sensitivity analysis (GSA to select the most influential risk factors. This contributes to simplification of the logistic regression model by excluding the irrelevant risk factors, thus eliminating the need to fit and evaluate a large number of models. Data from medical trials are suggested as a way to test the efficiency and capability of this method and as a way to simplify the model. This leads to construction of an appropriate model. The proposed method ranks the risk factors according to their importance.

  9. Application of Logistic Regression Tree Model in Determining Habitat Distribution of Astragalus verus

    Directory of Open Access Journals (Sweden)

    M. Saki

    2013-03-01

    Full Text Available The relationship between plant species and environmental factors has always been a central issue in plant ecology. With rising power of statistical techniques, geo-statistics and geographic information systems (GIS, the development of predictive habitat distribution models of organisms has rapidly increased in ecology. This study aimed to evaluate the ability of Logistic Regression Tree model to create potential habitat map of Astragalus verus. This species produces Tragacanth and has economic value. A stratified- random sampling was applied to 100 sites (50 presence- 50 absence of given species, and produced environmental and edaphic factors maps by using Kriging and Inverse Distance Weighting methods in the ArcGIS software for the whole study area. Relationships between species occurrence and environmental factors were determined by Logistic Regression Tree model and extended to the whole study area. The results indicated species occurrence has strong correlation with environmental factors such as mean daily temperature and clay, EC and organic carbon content of the soil. Species occurrence showed direct relationship with mean daily temperature and clay and organic carbon, and inverse relationship with EC. Model accuracy was evaluated both by Cohen’s kappa statistics (κ and by area under Receiver Operating Characteristics curve based on independent test data set. Their values (kappa=0.9, Auc of ROC=0.96 indicated the high power of LRT to create potential habitat map on local scales. This model, therefore, can be applied to recognize potential sites for rangeland reclamation projects.

  10. Use of Poisson spatiotemporal regression models for the Brazilian Amazon Forest: malaria count data

    Directory of Open Access Journals (Sweden)

    Jorge Alberto Achcar

    2011-12-01

    Full Text Available INTRODUCTION: Malaria is a serious problem in the Brazilian Amazon region, and the detection of possible risk factors could be of great interest for public health authorities. The objective of this article was to investigate the association between environmental variables and the yearly registers of malaria in the Amazon region using Bayesian spatiotemporal methods. METHODS: We used Poisson spatiotemporal regression models to analyze the Brazilian Amazon forest malaria count for the period from 1999 to 2008. In this study, we included some covariates that could be important in the yearly prediction of malaria, such as deforestation rate. We obtained the inferences using a Bayesian approach and Markov Chain Monte Carlo (MCMC methods to simulate samples for the joint posterior distribution of interest. The discrimination of different models was also discussed. RESULTS: The model proposed here suggests that deforestation rate, the number of inhabitants per km², and the human development index (HDI are important in the prediction of malaria cases. CONCLUSIONS: It is possible to conclude that human development, population growth, deforestation, and their associated ecological alterations are conducive to increasing malaria risk. We conclude that the use of Poisson regression models that capture the spatial and temporal effects under the Bayesian paradigm is a good strategy for modeling malaria counts.

  11. Estimating the Impact of Urbanization on Air Quality in China Using Spatial Regression Models

    Directory of Open Access Journals (Sweden)

    Chuanglin Fang

    2015-11-01

    Full Text Available Urban air pollution is one of the most visible environmental problems to have accompanied China’s rapid urbanization. Based on emission inventory data from 2014, gathered from 289 cities, we used Global and Local Moran’s I to measure the spatial autorrelation of Air Quality Index (AQI values at the city level, and employed Ordinary Least Squares (OLS, Spatial Lag Model (SAR, and Geographically Weighted Regression (GWR to quantitatively estimate the comprehensive impact and spatial variations of China’s urbanization process on air quality. The results show that a significant spatial dependence and heterogeneity existed in AQI values. Regression models revealed urbanization has played an important negative role in determining air quality in Chinese cities. The population, urbanization rate, automobile density, and the proportion of secondary industry were all found to have had a significant influence over air quality. Per capita Gross Domestic Product (GDP and the scale of urban land use, however, failed the significance test at 10% level. The GWR model performed better than global models and the results of GWR modeling show that the relationship between urbanization and air quality was not constant in space. Further, the local parameter estimates suggest significant spatial variation in the impacts of various urbanization factors on air quality.

  12. Scale Effects of the Relationships between Urban Heat Islands and Impact Factors Based on a Geographically-Weighted Regression Model

    Directory of Open Access Journals (Sweden)

    Xiaobo Luo

    2016-09-01

    Full Text Available Urban heat island (UHI effect, the side effect of rapid urbanization, has become an obstacle to the further healthy development of the city. Understanding its relationships with impact factors is important to provide useful information for climate adaptation urban planning strategies. For this purpose, the geographically-weighted regression (GWR approach is used to explore the scale effects in a mountainous city, namely the change laws and characteristics of the relationships between land surface temperature and impact factors at different spatial resolutions (30–960 m. The impact factors include the Soil-adjusted Vegetation Index (SAVI, the Index-based Built-up Index (IBI, and the Soil Brightness Index (NDSI, which indicate the coverage of the vegetation, built-up, and bare land, respectively. For reference, the ordinary least squares (OLS model, a global regression technique, is also employed by using the same dependent variable and explanatory variables as in the GWR model. Results from the experiment exemplified by Chongqing showed that the GWR approach had a better prediction accuracy and a better ability to describe spatial non-stationarity than the OLS approach judged by the analysis of the local coefficient of determination (R2, Corrected Akaike Information Criterion (AICc, and F-test at small spatial resolution (< 240 m; however, when the spatial scale was increased to 480 m, this advantage has become relatively weak. This indicates that the GWR model becomes increasingly global, revealing the relationships with more generalized geographical patterns, and then spatial non-stationarity in the relationship will tend to be neglected with the increase of spatial resolution.

  13. Creating a non-linear total sediment load formula using polynomial best subset regression model

    Science.gov (United States)

    Okcu, Davut; Pektas, Ali Osman; Uyumaz, Ali

    2016-08-01

    The aim of this study is to derive a new total sediment load formula which is more accurate and which has less application constraints than the well-known formulae of the literature. 5 most known stream power concept sediment formulae which are approved by ASCE are used for benchmarking on a wide range of datasets that includes both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach. The new approach is called Polynomial Best subset regression (PBSR) analysis. The aim of the PBRS analysis is fitting and testing all possible combinations of the input variables and selecting the best subset. Whole the input variables with their second and third powers are included in the regression to test the possible relation between the explanatory variables and the dependent variable. While selecting the best subset a multistep approach is used that depends on significance values and also the multicollinearity degrees of inputs. The new formula is compared to others in a holdout dataset and detailed performance investigations are conducted for field and lab datasets within this holdout data. Different goodness of fit statistics are used as they represent different perspectives of the model accuracy. After the detailed comparisons are carried out we figured out the most accurate equation that is also applicable on both flume and river data. Especially, on field dataset the prediction performance of the proposed formula outperformed the benchmark formulations.

  14. Using the jackknife for estimation in log link Bernoulli regression models.

    Science.gov (United States)

    Lipsitz, Stuart R; Fitzmaurice, Garrett M; Arriaga, Alex; Sinha, Debajyoti; Gawande, Atul A

    2015-02-10

    Bernoulli (or binomial) regression using a generalized linear model with a log link function, where the exponentiated regression parameters have interpretation as relative risks, is often more appropriate than logistic regression for prospective studies with common outcomes. In particular, many researchers regard relative risks to be more intuitively interpretable than odds ratios. However, for the log link, when the outcome is very prevalent, the likelihood may not have a unique maximum. To circumvent this problem, a 'COPY method' has been proposed, which is equivalent to creating for each subject an additional observation with the same covariates except the response variable has the outcome values interchanged (1's changed to 0's and 0's changed to 1's). The original response is given weight close to 1, while the new observation is given a positive weight close to 0; this approach always leads to convergence of the maximum likelihood algorithm, except for problems with convergence due to multicollinearity among covariates. Even though this method produces a unique maximum, when the outcome is very prevalent, and/or the sample size is relatively small, the COPY method can yield biased estimates. Here, we propose using the jackknife as a bias-reduction approach for the COPY method. The proposed method is motivated by a study of patients undergoing colorectal cancer surgery.

  15. Analysis of pulsed eddy current data using regression models for steam generator tube support structure inspection

    Science.gov (United States)

    Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.

    2016-02-01

    Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.

  16. Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

    Directory of Open Access Journals (Sweden)

    Sutikno Sutikno

    2010-08-01

    Full Text Available One of the climate models used to predict the climatic conditions is Global Circulation Models (GCM. GCM is a computer-based model that consists of different equations. It uses numerical and deterministic equation which follows the physics rules. GCM is a main tool to predict climate and weather, also it uses as primary information source to review the climate change effect. Statistical Downscaling (SD technique is used to bridge the large-scale GCM with a small scale (the study area. GCM data is spatial and temporal data most likely to occur where the spatial correlation between different data on the grid in a single domain. Multicollinearity problems require the need for pre-processing of variable data X. Continuum Regression (CR and pre-processing with Principal Component Analysis (PCA methods is an alternative to SD modelling. CR is one method which was developed by Stone and Brooks (1990. This method is a generalization from Ordinary Least Square (OLS, Principal Component Regression (PCR and Partial Least Square method (PLS methods, used to overcome multicollinearity problems. Data processing for the station in Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that the RMSEP values and R2 predict in the domain 8x8 and 12x12 by uses CR method produces results better than by PCR and PLS.

  17. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

    Science.gov (United States)

    Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.

    2016-02-01

    The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the Independent Variables that can actually influence in the Real GDP in the Philippines (y). The researchers used a Normal Estimation Equation using Matrices to create the model for Real GDP and used α = 0.01.The researchers analyzed quarterly data from 1990 to 2013. The data were acquired from the National Statistical Coordination Board (NSCB) resulting to a total of 96 observations for each variable. The data have undergone a logarithmic transformation particularly the Dependent Variable (y) to satisfy all the assumptions of the Multiple Linear Regression Analysis. The mathematical model for Real GDP was formulated using Matrices through MATLAB. Based on the results, only three of the Independent Variables are significant to the Dependent Variable namely: Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), hence, can actually predict Real GDP (y). The regression analysis displays that 98.7% (coefficient of determination) of the Independent Variables can actually predict the Dependent Variable. With 97.6% of the result in Paired T-Test, the Predicted Values obtained from the model showed no significant difference from the Actual Values of Real GDP. This research will be essential in appraising the forthcoming changes to aid the Government in implementing policies for the development of the economy.

  18. EVALUATION OF THE PERFORMANCE OF WAVELETS FOR MULTIVARIATE REGRESSION MODELS USING INFRARED SPECTROSCOPY DATA

    Directory of Open Access Journals (Sweden)

    Marcia Werlang

    2008-08-01

    Full Text Available Discrete wavelet transform (DWT Daubecheis was used to compress the dimension of spectral infrared data for determination to the hydroxyl value (OHV of soybean polyols samples. Spectral data were recorded between 650 and 4000 cm-1 with a 4 cm-1 resolution by Fourier transform infrared spectroscopy (FTIR coupled with attenuated total reflection (ATR accessory. Through the models of regression using partial least squares (PLS and interval partial least squares (iPLS methods, the performance of each was compared with the original and/or between them. The spectra data set compressed the 1/4 of its original dimension they had presented the best one resulted with a lesser RMSEP that the model with the not compress signal and a similar correlation. With this result a model of lesser dimension was gotten however with the same capacity, thus DWT, getting a robust method for the reduction of the dimension of the spectra data sets, when if to intend to construct regression multivariate models.

  19. Robustness of Land-Use Regression Models Developed from Mobile Air Pollutant Measurements.

    Science.gov (United States)

    Hatzopoulou, Marianne; Valois, Marie-France; Levy, Ilan; Mihele, Cristian; Lu, Gang; Bagg, Scott; Minet, Laura; Brook, Jeffrey Robert

    2017-02-27

    Land-Use Regression (LUR) models are useful for resolving fine scale spatial variations in average air pollutant concentrations across urban areas. With the rise of mobile air pollution campaigns, characterized by short-term monitoring and large spatial extents, it is important to investigate the effects of sampling protocols on the resulting LUR. In this study a mobile lab was used to repeatedly visit a large number of locations (~1800), defined by road segments, to derive average concentrations across the city of Montreal, Canada. We hypothesize that the robustness of the LUR from these data depends upon how many independent, random times each location is visited (Nvis) and the number of locations (Nloc) used in model development and that these parameters can be optimized. By performing multiple LURs on random sets of locations, we assessed the robustness of the LUR through consistency in adjusted R2 (i.e., coefficient of variation, CV) and in regression coefficients among different models. As Nloc increased, R2adj became less variable; for Nloc=100 vs. Nloc=300 the CV in R2adj for ultrafine particles decreased from 0.088 to 0.029 and from 0.115 to 0.076 for NO2. The CV in the R2adj also decreased as Nvis increased from 6 to 16; from 0.090 to 0.014 for UFP. As Nloc and Nvis increase, the variability in the coefficient sizes across the different model realizations were also seen to decrease.

  20. A note on prognostic accuracy evaluation of regression models applied to longitudinal autocorrelated binary data

    Directory of Open Access Journals (Sweden)

    Giulia Barbati

    2014-11-01

    Full Text Available Background: Focus of this work was on evaluating the prognostic accuracy of two approaches for modelling binary longitudinal outcomes, a Generalized Estimating Equation (GEE and a likelihood based method, Marginalized Transition Model (MTM, in which a transition model is combined with a marginal generalized linear model describing the average response as a function of measured predictors.Methods: A retrospective study on cardiovascular patients and a prospective study on sciatic pain were used to evaluate discrimination by computing the Area Under the Receiver-Operating-Characteristics curve, (AUC, the Integrated Discrimination Improvement (IDI and the Net Reclassification Improvement (NRI at different time occasions. Calibration was also evaluated. A simulation study was run in order to compare model’s performance in a context of a perfect knowledge of the data generating mechanism. Results: Similar regression coefficients estimates and comparable calibration were obtained; an higher discrimination level for MTM was observed. No significant differences in calibration and MSE (Mean Square Error emerged in the simulation study, that instead confirmed the MTM higher discrimination level. Conclusions: The choice of the regression approach should depend on the scientific question being addressed, i.e. if the overall population-average and calibration or the subject-specific patterns and discrimination are the objectives of interest, and some recently proposed discrimination indices are useful in evaluating predictive accuracy also in a context of longitudinal studies.

  1. System identification modelling of ship manoeuvring motion based onε- support vector regression

    Institute of Scientific and Technical Information of China (English)

    王雪刚; 邹早建; 侯先瑞; 徐锋

    2015-01-01

    Based on theε-support vector regression, three modelling methods for the ship manoeuvring motion, i.e., the white-box modelling, the grey-box modelling and the black-box modelling, are investigated. Theoo10/10,oo20/20 zigzag tests and the o35 turning circle manoeuvre are simulated. Part of the simulation data for theoo20/20 zigzag test are used to train the support vectors, and the trained support vector machine is used to predict the wholeoo20/20 zigzag test. Comparison between the simula- ted and predictedoo20/20 zigzag test shows a good predictive ability of the three modelling methods. Then all mathematical models obtained by the modelling methods are used to predict theoo10/10 zigzag test ando35 turning circle manoeuvre, and the predicted results are compared with those of simulation tests to demonstrate the good generalization performance of the mathematical models. Finally, the modelling methods are analyzed and compared with each other in terms of the application conditions, the prediction accuracy and the computation speed. An appropriate modelling method can be chosen according to the intended use of the mathematical models and the available data for the system identification.

  2. Multiple Linear Regression Model Based on Neural Network and Its Application in the MBR Simulation

    Directory of Open Access Journals (Sweden)

    Chunqing Li

    2012-01-01

    Full Text Available The computer simulation of the membrane bioreactor MBR has become the research focus of the MBR simulation. In order to compensate for the defects, for example, long test period, high cost, invisible equipment seal, and so forth, on the basis of conducting in-depth study of the mathematical model of the MBR, combining with neural network theory, this paper proposed a three-dimensional simulation system for MBR wastewater treatment, with fast speed, high efficiency, and good visualization. The system is researched and developed with the hybrid programming of VC++ programming language and OpenGL, with a multifactor linear regression model of affecting MBR membrane fluxes based on neural network, applying modeling method of integer instead of float and quad tree recursion. The experiments show that the three-dimensional simulation system, using the above models and methods, has the inspiration and reference for the future research and application of the MBR simulation technology.

  3. Photovoltaic Array Condition Monitoring Based on Online Regression of Performance Model

    DEFF Research Database (Denmark)

    Spataru, Sergiu; Sera, Dezso; Kerekes, Tamas

    2013-01-01

    automatic supervision and condition monitoring of the PV system components, especially for small PV installations, where no specialized personnel is present at the site. This work proposes a PV array condition monitoring system based on a PV array performance model. The system is parameterized online, using...... regression modeling, from PV array production, plane-of-array irradiance, and module temperature measurements, acquired during an initial learning phase of the system. After the model has been parameterized automatically, the condition monitoring system enters the normal operation phase, where...... the performance model is used to predict the power output of the PV array. Utilizing the predicted and measured PV array output power values, the condition monitoring system is able to detect power losses above 5%, occurring in the PV array....

  4. Modeling and Simulation of Road Traffic Noise Using Artificial Neural Network and Regression.

    Science.gov (United States)

    Honarmand, M; Mousavi, S M

    2014-04-01

    Modeling and simulation of noise pollution has been done in a large city, where the population is over 2 millions. Two models of artificial neural network and regression were developed to predict in-city road traffic noise pollution with using the data of noise measurements and vehicle counts at three points of the city for a period of 12 hours. The MATLAB and DATAFIT softwares were used for simulation. The predicted results of noise level were compared with the measured noise levels in three stations. The values of normalized bias, sum of squared errors, mean of squared errors, root mean of squared errors, and squared correlation coefficient calculated for each model show the results of two models are suitable, and the predictions of artificial neural network are closer to the experimental data.

  5. Joint Bayesian variable and graph selection for regression models with network-structured predictors.

    Science.gov (United States)

    Peterson, Christine B; Stingo, Francesco C; Vannucci, Marina

    2016-03-30

    In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.

  6. On Calculating the Hougaard Measure of Skewness in a Nonlinear Regression Model with Two Parameters

    Directory of Open Access Journals (Sweden)

    S. A. EL-Shehawy

    2009-01-01

    Full Text Available Problem statement: This study presented an alternative computational algorithm for determining the values of the Hougaard measure of skewness as a nonlinearity measure in a Nonlinear Regression model (NLR-model with two parameters. Approach: These values indicated a degree of a nonlinear behavior in the estimator of the parameter in a NLR-model. Results: We applied the suggested algorithm on an example of a NLR-model in which there is a conditionally linear parameter. The algorithm is mainly based on many earlier studies in measures of nonlinearity. The algorithm was suited for implementation using computer algebra systems such as MAPLE, MATLAB and MATHEMATICA. Conclusion/Recommendations: The results with the corresponding output the same considering example will be compared with the results in some earlier studies.

  7. Focused information criterion and model averaging based on weighted composite quantile regression

    KAUST Repository

    Xu, Ganggang

    2013-08-13

    We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR-estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also can achieve efficiency close to the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure. © 2013 Board of the Foundation of the Scandinavian Journal of Statistics..

  8. Adaptable Multivariate Calibration Models for Spectral Applications

    Energy Technology Data Exchange (ETDEWEB)

    THOMAS,EDWARD V.

    1999-12-20

    Multivariate calibration techniques have been used in a wide variety of spectroscopic situations. In many of these situations spectral variation can be partitioned into meaningful classes. For example, suppose that multiple spectra are obtained from each of a number of different objects wherein the level of the analyte of interest varies within each object over time. In such situations the total spectral variation observed across all measurements has two distinct general sources of variation: intra-object and inter-object. One might want to develop a global multivariate calibration model that predicts the analyte of interest accurately both within and across objects, including new objects not involved in developing the calibration model. However, this goal might be hard to realize if the inter-object spectral variation is complex and difficult to model. If the intra-object spectral variation is consistent across objects, an effective alternative approach might be to develop a generic intra-object model that can be adapted to each object separately. This paper contains recommendations for experimental protocols and data analysis in such situations. The approach is illustrated with an example involving the noninvasive measurement of glucose using near-infrared reflectance spectroscopy. Extensions to calibration maintenance and calibration transfer are discussed.

  9. Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.

    Science.gov (United States)

    Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai

    2011-01-01

    Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ(1), λ(2) and λ(3)). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs.

  10. An Optimal Control Modification to Model-Reference Adaptive Control for Fast Adaptation

    Science.gov (United States)

    Nguyen, Nhan T.; Krishnakumar, Kalmanje; Boskovic, Jovan

    2008-01-01

    This paper presents a method that can achieve fast adaptation for a class of model-reference adaptive control. It is well-known that standard model-reference adaptive control exhibits high-gain control behaviors when a large adaptive gain is used to achieve fast adaptation in order to reduce tracking error rapidly. High gain control creates high-frequency oscillations that can excite unmodeled dynamics and can lead to instability. The fast adaptation approach is based on the minimization of the squares of the tracking error, which is formulated as an optimal control problem. The necessary condition of optimality is used to derive an adaptive law using the gradient method. This adaptive law is shown to result in uniform boundedness of the tracking error by means of the Lyapunov s direct method. Furthermore, this adaptive law allows a large adaptive gain to be used without causing undesired high-gain control effects. The method is shown to be more robust than standard model-reference adaptive control. Simulations demonstrate the effectiveness of the proposed method.

  11. APPLICATION OF RANDOM REGRESSION MODELS FOR GROWTH TRAITS OF NELLORE CATTLE IN BRAZIL

    Directory of Open Access Journals (Sweden)

    Wéverton José Lima fONSECA

    2016-11-01

    Full Text Available Thepurpose of thisreview isto show the increase in number of researches on covariance components and genetic evaluation using random regression models (RRM for growth traits of Nellore cattle. Random regression models (RRM, also known as infinite-dimension models have been used to estimate variance components and genetic parameters for weight of beef cattle. In addition, those models are a standard alternative for genetic analyses of longitudinal data, however, the availibility of computational resources for performing genetic evaluations widely is an obstacle. Traits related to animal growth are adopted as selection criteria in beef cattle breeding programs, because the remuneration of cattle breeders is made based on the weight of carcasses. In recent years, RRM have been adopted as standard procedure in relation to the analysis of longitudinal data in animal breeding. Objetivou-se com esta revisão de literatura compilar informações sob a avaliação genética utilizando modelos de regressão aleatória (MRA para características de crescimento em bovinos da raça Nelore. Os modelos de regressão aleatória (MRA, denominados modelos de dimensão infinita, estão sendo utilizados para estimar os componentes de variância e parâmetros genéticos de pesos de bovinos de corte. Os MRA têm se tornado uma alternativa padrão para análises genéticas de dados longitudinais, onde um dos entraves destes modelos está relacionado à disponibilidade de memória e tempo computacional para a realização de avaliações genéticas em larga escala. Características relacionadas ao crescimento animal são adotadas em programas de melhoramento genético de bovinos de corte como critérios de seleção, pelo fato da remuneração, dos produtores, ser feita com base no peso das carcaças. Nos últimos anos, os MRA têm sido adotados como procedimento padrão para análise de dados longitudinais em melhoramento genético animal.

  12. A logistic regression model of Coronary Artery Disease among Male Patients in Punjab

    Directory of Open Access Journals (Sweden)

    Sohail Chand

    2005-07-01

    Full Text Available This is a cross-sectional retrospective study of 308 male patients, who were presented first time for coronary angiography at the Punjab Institute of Cardiology. The mean age was 50.97 + 9.9 among male patients. As the response variable coronary artery disease (CAD was a binary variable, logistic regression model was fitted to predict the Coronary Artery Disease with the help of significant risk factors. Age, Chest pain, Diabetes Mellitus, Smoking and Lipids are resulted as significant risk factors associated with CAD among male population.

  13. Semi-parametric estimation of random effects in a logistic regression model using conditional inference

    DEFF Research Database (Denmark)

    Petersen, Jørgen Holm

    2016-01-01

    . For each term in the composite likelihood, a conditional likelihood is used that eliminates the influence of the random effects, which results in a composite conditional likelihood consisting of only one-dimensional integrals that may be solved numerically. Good properties of the resulting estimator......This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied...

  14. Estimation of Panel Data Regression Models with Two-Sided Censoring or Truncation

    DEFF Research Database (Denmark)

    Alan, Sule; Honore, Bo E.; Hu, Luojia;

    2014-01-01

    This paper constructs estimators for panel data regression models with individual speci…fic heterogeneity and two–sided censoring and truncation. Following Powell (1986) the estimation strategy is based on moment conditions constructed from re–censored or re–truncated residuals. While these moment...... conditions do not identify the parameter of interest, they can be used to motivate objective functions that do. We apply one of the estimators to study the e¤ect of a Danish tax reform on household portfolio choice. The idea behind the estimators can also be used in a cross sectional setting....

  15. Heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions.

    Science.gov (United States)

    Lachos, Victor H; Bandyopadhyay, Dipankar; Garay, Aldo M

    2011-08-01

    An extension of some standard likelihood based procedures to heteroscedastic nonlinear regression models under scale mixtures of skew-normal (SMSN) distributions is developed. We derive a simple EM-type algorithm for iteratively computing maximum likelihood (ML) estimates and the observed information matrix is derived analytically. Simulation studies demonstrate the robustness of this flexible class against outlying and influential observations, as well as nice asymptotic properties of the proposed EM-type ML estimates. Finally, the methodology is illustrated using an ultrasonic calibration data.

  16. Analysis of Individual Credit Scoring Based on Logistic Regression with the Adaptive Lasso%个人信用评分的Adaptive Lasso-Logistic回归分析

    Institute of Scientific and Technical Information of China (English)

    张婷婷; 景英川

    2016-01-01

    以个人信用风险为研究对象,分析影响个人信用评分的因素.利用某商业银行个人信用数据,并采用Adaptive Lasso-Logistic回归模型对影响顾客的个人信用风险的因素进行分析,并与传统Logistic回归模型以及Lasso-Logistic回归模型进行比较.以对顾客“好”与“坏”的二分类结果的正确比例为主要衡量标准,实证发现以Adaptive Lasso-Logistic回归方法建立的个人信用评分模型,在变量选择和解释上,以及预测的准确性上,均优于传统的Logistic和Lasso-Logistic方法.

  17. Logistic regression model for diagnosis of transition zone prostate cancer on multi-parametric MRI

    Energy Technology Data Exchange (ETDEWEB)

    Dikaios, Nikolaos; Halligan, Steve; Taylor, Stuart; Atkinson, David; Punwani, Shonit [University College London, Centre for Medical Imaging, London (United Kingdom); University College London Hospital, Departments of Radiology, London (United Kingdom); Alkalbani, Jokha; Sidhu, Harbir Singh; Fujiwara, Taiki [University College London, Centre for Medical Imaging, London (United Kingdom); Abd-Alazeez, Mohamed; Ahmed, Hashim; Emberton, Mark [University College London, Research Department of Urology, London (United Kingdom); Kirkham, Alex; Allen, Clare [University College London Hospital, Departments of Radiology, London (United Kingdom); Freeman, Alex [University College London Hospital, Department of Histopathology, London (United Kingdom)

    2014-09-17

    We aimed to develop logistic regression (LR) models for classifying prostate cancer within the transition zone on multi-parametric magnetic resonance imaging (mp-MRI). One hundred and fifty-five patients (training cohort, 70 patients; temporal validation cohort, 85 patients) underwent mp-MRI and transperineal-template-prostate-mapping (TPM) biopsy. Positive cores were classified by cancer definitions: (1) any-cancer; (2) definition-1 [≥Gleason 4 + 3 or ≥ 6 mm cancer core length (CCL)] [high risk significant]; and (3) definition-2 (≥Gleason 3 + 4 or ≥ 4 mm CCL) cancer [intermediate-high risk significant]. For each, logistic-regression mp-MRI models were derived from the training cohort and validated internally and with the temporal cohort. Sensitivity/specificity and the area under the receiver operating characteristic (ROC-AUC) curve were calculated. LR model performance was compared to radiologists' performance. Twenty-eight of 70 patients from the training cohort, and 25/85 patients from the temporal validation cohort had significant cancer on TPM. The ROC-AUC of the LR model for classification of cancer was 0.73/0.67 at internal/temporal validation. The radiologist A/B ROC-AUC was 0.65/0.74 (temporal cohort). For patients scored by radiologists as Prostate Imaging Reporting and Data System (Pi-RADS) score 3, sensitivity/specificity of radiologist A 'best guess' and LR model was 0.14/0.54 and 0.71/0.61, respectively; and radiologist B 'best guess' and LR model was 0.40/0.34 and 0.50/0.76, respectively. LR models can improve classification of Pi-RADS score 3 lesions similar to experienced radiologists. (orig.)

  18. Non-linear regression model for spatial variation in precipitation chemistry for South India

    Science.gov (United States)

    Siva Soumya, B.; Sekhar, M.; Riotte, J.; Braun, Jean-Jacques

    Chemical composition of rainwater changes from sea to inland under the influence of several major factors - topographic location of area, its distance from sea, annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO 3, NO 3 and Mg do not change much from coast to inland while, SO 4 and Ca change is subjected to local emissions. Cl and Na originate solely from sea salinity and are the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power law reduction with distance from sea. Cl and Na decrease rapidly for the first 100 km distance from sea, then decrease marginally for the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent ( R2 ˜ 0.8). Variation in one of the parameters accounted for urbanization. Model was validated using data points from the southern peninsular region of the country. Estimates are found to be within 99.9% confidence interval. Finally, this relationship between the three parameters - rainfall amount, coastline distance, and concentration (in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in the south-west India. Chemistry estimated using the model was in good correlation with observed values with a relative error of ˜5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful in estimating the concentrations at different spatio-temporal scales and is especially applicable for south-west region of India.

  19. Detection of melamine in milk powders using near-infrared hyperspectral imaging combined with regression coefficient of partial least square regression model.

    Science.gov (United States)

    Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan

    2016-05-01

    Illegal use of nitrogen-rich melamine (C3H6N6) to boost perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as Enzyme-linked immunosorbent assay (ELISA), High-performance liquid chromatography (HPLC), and Gas chromatography-mass spectrometry (GC-MS), are sensitive but they are time-consuming, expensive, and labor-intensive. In this research, near-infrared (NIR) hyperspectral imaging technique combined with regression coefficient of partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with melamine concentration (dependent variables) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to the binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration was increased, the numbers of suspected melamine pixels of binary images were also increased. These results suggested that NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders.

  20. Surface Roughness Prediction Model using Zirconia Toughened Alumina (ZTA) Turning Inserts: Taguchi Method and Regression Analysis

    Science.gov (United States)

    Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath

    2016-01-01

    In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of the newly developed yttria based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through wet chemical co-precipitation route followed by powder metallurgy process. Experiments have been carried out based on an orthogonal array L9 with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal to noise ratio (SNR), the best optimal cutting condition has been arrived at A3B1C1 i.e. cutting speed is 420 m/min, depth of cut is 0.5 mm and feed rate is 0.12 m/min considering the condition smaller is the better approach. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above mentioned independent variables. The predicted values from the developed model and experimental values are found to be very close to each other justifying the significance of the model. A confirmation run has been carried out with 95 % confidence level to verify the optimized result and the values obtained are within the prescribed limit.

  1. Exergy Analysis of a Subcritical Reheat Steam Power Plant with Regression Modeling and Optimization

    Directory of Open Access Journals (Sweden)

    MUHIB ALI RAJPER

    2016-07-01

    Full Text Available In this paper, exergy analysis of a 210 MW SPP (Steam Power Plant is performed. Firstly, the plant is modeled and validated, followed by a parametric study to show the effects of various operating parameters on the performance parameters. The net power output, energy efficiency, and exergy efficiency are taken as the performance parameters, while the condenser pressure, main steam pressure, bled steam pressures, main steam temperature, and reheat steam temperature isnominated as the operating parameters. Moreover, multiple polynomial regression models are developed to correlate each performance parameter with the operating parameters. The performance is then optimizedby using Direct-searchmethod. According to the results, the net power output, energy efficiency, and exergy efficiency are calculated as 186.5 MW, 31.37 and 30.41%, respectively under normal operating conditions as a base case. The condenser is a major contributor towards the energy loss, followed by the boiler, whereas the highest irreversibilities occur in the boiler and turbine. According to the parametric study, variation in the operating parameters greatly influences the performance parameters. The regression models have appeared to be a good estimator of the performance parameters. The optimum net power output, energy efficiency and exergy efficiency are obtained as 227.6 MW, 37.4 and 36.4, respectively, which have been calculated along with optimal values of selected operating parameters.

  2. Shigella mediated depletion of macrophages in a murine breast cancer model is associated with tumor regression.

    Directory of Open Access Journals (Sweden)

    Katharina Galmbacher

    Full Text Available A tumor promoting role of macrophages has been described for a transgenic murine breast cancer model. In this model tumor-associated macrophages (TAMs represent a major component of the leukocytic infiltrate and are associated with tumor progression. Shigella flexneri is a bacterial pathogen known to specificly induce apotosis in macrophages. To evaluate whether Shigella-induced removal of macrophages may be sufficient for achieving tumor regression we have developed an attenuated strain of S. flexneri (M90TDeltaaroA and infected tumor bearing mice. Two mouse models were employed, xenotransplantation of a murine breast cancer cell line and spontanous breast cancer development in MMTV-HER2 transgenic mice. Quantitative analysis of bacterial tumor targeting demonstrated that attenuated, invasive Shigella flexneri primarily infected TAMs after systemic administration. A single i.v. injection of invasive M90TDeltaaroA resulted in caspase-1 dependent apoptosis of TAMs followed by a 74% reduction in tumors of transgenic MMTV-HER-2 mice 7 days post infection. TAM depletion was sustained and associated with complete tumor regression.These data support TAMs as useful targets for antitumor therapy and highlight attenuated bacterial pathogens as potential tools.

  3. Automated Decisional Model for Optimum Economic Order Quantity Determination Using Price Regressive Rates

    Science.gov (United States)

    Roşu, M. M.; Tarbă, C. I.; Neagu, C.

    2016-11-01

    The current models for inventory management are complementary, but together they offer a large pallet of elements for solving complex problems of companies when wanting to establish the optimum economic order quantity for unfinished products, row of materials, goods etc. The main objective of this paper is to elaborate an automated decisional model for the calculus of the economic order quantity taking into account the price regressive rates for the total order quantity. This model has two main objectives: first, to determine the periodicity when to be done the order n or the quantity order q; second, to determine the levels of stock: lighting control, security stock etc. In this way we can provide the answer to two fundamental questions: How much must be ordered? When to Order? In the current practice, the business relationships with its suppliers are based on regressive rates for price. This means that suppliers may grant discounts, from a certain level of quantities ordered. Thus, the unit price of the products is a variable which depends on the order size. So, the most important element for choosing the optimum for the economic order quantity is the total cost for ordering and this cost depends on the following elements: the medium price per units, the stock cost, the ordering cost etc.

  4. Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

    Science.gov (United States)

    Naeini, Mahdi Pakdaman; Cooper, Gregory F.

    2017-01-01

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered as an extension of BBQ [20], a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is O(N log N) time, where N is the number of samples.

  5. Additive Hazard Regression Models: An Application to the Natural History of Human Papillomavirus

    Directory of Open Access Journals (Sweden)

    Xianhong Xie

    2013-01-01

    Full Text Available There are several statistical methods for time-to-event analysis, among which is the Cox proportional hazards model that is most commonly used. However, when the absolute change in risk, instead of the risk ratio, is of primary interest or when the proportional hazard assumption for the Cox proportional hazards model is violated, an additive hazard regression model may be more appropriate. In this paper, we give an overview of this approach and then apply a semiparametric as well as a nonparametric additive model to a data set from a study of the natural history of human papillomavirus (HPV in HIV-positive and HIV-negative women. The results from the semiparametric model indicated on average an additional 14 oncogenic HPV infections per 100 woman-years related to CD4 count < 200 relative to HIV-negative women, and those from the nonparametric additive model showed an additional 40 oncogenic HPV infections per 100 women over 5 years of followup, while the estimated hazard ratio in the Cox model was 3.82. Although the Cox model can provide a better understanding of the exposure disease association, the additive model is often more useful for public health planning and intervention.

  6. A Vector Auto Regression Model Applied to Real Estate Development Investment: A Statistic Analysis

    Directory of Open Access Journals (Sweden)

    Fengyun Liu

    2016-10-01

    Full Text Available This study analyzes the economic system dynamics of investment in real estate from mainly four participants in China. Local governments limit the supply of commercial and residential land to raise fiscal revenue, and expand debts by land mortgage to develop industrial zones and parks. Led by local government, banks and real estate development enterprises forge a coalition on real estate investment and facilitate real estate price appreciation. The above theoretical model is empirically evidenced with VAR (Vector Auto Regression methodology. A panel VAR model shows that land leasing and real estate price appreciation positively affect local government general fiscal revenue. Additional VAR models find that bank credit in addition to private and foreign funds respectively have strong positive dynamic effects on housing prices. Housing prices also have a strong positive impact on speculation from private funds and hot money.

  7. A binary logistic regression model for discriminating real protein-protein interface

    Institute of Scientific and Technical Information of China (English)

    2003-01-01

    The selection and study of descriptive variables of protein-protein complex interface is a major question that many biologists come across when the research of protein-protein recognition is concerned. Several variables have been proposed to understand the structural or energetic features of complex interfaces. Here a systematic study of some of these "traditional" variables, as well as a few new ones, is introduced. With the values of these variables extracted from 42 PDB samples with real or false complex interfaces, a binary logistic regression analysis is performed, which results in an effective empirical model for the evaluation of binding probabilities of protein-protein interfaces. The model is validated with 12 samples, and satisfactory results are obtained for both the training and validation sets. Meanwhile, three potential dimeric interfaces of staphylokinase have been investigated and one with the best suitability to our model is proposed.

  8. Model selection for marginal regression analysis of longitudinal data with missing observations and covariate measurement error.

    Science.gov (United States)

    Shen, Chung-Wei; Chen, Yi-Hau

    2015-10-01

    Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. On the contrary, the naive procedures without taking care of such complexity in data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.

  9. Linear Regression Model of the Ash Mass Fraction and Electrical Conductivity for Slovenian Honey

    Directory of Open Access Journals (Sweden)

    Mojca Jamnik

    2008-01-01

    Full Text Available Mass fraction of ash is a quality criterion for determining the botanical origin of honey. At present, this parameter is generally being replaced by the measurement of electrical conductivity (κ. The value κ depends on the ash and acid content of honey; the higher their content, the higher the resulting conductivity. A linear regression model for the relationship between ash and electrical conductivity has been established for Slovenian honey by analysing 290 samples of Slovenian honey (including acacia, lime, chestnut, spruce, fir, multifloral and mixed forest honeydew honey. The obtained model differs from the one proposed by the International Honey Commission (IHC in the slope, but not in the section part of the relation formula. Therefore, the Slovenian model is recommended when calculating the ash mass fraction from the results of electrical conductivity in samples of Slovenian honey.

  10. General model selection estimation of a periodic regression with a Gaussian noise

    CERN Document Server

    Konev, Victor; 10.1007/s10463-008-0193-1

    2010-01-01

    This paper considers the problem of estimating a periodic function in a continuous time regression model with an additive stationary gaussian noise having unknown correlation function. A general model selection procedure on the basis of arbitrary projective estimates, which does not need the knowledge of the noise correlation function, is proposed. A non-asymptotic upper bound for quadratic risk (oracle inequality) has been derived under mild conditions on the noise. For the Ornstein-Uhlenbeck noise the risk upper bound is shown to be uniform in the nuisance parameter. In the case of gaussian white noise the constructed procedure has some advantages as compared with the procedure based on the least squares estimates (LSE). The asymptotic minimaxity of the estimates has been proved. The proposed model selection scheme is extended also to the estimation problem based on the discrete data applicably to the situation when high frequency sampling can not be provided.

  11. A theoretical model to describe progressions and regressions for exercise rehabilitation.

    Science.gov (United States)

    Blanchard, Sam; Glasgow, Phil

    2014-08-01

    This article aims to describe a new theoretical model to simplify and aid visualisation of the clinical reasoning process involved in progressing a single exercise. Exercise prescription is a core skill for physiotherapists but is an area that is lacking in theoretical models to assist clinicians when designing exercise programs to aid rehabilitation from injury. Historical models of periodization and motor learning theories lack any visual aids to assist clinicians. The concept of the proposed model is that new stimuli can be added or exchanged with other stimuli, either intrinsic or extrinsic to the participant, in order to gradually progress an exercise whilst remaining safe and effective. The proposed model maintains the core skills of physiotherapists by assisting clinical reasoning skills, exercise prescription and goal setting. It is not limited to any one pathology or rehabilitation setting and can adapted by any level of skilled clinician.

  12. Modeling group size and scalar stress by logistic regression from an archaeological perspective.

    Directory of Open Access Journals (Sweden)

    Gianmarco Alberti

    Full Text Available Johnson's scalar stress theory, describing the mechanics of (and the remedies to the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout. Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132, while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170. The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.

  13. ESTIMATORS OF LINEAR REGRESSION MODEL WITH AUTOCORRELATED ERRROR TERMS AND PREDICTION USING CORRELATED UNIFORM REGRESSORS

    Directory of Open Access Journals (Sweden)

    KAYODE AYINDE

    2012-11-01

    Full Text Available Performances of estimators of linear regression model with autocorrelated error term have been attributed to the nature and specification of the explanatory variables. The violation of assumption of the independence of the explanatory variables is not uncommon especially in business, economic and social sciences, leading to the development of many estimators. Moreover, prediction is one of the main essences of regression analysis. This work, therefore, attempts to examine the parameter estimates of the Ordinary Least Square estimator (OLS, Cochrane-Orcutt estimator (COR, Maximum Likelihood estimator (ML and the estimators based on Principal Component analysis (PC in prediction of linear regression model with autocorrelated error terms under the violations of assumption of independent regressors (multicollinearity using Monte-Carlo experiment approach. With uniform variables as regressors, it further identifies the best estimator that can be used for prediction purpose by averaging the adjusted co-efficient of determination of each estimator over the number of trials. Results reveal that the performances of COR and ML estimators at each level of multicollinearity over the levels of autocorrelation are convex – like while that of the OLS and PC estimators are concave; and that asthe level of multicollinearity increases, the estimators perform much better at all the levels of autocorrelation. Except when the sample size is small (n=10, the performances of the COR and ML estimators are generally best and asymptotically the same. When the sample size is small, the COR estimator is still best except when the autocorrelation level is low. At these instances, the PC estimator is either best or competes with the best estimator. Moreover, at low level of autocorrelation in all the sample sizes, the OLS estimator competes with the best estimator in all the levels of multicollinearity.

  14. SPECIFICS OF THE APPLICATIONS OF MULTIPLE REGRESSION MODEL IN THE ANALYSES OF THE EFFECTS OF GLOBAL FINANCIAL CRISES

    Directory of Open Access Journals (Sweden)

    Željko V. Račić

    2010-12-01

    Full Text Available This paper aims to present the specifics of the application of multiple linear regression model. The economic (financial crisis is analyzed in terms of gross domestic product which is in a function of the foreign trade balance (on one hand and the credit cards, i.e. indebtedness of the population on this basis (on the other hand, in the USA (from 1999. to 2008. We used the extended application model which shows how the analyst should run the whole development process of regression model. This process began with simple statistical features and the application of regression procedures, and ended with residual analysis, intended for the study of compatibility of data and model settings. This paper also analyzes the values of some standard statistics used in the selection of appropriate regression model. Testing of the model is carried out with the use of the Statistics PASW 17 program.

  15. A land use regression model for estimating the NO2 concentration in Shanghai, China.

    Science.gov (United States)

    Meng, Xia; Chen, Li; Cai, Jing; Zou, Bin; Wu, Chang-Fu; Fu, Qingyan; Zhang, Yan; Liu, Yang; Kan, Haidong

    2015-02-01

    Limited by data accessibility, few exposure assessment studies of air pollutants have been conducted in China. There is an urgent need to develop models for assessing the intra-urban concentration of key air pollutants in Chinese cities. In this study, a land use regression (LUR) model was established to estimate NO2 during 2008-2011 in Shanghai. Four predictor variables were left in the final LUR model: the length of major road within the 2-km buffer around monitoring sites, the number of industrial sources (excluding power plants) within a 10-km buffer, the agricultural land area within a 5-km buffer, and the population counts. The model R(2) and the leave-one-out-cross-validation (LOOCV) R(2) of the NO2 LUR models were 0.82 and 0.75, respectively. The prediction surface of the NO2 concentration based on the LUR model was of high spatial resolution. The 1-year predicted concentration based on the ratio and the difference methods fitted well with the measured NO2 concentration. The LUR model of NO2 outperformed the kriging and inverse distance weighed (IDW) interpolation methods in Shanghai. Our findings suggest that the LUR model may provide a cost-effective method of air pollution exposure assessment in a developing country.

  16. Good Corporate Governance and Predicting Financial Distress Using Logistic and Probit Regression Model

    Directory of Open Access Journals (Sweden)

    Juniarti Juniarti

    2013-01-01

    Full Text Available The study aims to prove whether good corporate governance (GCG is able to predict the probability of companies experiencing financial difficulties. Financial ratios that traditionally used for predicting bankruptcy remains used in this study. Besides, this study also compares logit and probit regression models, which are widely used in research related accounting bankruptcy prediction. Both models will be compared to determine which model is more superior. The sample in this study is the infrastructure, transportation, utilities & trade, services and hotels companies experiencing financial distress in the period 2008-2011. The results show that GCG and other three variables control i.e DTA, CR and company category do not prove significantly to predict the probability of companies experiencing financial difficulties. NPM, the only variable that proved significantly distinguishing healthy firms and distress. In general, logit and probit models do not result in different conclusions. Both of the models confirm the goodness of fit of models and the results of hypothesis testing. In terms of classification accuracy, logit model proves more accurate predictions than the probit models.

  17. Multiple linear regression model for predicting biomass digestibility from structural features.

    Science.gov (United States)

    Zhu, Li; O'Dwyer, Jonathan P; Chang, Vincent S; Granda, Cesar B; Holtzapple, Mark T

    2010-07-01

    A total of 147 model lignocellulose samples with a broad spectrum of structural features (lignin contents, acetyl contents, and crystallinity indices) were hydrolyzed with a wide range of cellulase loadings during 1-, 6-, and 72-h hydrolysis periods. Carbohydrate conversions at 1, 6, and 72 h were linearly proportional to the logarithm of cellulase loadings from approximately 10% to 90% conversion, indicating that the simplified HCH-1 model is valid for predicting lignocellulose digestibility. The HCH-1 model is a modified Michaelis-Menton model that accounts for the fraction of insoluble substrate available to bind with enzyme. The slopes and intercepts of a simplified HCH-1 model were correlated with structural features using multiple linear regression (MLR) models. The agreement between the measured and predicted 1-, 6-, and 72-h slopes and intercepts of glucan, xylan, and total sugar hydrolyses indicate that lignin content, acetyl content, and cellulose crystallinity are key factors that determine biomass digestibility. The 1-, 6-, and 72-h glucan, xylan, and total sugar conversions predicted from structural features using MLR models and the simplified HCH-1 model fit satisfactorily with the measured data (R(2) approximately 1.0). The parameter selection suggests that lignin content and cellulose crystallinity more strongly affect on digestibility than acetyl content. Cellulose crystallinity has greater influence during short hydrolysis periods whereas lignin content has more influence during longer hydrolysis periods. Cellulose crystallinity shows more influence on glucan hydrolysis whereas lignin content affects xylan hydrolysis to a greater extent.

  18. Reducing the bias of estimates of genotype by environment interactions in random regression sire models.

    Science.gov (United States)

    Lillehammer, Marie; Odegård, Jørgen; Meuwissen, Theo H E

    2009-03-19

    The combination of a sire model and a random regression term describing genotype by environment interactions may lead to biased estimates of genetic variance components because of heterogeneous residual variance. In order to test different models, simulated data with genotype by environment interactions, and dairy cattle data assumed to contain such interactions, were analyzed. Two animal models were compared to four sire models. Models differed in their ability to handle heterogeneous variance from different sources. Including an individual effect with a (co)variance matrix restricted to three times the sire (co)variance matrix permitted the modeling of the additive genetic variance not covered by the sire effect. This made the ability of sire models to handle heterogeneous genetic variance approximately equivalent to that of animal models. When residual variance was heterogeneous, a different approach to account for the heterogeneity of variance was needed, for example when using dairy cattle data in order to prevent overestimation of genetic heterogeneity of variance. Including environmental classes can be used to account for heterogeneous residual variance.

  19. An adaptive turbo-shaft engine modeling method based on PS and MRR-LSSVR algorithms

    Institute of Scientific and Technical Information of China (English)

    Wang Jiankang; Zhang Haibo; Yan Changkai; Duan Shujing; Huang Xianghua

    2013-01-01

    In order to establish an adaptive turbo-shaft engine model with high accuracy,a new modeling method based on parameter selection (PS) algorithm and multi-input multi-output recursive reduced least square support vector regression (MRR-LSSVR) machine is proposed.Firstly,the PS algorithm is designed to choose the most reasonable inputs of the adaptive module.During this process,a wrapper criterion based on least square support vector regression (LSSVR) machine is adopted,which can not only reduce computational complexity but also enhance generalization performance.Secondly,with the input variables determined by the PS algorithm,a mapping model of engine parameter estimation is trained off-line using MRR-LSSVR,which has a satisfying accuracy within 5‰.Finally,based on a numerical simulation platform of an integrated helicopter/turbo-shaft engine system,an adaptive turbo-shaft engine model is developed and tested in a certain flight envelope.Under the condition of single or multiple engine components being degraded,many simulation experiments are carried out,and the simulation results show the effectiveness and validity of the proposed adaptive modeling method.

  20. Causal relationship model between variables using linear regression to improve professional commitment of lecturer

    Science.gov (United States)

    Setyaningsih, S.

    2017-01-01

    The main element to build a leading university requires lecturer commitment in a professional manner. Commitment is measured through willpower, loyalty, pride, loyalty, and integrity as a professional lecturer. A total of 135 from 337 university lecturers were sampled to collect data. Data were analyzed using validity and reliability test and multiple linear regression. Many studies have found a link on the commitment of lecturers, but the basic cause of the causal relationship is generally neglected. These results indicate that the professional commitment of lecturers affected by variables empowerment, academic culture, and trust. The relationship model between variables is composed of three substructures. The first substructure consists of endogenous variables professional commitment and exogenous three variables, namely the academic culture, empowerment and trust, as well as residue variable ɛ y . The second substructure consists of one endogenous variable that is trust and two exogenous variables, namely empowerment and academic culture and the residue variable ɛ 3. The third substructure consists of one endogenous variable, namely the academic culture and exogenous variables, namely empowerment as well as residue variable ɛ 2. Multiple linear regression was used in the path model for each substructure. The results showed that the hypothesis has been proved and these findings provide empirical evidence that increasing the variables will have an impact on increasing the professional commitment of the lecturers.