Support Vector Regression Model for Direct Methanol Fuel Cell
Tang, J. L.; Cai, C. Z.; Xiao, T. T.; Huang, S. J.
2012-07-01
The purpose of this paper is to establish a direct methanol fuel cell (DMFC) prediction model by using the support vector regression (SVR) approach combined with particle swarm optimization (PSO) algorithm for its parameter selection. Two variables, cell temperature and cell current density were employed as input variables, cell voltage value of DMFC acted as output variable. Using leave-one-out cross-validation (LOOCV) test on 21 samples, the maximum absolute percentage error (APE) yields 5.66%, the mean absolute percentage error (MAPE) is only 0.93% and the correlation coefficient (R2) as high as 0.995. Compared with the result of artificial neural network (ANN) approach, it is shown that the modeling ability of SVR surpasses that of ANN. These suggest that SVR prediction model can be a good predictor to estimate the cell voltage for DMFC system.
Directory of Open Access Journals (Sweden)
Hong-Juan Li
2013-04-01
Full Text Available Electric load forecasting is an important issue for a power utility, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by strong non-linear learning capability of support vector regression (SVR, this paper presents a SVR model hybridized with the empirical mode decomposition (EMD method and auto regression (AR for electric load forecasting. The electric load data of the New South Wales (Australia market are employed for comparing the forecasting performances of different forecasting models. The results confirm the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
Estimating transmitted waves of floating breakwater using support vector regression model
Digital Repository Service at National Institute of Oceanography (India)
Mandal, S.; Hegde, A.V.; Kumar, V.; Patil, S.G.
to diameter of pipes (S/D). The radial basis functions performed well than the polynomial function in the regressive support vector machine as the kernel function for the given set of data. The support vector regression model gives the correlation coefficients...
On Weighted Support Vector Regression
DEFF Research Database (Denmark)
Han, Xixuan; Clemmensen, Line Katrine Harder
2014-01-01
We propose a new type of weighted support vector regression (SVR), motivated by modeling local dependencies in time and space in prediction of house prices. The classic weights of the weighted SVR are added to the slack variables in the objective function (OF‐weights). This procedure directly...
Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model
DEFF Research Database (Denmark)
Møller, Niels Framroze
, it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well......This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its....... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers...
Zhou, Lim Yi; Shan, Fam Pei; Shimizu, Kunio; Imoto, Tomoaki; Lateh, Habibah; Peng, Koay Swee
2017-08-01
A comparative study of logistic regression, support vector machine (SVM) and least square support vector machine (LSSVM) models has been done to predict the slope failure (landslide) along East-West Highway (Gerik-Jeli). The effects of two monsoon seasons (southwest and northeast) that occur in Malaysia are considered in this study. Two related factors of occurrence of slope failure are included in this study: rainfall and underground water. For each method, two predictive models are constructed, namely SOUTHWEST and NORTHEAST models. Based on the results obtained from logistic regression models, two factors (rainfall and underground water level) contribute to the occurrence of slope failure. The accuracies of the three statistical models for two monsoon seasons are verified by using Relative Operating Characteristics curves. The validation results showed that all models produced prediction of high accuracy. For the results of SVM and LSSVM, the models using RBF kernel showed better prediction compared to the models using linear kernel. The comparative results showed that, for SOUTHWEST models, three statistical models have relatively similar performance. For NORTHEAST models, logistic regression has the best predictive efficiency whereas the SVM model has the second best predictive efficiency.
Directory of Open Access Journals (Sweden)
Changhao Fan
2017-01-01
Full Text Available In modeling, only information from the deviation between the output of the support vector regression (SVR model and the training sample is considered, whereas the other prior information of the training sample, such as probability distribution information, is ignored. Probabilistic distribution information describes the overall distribution of sample data in a training sample that contains different degrees of noise and potential outliers, as well as helping develop a high-accuracy model. To mine and use the probability distribution information of a training sample, a new support vector regression model that incorporates probability distribution information weight SVR (PDISVR is proposed. In the PDISVR model, the probability distribution of each sample is considered as the weight and is then introduced into the error coefficient and slack variables of SVR. Thus, the deviation and probability distribution information of the training sample are both used in the PDISVR model to eliminate the influence of noise and outliers in the training sample and to improve predictive performance. Furthermore, examples with different degrees of noise were employed to demonstrate the performance of PDISVR, which was then compared with those of three SVR-based methods. The results showed that PDISVR performs better than the three other methods.
Support vector regression model for predicting the sorption capacity of lead (II
Directory of Open Access Journals (Sweden)
Nusrat Parveen
2016-09-01
Full Text Available Biosorption is supposed to be an economical process for the treatment of wastewater containing heavy metals like lead (II. In this research paper, the support vector regression (SVR has been used to predict the sorption capacity of lead (II ions with the independent input parameters being: initial lead ion concentration, pH, temperature and contact time. Tree fern, an agricultural by-product, has been employed as a low cost biosorbent. Comparison between multiple linear regression (MLR and SVR-based models has been made using statistical parameters. It has been found that the SVR model is more accurate and generalized for prediction of the sorption capacity of lead (II ions.
Gas detonation cell width prediction model based on support vector regression
Directory of Open Access Journals (Sweden)
Jiyang Yu
2017-10-01
Full Text Available Detonation cell width is an important parameter in hydrogen explosion assessments. The experimental data on gas detonation are statistically analyzed to establish a universal method to numerically predict detonation cell widths. It is commonly understood that detonation cell width, λ, is highly correlated with the characteristic reaction zone width, δ. Classical parametric regression methods were widely applied in earlier research to build an explicit semiempirical correlation for the ratio of λ/δ. The obtained correlations formulate the dependency of the ratio λ/δ on a dimensionless effective chemical activation energy and a dimensionless temperature of the gas mixture. In this paper, support vector regression (SVR, which is based on nonparametric machine learning, is applied to achieve functions with better fitness to experimental data and more accurate predictions. Furthermore, a third parameter, dimensionless pressure, is considered as an additional independent variable. It is found that three-parameter SVR can significantly improve the performance of the fitting function. Meanwhile, SVR also provides better adaptability and the model functions can be easily renewed when experimental database is updated or new regression parameters are considered.
Active set support vector regression.
Musicant, David R; Feinberg, Alexander
2004-03-01
This paper presents active set support vector regression (ASVR), a new active set strategy to solve a straightforward reformulation of the standard support vector regression problem. This new algorithm is based on the successful ASVM algorithm for classification problems, and consists of solving a finite number of linear equations with a typically large dimensionality equal to the number of points to be approximated. However, by making use of the Sherman-Morrison-Woodbury formula, a much smaller matrix of the order of the original input space is inverted at each step. The algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. ASVR is extremely fast, produces comparable generalization error to other popular algorithms, and is available on the web for download.
Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Ali Asghar Besalatpour
2016-02-01
Full Text Available Introduction: Soil aggregate stability is a key factor in soil resistivity to mechanical stresses, including the impacts of rainfall and surface runoff, and thus to water erosion (Canasveras et al., 2010. Various indicators have been proposed to characterize and quantify soil aggregate stability, for example percentage of water-stable aggregates (WSA, mean weight diameter (MWD, geometric mean diameter (GMD of aggregates, and water-dispersible clay (WDC content (Calero et al., 2008. Unfortunately, the experimental methods available to determine these indicators are laborious, time-consuming and difficult to standardize (Canasveras et al., 2010. Therefore, it would be advantageous if aggregate stability could be predicted indirectly from more easily available data (Besalatpour et al., 2014. The main objective of this study is to investigate the potential use of support vector machines (SVMs method for estimating soil aggregate stability (as quantified by GMD as compared to multiple linear regression approach. Materials and Methods: The study area was part of the Bazoft watershed (31° 37′ to 32° 39′ N and 49° 34′ to 50° 32′ E, which is located in the Northern part of the Karun river basin in central Iran. A total of 160 soil samples were collected from the top 5 cm of soil surface. Some easily available characteristics including topographic, vegetation, and soil properties were used as inputs. Soil organic matter (SOM content was determined by the Walkley-Black method (Nelson & Sommers, 1986. Particle size distribution in the soil samples (clay, silt, sand, fine sand, and very fine sand were measured using the procedure described by Gee & Bauder (1986 and calcium carbonate equivalent (CCE content was determined by the back-titration method (Nelson, 1982. The modified Kemper & Rosenau (1986 method was used to determine wet-aggregate stability (GMD. The topographic attributes of elevation, slope, and aspect were characterized using a 20-m
Hu, Qinghua; Zhang, Shiguang; Xie, Zongxia; Mi, Jusheng; Wan, Jie
2014-09-01
Support vector regression (SVR) techniques are aimed at discovering a linear or nonlinear structure hidden in sample data. Most existing regression techniques take the assumption that the error distribution is Gaussian. However, it was observed that the noise in some real-world applications, such as wind power forecasting and direction of the arrival estimation problem, does not satisfy Gaussian distribution, but a beta distribution, Laplacian distribution, or other models. In these cases the current regression techniques are not optimal. According to the Bayesian approach, we derive a general loss function and develop a technique of the uniform model of ν-support vector regression for the general noise model (N-SVR). The Augmented Lagrange Multiplier method is introduced to solve N-SVR. Numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are conducted. The results show the effectiveness of the proposed technique. Copyright © 2014 Elsevier Ltd. All rights reserved.
Mehdizadeh, Saeid; Behmanesh, Javad; Khalili, Keivan
2017-07-01
Soil temperature (T s) and its thermal regime are the most important factors in plant growth, biological activities, and water movement in soil. Due to scarcity of the T s data, estimation of soil temperature is an important issue in different fields of sciences. The main objective of the present study is to investigate the accuracy of multivariate adaptive regression splines (MARS) and support vector machine (SVM) methods for estimating the T s. For this aim, the monthly mean data of the T s (at depths of 5, 10, 50, and 100 cm) and meteorological parameters of 30 synoptic stations in Iran were utilized. To develop the MARS and SVM models, various combinations of minimum, maximum, and mean air temperatures (T min, T max, T); actual and maximum possible sunshine duration; sunshine duration ratio (n, N, n/N); actual, net, and extraterrestrial solar radiation data (R s, R n, R a); precipitation (P); relative humidity (RH); wind speed at 2 m height (u 2); and water vapor pressure (Vp) were used as input variables. Three error statistics including root-mean-square-error (RMSE), mean absolute error (MAE), and determination coefficient (R 2) were used to check the performance of MARS and SVM models. The results indicated that the MARS was superior to the SVM at different depths. In the test and validation phases, the most accurate estimations for the MARS were obtained at the depth of 10 cm for T max, T min, T inputs (RMSE = 0.71 °C, MAE = 0.54 °C, and R 2 = 0.995) and for RH, V p, P, and u 2 inputs (RMSE = 0.80 °C, MAE = 0.61 °C, and R 2 = 0.996), respectively.
Wang, Xiaolei
2014-12-12
Background: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions, whereby the effects of nucleotide changes on the binding affinity were observed to be context dependent. A systematic method to give high-quality estimates of such complex affinity landscapes is, thus, essential to the control of gene expression and the advance of synthetic biology. Results: Here, we propose a two-round prediction method that is based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences with different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of the affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer which were not reported before. Further test on four other TFs in S. cerevisiae demonstrated the generality of our method. Conclusion: We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify
An Enhanced MEMS Error Modeling Approach Based on Nu-Support Vector Regression
Directory of Open Access Journals (Sweden)
Deepak Bhatt
2012-07-01
Full Text Available Micro Electro Mechanical System (MEMS-based inertial sensors have made possible the development of a civilian land vehicle navigation system by offering a low-cost solution. However, the accurate modeling of the MEMS sensor errors is one of the most challenging tasks in the design of low-cost navigation systems. These sensors exhibit significant errors like biases, drift, noises; which are negligible for higher grade units. Different conventional techniques utilizing the Gauss Markov model and neural network method have been previously utilized to model the errors. However, Gauss Markov model works unsatisfactorily in the case of MEMS units due to the presence of high inherent sensor errors. On the other hand, modeling the random drift utilizing Neural Network (NN is time consuming, thereby affecting its real-time implementation. We overcome these existing drawbacks by developing an enhanced Support Vector Machine (SVM based error model. Unlike NN, SVMs do not suffer from local minimisation or over-fitting problems and delivers a reliable global solution. Experimental results proved that the proposed SVM approach reduced the noise standard deviation by 10–35% for gyroscopes and 61–76% for accelerometers. Further, positional error drifts under static conditions improved by 41% and 80% in comparison to NN and GM approaches.
An enhanced MEMS error modeling approach based on Nu-Support Vector Regression.
Bhatt, Deepak; Aggarwal, Priyanka; Bhattacharya, Prabir; Devabhaktuni, Vijay
2012-01-01
Micro Electro Mechanical System (MEMS)-based inertial sensors have made possible the development of a civilian land vehicle navigation system by offering a low-cost solution. However, the accurate modeling of the MEMS sensor errors is one of the most challenging tasks in the design of low-cost navigation systems. These sensors exhibit significant errors like biases, drift, noises; which are negligible for higher grade units. Different conventional techniques utilizing the Gauss Markov model and neural network method have been previously utilized to model the errors. However, Gauss Markov model works unsatisfactorily in the case of MEMS units due to the presence of high inherent sensor errors. On the other hand, modeling the random drift utilizing Neural Network (NN) is time consuming, thereby affecting its real-time implementation. We overcome these existing drawbacks by developing an enhanced Support Vector Machine (SVM) based error model. Unlike NN, SVMs do not suffer from local minimisation or over-fitting problems and delivers a reliable global solution. Experimental results proved that the proposed SVM approach reduced the noise standard deviation by 10-35% for gyroscopes and 61-76% for accelerometers. Further, positional error drifts under static conditions improved by 41% and 80% in comparison to NN and GM approaches.
Directory of Open Access Journals (Sweden)
Rachid Darnag
2017-02-01
Full Text Available Support vector machines (SVM represent one of the most promising Machine Learning (ML tools that can be applied to develop a predictive quantitative structure–activity relationship (QSAR models using molecular descriptors. Multiple linear regression (MLR and artificial neural networks (ANNs were also utilized to construct quantitative linear and non linear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental value of HIV activity; also, the results reveal the superiority of the SVM over MLR and ANN model. The contribution of each descriptor to the structure–activity relationships was evaluated.
Directory of Open Access Journals (Sweden)
Ibrahim A. Naguib
2011-12-01
Full Text Available Partial least squares regression (PLSR, spectral residual augmented classical least squares (SRACLS and support vector regression (SVR are three different chemometric models. These models are subjected to a comparative study that highlights their inherent characteristics via applying them to analysis of bisacodyl in the presence of its reported degradation products monoacetyl bisacodyl (I and desacetyl bisacodyl (II, in raw material. For proper analysis, a 3 factor 3 level experimental design was established resulting in a training set of 9 mixtures containing different ratios of the interfering species. A linear test set consisting of 6 mixtures was used to validate the prediction ability of the suggested models. To test the generalisation ability of the models, some extra mixtures were prepared that are outside the concentration space of the training set. To test the ability of models to handle nonlinearity in spectral response, another set of nonlinear samples was prepared. The paper highlights model transfer to other labs under other conditions as well. This paper aims to manifest the advantages of SRACLS and SVR over PLSR model, where SRACLS can tackle future changes without the need for tedious recalibration, while SVR is a more robust and general model, with high ability to model nonlinearity in spectral response, though like PLSR is needing recalibration. The results presented indicate the ability of the three models to analyse bisacodyl in the presence of its degradation products in raw material with high accuracy and precision; where SVR gives the best results at all tested conditions compared to other models.
Directory of Open Access Journals (Sweden)
Peek Andrew S
2007-06-01
Full Text Available Abstract Background RNA interference (RNAi is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM approach was used to quantitatively model RNA interference activities. Results Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (N-grams and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative. Conclusion The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid
Directory of Open Access Journals (Sweden)
Jakub Langhammer
2016-11-01
Full Text Available This paper analyzes the potential of a nu-support vector regression (nu-SVR model for the reconstruction of missing data of hydrological time series from a sensor network. Sensor networks are currently experiencing rapid growth of applications in experimental research and monitoring and provide an opportunity to study the dynamics of hydrological processes in previously ungauged or remote areas. Due to physical vulnerability or limited maintenance, networks are prone to data outages, which can devaluate the unique data sources. This paper analyzes the potential of a nu-SVR model to simulate water levels in a network of sensors in four nested experimental catchments in a mid-latitude montane environment. The model was applied to a range of typical runoff situations, including a single event storm, multi-peak flood event, snowmelt, rain on snow and a low flow period. The simulations based on daily values proved the high efficiency of the nu-SVR modeling approach to simulate the hydrological processes in a network of monitoring stations. The model proved its ability to reliably reconstruct and simulate typical runoff situations, including complex events, such as rain on snow or flooding from recurrent regional rain. The worst model performance was observed at low flow periods and for single peak flows, especially in the high-altitude catchments.
Shen, Wanxiang; Xiao, Tao; Chen, Shangying; Liu, Feng; Chen, Yu Zong; Jiang, Yuyang
2017-11-01
The enzymatic hydrolysis of chemicals, which is important for in vitro drug metabolism assays, is an important indicator of drug stability profiles during drug discovery and development. Herein, we employed a stepwise feature elimination (SFE) method with nonlinear support vector machine regression (SVR) models to predict the in vitro half-lives in human plasma/blood of various esters. The SVR model was developed using public databases and literature-reported data on the half-lives of esters in human plasma/blood. In particular, the SFE method was developed to prevent over fitting and under fitting in the nonlinear model, and it provided a novel and efficient method of realizing feature combinations and selections to enhance the prediction accuracy. Our final developed model with 24 features effectively predicted an external validation set using the time-split method and presented reasonably good R2 values (0.6) and also predicted two completely independent validation datasets with R2 values of 0.62 and 0.54; thus, this model performed much better than other prediction models. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel strategy for forensic age prediction by DNA methylation and support vector regression model
Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei
2015-01-01
High deviations resulting from prediction model, gender and population difference have limited age estimation application of DNA methylation markers. Here we identified 2,957 novel age-associated DNA methylation sites (P 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
Tang, J. L.; Cai, C. Z.; Xiao, T. T.; Huang, S. J.
2012-07-01
The electrical conductivity of solid oxide fuel cell (SOFC) cathode is one of the most important indices affecting the efficiency of SOFC. In order to improve the performance of fuel cell system, it is advantageous to have accurate model with which one can predict the electrical conductivity. In this paper, a model utilizing support vector regression (SVR) approach combined with particle swarm optimization (PSO) algorithm for its parameter optimization was established to modeling and predicting the electrical conductivity of Ba0.5Sr0.5Co0.8Fe0.2 O3-δ-xSm0.5Sr0.5CoO3-δ (BSCF-xSSC) composite cathode under two influence factors, including operating temperature (T) and SSC content (x) in BSCF-xSSC composite cathode. The leave-one-out cross validation (LOOCV) test result by SVR strongly supports that the generalization ability of SVR model is high enough. The absolute percentage error (APE) of 27 samples does not exceed 0.05%. The mean absolute percentage error (MAPE) of all 30 samples is only 0.09% and the correlation coefficient (R2) as high as 0.999. This investigation suggests that the hybrid PSO-SVR approach may be not only a promising and practical methodology to simulate the properties of fuel cell system, but also a powerful tool to be used for optimal designing or controlling the operating process of a SOFC system.
Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung
2014-10-01
The aim of the paper is to use data mining technology to establish a classification of breast cancer survival patterns, and offers a treatment decision-making reference for the survival ability of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer in a specific hospital in Central Taiwan to obtain 1,340 data sets. We employed a support vector machine, logistic regression, and a C5.0 decision tree to construct a classification model of breast cancer patients' survival rates, and used a 10-fold cross-validation approach to identify the model. The results show that the establishment of classification tools for the classification of the models yielded an average accuracy rate of more than 90% for both; the SVM provided the best method for constructing the three categories of the classification system for the survival mode. The results of the experiment show that the three methods used to create the classification system, established a high accuracy rate, predicted a more accurate survival ability of women diagnosed with breast cancer, and could be used as a reference when creating a medical decision-making frame.
Deep Support Vector Machines for Regression Problems
Wiering, Marco; Schutten, Marten; Millea, Adrian; Meijster, Arnold; Schomaker, Lambertus
2013-01-01
In this paper we describe a novel extension of the support vector machine, called the deep support vector machine (DSVM). The original SVM has a single layer with kernel functions and is therefore a shallow model. The DSVM can use an arbitrary number of layers, in which lower-level layers contain
Abu Awad, Yara; Koutrakis, Petros; Coull, Brent A; Schwartz, Joel
2017-11-01
Fine ambient particulate matter has been widely associated with multiple health effects. Mitigation hinges on understanding which sources are contributing to its toxicity. Black Carbon (BC), an indicator of particles generated from traffic sources, has been associated with a number of health effects however due to its high spatial variability, its concentration is difficult to estimate. We previously fit a model estimating BC concentrations in the greater Boston area; however this model was built using limited monitoring data and could not capture the complex spatio-temporal patterns of ambient BC. In order to improve our predictive ability, we obtained more data for a total of 24,301 measurements from 368 monitors over a 12 year period in Massachusetts, Rhode Island and New Hampshire. We also used Nu-Support Vector Regression (nu-SVR) - a machine learning technique which incorporates nonlinear terms and higher order interactions, with appropriate regularization of parameter estimates. We then used a generalized additive model to refit the residuals from the nu-SVR and added the residual predictions to our earlier estimates. Both spatial and temporal predictors were included in the model which allowed us to capture the change in spatial patterns of BC over time. The 10 fold cross validated (CV) R2 of the model was good in both cold (10-fold CV R2 = 0.87) and warm seasons (CV R2 = 0.79). We have successfully built a model that can be used to estimate short and long-term exposures to BC and will be useful for studies looking at various health outcomes in MA, RI and Southern NH. Copyright © 2017 Elsevier Inc. All rights reserved.
Unsupervised parsing of gaze data with a beta-process vector auto-regressive hidden Markov model.
Houpt, Joseph W; Frame, Mary E; Blaha, Leslie M
2017-10-26
The first stage of analyzing eye-tracking data is commonly to code the data into sequences of fixations and saccades. This process is usually automated using simple, predetermined rules for classifying ranges of the time series into events, such as "if the dispersion of gaze samples is lower than a particular threshold, then code as a fixation; otherwise code as a saccade." More recent approaches incorporate additional eye-movement categories in automated parsing algorithms by using time-varying, data-driven thresholds. We describe an alternative approach using the beta-process vector auto-regressive hidden Markov model (BP-AR-HMM). The BP-AR-HMM offers two main advantages over existing frameworks. First, it provides a statistical model for eye-movement classification rather than a single estimate. Second, the BP-AR-HMM uses a latent process to model the number and nature of the types of eye movements and hence is not constrained to predetermined categories. We applied the BP-AR-HMM both to high-sampling rate gaze data from Andersson et al. (Behavior Research Methods 49(2), 1-22 2016) and to low-sampling rate data from the DIEM project (Mital et al., Cognitive Computation 3(1), 5-24 2011). Driven by the data properties, the BP-AR-HMM identified over five categories of movements, some which clearly mapped on to fixations and saccades, and others potentially captured post-saccadic oscillations, smooth pursuit, and various recording errors. The BP-AR-HMM serves as an effective algorithm for data-driven event parsing alone or as an initial step in exploring the characteristics of gaze data sets.
Qin, Li-Tang; Liu, Shu-Shen; Liu, Hai-Ling; Zhang, Yong-Hong
2010-01-01
Accurate description of hormetic dose-response curves (DRC) is a key step for the determination of the efficacy and hazards of the pollutants with the hormetic phenomenon. This study tries to use support vector regression (SVR) and least squares support vector regression (LS-SVR) to address the problem of curve fitting existing in hormesis. The SVR and LS-SVR, which are entirely different from the non-linear fitting methods used to describe hormetic effects based on large sample, are at present only optimum methods based on small sample often encountered in the experimental toxicology. The tuning parameters (C and p1 for SVR, gam and sig2 for LS-SVR) determining SVR and LS-SVR models were obtained by both the internal and external validation of the models. The internal validation was performed by using leave-one-out (LOO) cross-validation and the external validation was performed by splitting the whole data set (12 data points) into the same size (six data points) of training set and test set. The results show that SVR and LS-SVR can accurately describe not only for the hermetic J-shaped DRC of seven water-soluble organic solvents consisting of acetonitrile, methanol, ethanol, acetone, ether, tetrahydrofuran, and isopropanol, but also for the classical sigmoid DRC of six pesticides including simetryn, prometon, bromacil, velpar, diquat-dibromide monohydrate, and dichlorvos. Copyright 2009 Elsevier Ltd. All rights reserved.
Seshan, Hari; Goyal, Manish K; Falk, Michael W; Wuertz, Stefan
2014-04-15
The relationship between microbial community structure and function has been examined in detail in natural and engineered environments, but little work has been done on using microbial community information to predict function. We processed microbial community and operational data from controlled experiments with bench-scale bioreactor systems to predict reactor process performance. Four membrane-operated sequencing batch reactors treating synthetic wastewater were operated in two experiments to test the effects of (i) the toxic compound 3-chloroaniline (3-CA) and (ii) bioaugmentation targeting 3-CA degradation, on the sludge microbial community in the reactors. In the first experiment, two reactors were treated with 3-CA and two reactors were operated as controls without 3-CA input. In the second experiment, all four reactors were additionally bioaugmented with a Pseudomonas putida strain carrying a plasmid with a portion of the pathway for 3-CA degradation. Molecular data were generated from terminal restriction fragment length polymorphism (T-RFLP) analysis targeting the 16S rRNA and amoA genes from the sludge community. The electropherograms resulting from these T-RFs were used to calculate diversity indices - community richness, dynamics and evenness - for the domain Bacteria as well as for ammonia-oxidizing bacteria in each reactor over time. These diversity indices were then used to train and test a support vector regression (SVR) model to predict reactor performance based on input microbial community indices and operational data. Considering the diversity indices over time and across replicate reactors as discrete values, it was found that, although bioaugmentation with a bacterial strain harboring a subset of genes involved in the degradation of 3-CA did not bring about 3-CA degradation, it significantly affected the community as measured through all three diversity indices in both the general bacterial community and the ammonia-oxidizer community (
Theory of net analyte signal vectors in inverse regression
DEFF Research Database (Denmark)
Bro, R.; Andersen, Charlotte Møller
2003-01-01
The. net analyte signal and the net analyte signal vector are useful measures in building and optimizing multivariate calibration models. In this paper a theory for their use in inverse regression is developed. The theory of net analyte signal was originally derived from classical least squares...
Hamidi, Omid; Tapak, Leili; Abbasi, Hamed; Maryanaji, Zohreh
2017-10-01
We have conducted a case study to investigate the performance of support vector machine, multivariate adaptive regression splines, and random forest time series methods in snowfall modeling. These models were applied to a data set of monthly snowfall collected during six cold months at Hamadan Airport sample station located in the Zagros Mountain Range in Iran. We considered monthly data of snowfall from 1981 to 2008 during the period from October/November to April/May as the training set and the data from 2009 to 2015 as the testing set. The root mean square errors (RMSE), mean absolute errors (MAE), determination coefficient (R 2), coefficient of efficiency (E%), and intra-class correlation coefficient (ICC) statistics were used as evaluation criteria. Our results indicated that the random forest time series model outperformed the support vector machine and multivariate adaptive regression splines models in predicting monthly snowfall in terms of several criteria. The RMSE, MAE, R 2, E, and ICC for the testing set were 7.84, 5.52, 0.92, 0.89, and 0.93, respectively. The overall results indicated that the random forest time series model could be successfully used to estimate monthly snowfall values. Moreover, the support vector machine model showed substantial performance as well, suggesting it may also be applied to forecast snowfall in this area.
Cardiovascular Response Identification Based on Nonlinear Support Vector Regression
Wang, Lu; Su, Steven W.; Chan, Gregory S. H.; Celler, Branko G.; Cheng, Teddy M.; Savkin, Andrey V.
This study experimentally investigates the relationships between central cardiovascular variables and oxygen uptake based on nonlinear analysis and modeling. Ten healthy subjects were studied using cycle-ergometry exercise tests with constant workloads ranging from 25 Watt to 125 Watt. Breath by breath gas exchange, heart rate, cardiac output, stroke volume and blood pressure were measured at each stage. The modeling results proved that the nonlinear modeling method (Support Vector Regression) outperforms traditional regression method (reducing Estimation Error between 59% and 80%, reducing Testing Error between 53% and 72%) and is the ideal approach in the modeling of physiological data, especially with small training data set.
Flexible survival regression modelling
DEFF Research Database (Denmark)
Cortese, Giuliana; Scheike, Thomas H; Martinussen, Torben
2009-01-01
Regression analysis of survival data, and more generally event history data, is typically based on Cox's regression model. We here review some recent methodology, focusing on the limitations of Cox's regression model. The key limitation is that the model is not well suited to represent time-varyi...
Fault Isolation for Nonlinear Systems Using Flexible Support Vector Regression
Directory of Open Access Journals (Sweden)
Yufang Liu
2014-01-01
Full Text Available While support vector regression is widely used as both a function approximating tool and a residual generator for nonlinear system fault isolation, a drawback for this method is the freedom in selecting model parameters. Moreover, for samples with discordant distributing complexities, the selection of reasonable parameters is even impossible. To alleviate this problem we introduce the method of flexible support vector regression (F-SVR, which is especially suited for modelling complicated sample distributions, as it is free from parameters selection. Reasonable parameters for F-SVR are automatically generated given a sample distribution. Lastly, we apply this method in the analysis of the fault isolation of high frequency power supplies, where satisfactory results have been obtained.
Zhang, B; Liang, X L; Gao, H Y; Ye, L S; Wang, Y G
2016-05-13
We evaluated the application of three machine learning algorithms, including logistic regression, support vector machine and back-propagation neural network, for diagnosing congenital heart disease and colorectal cancer. By inspecting related serum tumor marker levels in colorectal cancer patients and healthy subjects, early diagnosis models for colorectal cancer were built using three machine learning algorithms to assess their corresponding diagnostic values. Except for serum alpha-fetoprotein, the levels of 11 other serum markers of patients in the colorectal cancer group were higher than those in the benign colorectal cancer group (P model and back-propagation, a neural network diagnosis model was built with diagnostic accuracies of 82 and 75%, sensitivities of 85 and 80%, and specificities of 80 and 70%, respectively. Colorectal cancer diagnosis models based on the three machine learning algorithms showed high diagnostic value and can help obtain evidence for the early diagnosis of colorectal cancer.
Knowledge-Based Green's Kernel for Support Vector Regression
Directory of Open Access Journals (Sweden)
Tahir Farooq
2010-01-01
Full Text Available This paper presents a novel prior knowledge-based Green's kernel for support vector regression (SVR. After reviewing the correspondence between support vector kernels used in support vector machines (SVMs and regularization operators used in regularization networks and the use of Green's function of their corresponding regularization operators to construct support vector kernels, a mathematical framework is presented to obtain the domain knowledge about magnitude of the Fourier transform of the function to be predicted and design a prior knowledge-based Green's kernel that exhibits optimal regularization properties by using the concept of matched filters. The matched filter behavior of the proposed kernel function makes it suitable for signals corrupted with noise that includes many real world systems. We conduct several experiments mostly using benchmark datasets to compare the performance of our proposed technique with the results already published in literature for other existing support vector kernel over a variety of settings including different noise levels, noise models, loss functions, and SVM variations. Experimental results indicate that knowledge-based Green's kernel could be seen as a good choice among the other candidate kernel functions.
Directory of Open Access Journals (Sweden)
Hailun Wang
2017-01-01
Full Text Available Support vector regression algorithm is widely used in fault diagnosis of rolling bearing. A new model parameter selection method for support vector regression based on adaptive fusion of the mixed kernel function is proposed in this paper. We choose the mixed kernel function as the kernel function of support vector regression. The mixed kernel function of the fusion coefficients, kernel function parameters, and regression parameters are combined together as the parameters of the state vector. Thus, the model selection problem is transformed into a nonlinear system state estimation problem. We use a 5th-degree cubature Kalman filter to estimate the parameters. In this way, we realize the adaptive selection of mixed kernel function weighted coefficients and the kernel parameters, the regression parameters. Compared with a single kernel function, unscented Kalman filter (UKF support vector regression algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.
Directory of Open Access Journals (Sweden)
Zhan-bo Chen
2014-01-01
Full Text Available In order to improve the performance prediction accuracy of hydraulic excavator, the regression least squares support vector machine is applied. First, the mathematical model of the regression least squares support vector machine is studied, and then the algorithm of the regression least squares support vector machine is designed. Finally, the performance prediction simulation of hydraulic excavator based on regression least squares support vector machine is carried out, and simulation results show that this method can predict the performance changing rules of hydraulic excavator correctly.
Naguib, Ibrahim A.; Darwish, Hany W.
2012-02-01
A comparison between support vector regression (SVR) and Artificial Neural Networks (ANNs) multivariate regression methods is established showing the underlying algorithm for each and making a comparison between them to indicate the inherent advantages and limitations. In this paper we compare SVR to ANN with and without variable selection procedure (genetic algorithm (GA)). To project the comparison in a sensible way, the methods are used for the stability indicating quantitative analysis of mixtures of mebeverine hydrochloride and sulpiride in binary mixtures as a case study in presence of their reported impurities and degradation products (summing up to 6 components) in raw materials and pharmaceutical dosage form via handling the UV spectral data. For proper analysis, a 6 factor 5 level experimental design was established resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results manifest the problem of nonlinearity and how models like the SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy to handle instruments like the UV spectrophotometer.
Naguib, Ibrahim A; Darwish, Hany W
2012-02-01
A comparison between support vector regression (SVR) and Artificial Neural Networks (ANNs) multivariate regression methods is established showing the underlying algorithm for each and making a comparison between them to indicate the inherent advantages and limitations. In this paper we compare SVR to ANN with and without variable selection procedure (genetic algorithm (GA)). To project the comparison in a sensible way, the methods are used for the stability indicating quantitative analysis of mixtures of mebeverine hydrochloride and sulpiride in binary mixtures as a case study in presence of their reported impurities and degradation products (summing up to 6 components) in raw materials and pharmaceutical dosage form via handling the UV spectral data. For proper analysis, a 6 factor 5 level experimental design was established resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results manifest the problem of nonlinearity and how models like the SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy to handle instruments like the UV spectrophotometer. Copyright © 2011 Elsevier B.V. All rights reserved.
Electricity Load Forecasting Using Support Vector Regression with Memetic Algorithms
Hu, Zhongyi; Xiong, Tao
2013-01-01
Electricity load forecasting is an important issue that is widely explored and examined in power systems operation literature and commercial transactions in electricity markets literature as well. Among the existing forecasting models, support vector regression (SVR) has gained much attention. Considering the performance of SVR highly depends on its parameters; this study proposed a firefly algorithm (FA) based memetic algorithm (FA-MA) to appropriately determine the parameters of SVR forecasting model. In the proposed FA-MA algorithm, the FA algorithm is applied to explore the solution space, and the pattern search is used to conduct individual learning and thus enhance the exploitation of FA. Experimental results confirm that the proposed FA-MA based SVR model can not only yield more accurate forecasting results than the other four evolutionary algorithms based SVR models and three well-known forecasting models but also outperform the hybrid algorithms in the related existing literature. PMID:24459425
Electricity Load Forecasting Using Support Vector Regression with Memetic Algorithms
Directory of Open Access Journals (Sweden)
Zhongyi Hu
2013-01-01
Full Text Available Electricity load forecasting is an important issue that is widely explored and examined in power systems operation literature and commercial transactions in electricity markets literature as well. Among the existing forecasting models, support vector regression (SVR has gained much attention. Considering the performance of SVR highly depends on its parameters; this study proposed a firefly algorithm (FA based memetic algorithm (FA-MA to appropriately determine the parameters of SVR forecasting model. In the proposed FA-MA algorithm, the FA algorithm is applied to explore the solution space, and the pattern search is used to conduct individual learning and thus enhance the exploitation of FA. Experimental results confirm that the proposed FA-MA based SVR model can not only yield more accurate forecasting results than the other four evolutionary algorithms based SVR models and three well-known forecasting models but also outperform the hybrid algorithms in the related existing literature.
Ghaedi, M; Dashtian, K; Ghaedi, A M; Dehghanian, N
2016-05-11
The aim of this work is the study of the predictive ability of a hybrid model of support vector regression with genetic algorithm optimization (GA-SVR) for the adsorption of malachite green (MG) onto multi-walled carbon nanotubes (MWCNTs). Various factors were investigated by central composite design and optimum conditions was set as: pH 8, 0.018 g MWCNTs, 8 mg L(-1) dye mixed with 50 mL solution thoroughly for 10 min. The Langmuir, Freundlich, Temkin and D-R isothermal models are applied to fitting the experimental data, and the data was well explained by the Langmuir model with a maximum adsorption capacity of 62.11-80.64 mg g(-1) in a short time at 25 °C. Kinetic studies at various adsorbent dosages and the initial MG concentration show that maximum MG removal was achieved within 10 min of the start of every experiment under most conditions. The adsorption obeys the pseudo-second-order rate equation in addition to the intraparticle diffusion model. The optimal parameters (C of 0.2509, σ(2) of 0.1288 and ε of 0.2018) for the SVR model were obtained based on the GA. For the testing data set, MSE values of 0.0034 and the coefficient of determination (R(2)) values of 0.9195 were achieved.
African Journals Online (AJOL)
zlukovi
modelled as a quadratic regression, nested within parity. The previous lactation length was ... This proportion was mainly covered by linear and quadratic coefficients. Results suggest that RRM could .... The multiple trait models in scalar notation are presented by equations (1, 2), while equation. (3) represents the random ...
Owolabi, Taoreed O.; Akande, Kabiru O.; Olatunji, Sunday O.; Aldhafferi, Nahier; Alqahtani, Abdullah
2017-11-01
Titanium dioxide (TiO2) semiconductor is characterized with a wide band gap and attracts a significant attention for several applications that include solar cell carrier transportation and photo-catalysis. The tunable band gap of this semiconductor coupled with low cost, chemical stability and non-toxicity make it indispensable for these applications. Structural distortion always accompany TiO2 band gap tuning through doping and this present work utilizes the resulting structural lattice distortion to estimate band gap of doped TiO2 using support vector regression (SVR) coupled with novel gravitational search algorithm (GSA) for hyper-parameters optimization. In order to fully capture the non-linear relationship between lattice distortion and band gap, two SVR models were homogeneously hybridized and were subsequently optimized using GSA. GSA-HSVR (hybridized SVR) performs better than GSA-SVR model with performance improvement of 57.2% on the basis of root means square error reduction of the testing dataset. Effect of Co doping and Nitrogen-Iodine co-doping on band gap of TiO2 semiconductor was modeled and simulated. The obtained band gap estimates show excellent agreement with the values reported from the experiment. By implementing the models, band gap of doped TiO2 can be estimated with high level of precision and absorption ability of the semiconductor can be extended to visible region of the spectrum for improved properties and efficiency.
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco friendly adsorbents namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from wood tree Pistacia atlantica (AC-PAW) were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal with various adsorption influential parameters was well modeled and optimized using multiple linear regressions (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found based on γ value of 0.76 and σ(2) of 0.15. For testing the data set, the mean square error (MSE) values of 0.0010 and the coefficient of determination (R(2)) values of 0.976 were obtained for LSSVR model, and the MSE value of 0.0037 and the R(2) value of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data was found to be well fitted and in good agreement with Langmuir isotherm model and second-order equation and intra-particle diffusion models respectively. The small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is applicable for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g(-1) and 125 mg g(-1) respectively. Copyright © 2015 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Ibrahim A. Naguib
2017-12-01
Full Text Available In the presented study, orthogonal projection to latent structures (OPLS is introduced asÂ a data preprocessing method that handles nonlinear data prior to modelling with two well established nonlinear multivariate models; namely support vector regression (SVR and artificial neural networks (ANN. The proposed preprocessing proved to significantly improve prediction abilities through removal of uncorrelated data.The study was established based on a case study nonlinear spectrofluorimetric data of agomelatine (AGM and its hydrolysis degradation products (Deg I and Deg II, where a 3 factor 4 level experimental design was used to provide a training set of 16 mixtures with different proportions of studied components. An independent test set which consisted of 9 mixtures was established to confirm the prediction ability of the introduced models. Excitation wavelength was 227Â nm, and working range for emission spectra was 320â440Â nm.The couplings of OPLS-SVR and OPLS-ANN provided better accuracy for prediction of independent nonlinear test set. The root mean square error of prediction RMSEP for the test set mixtures was used asÂ a major comparison parameter, where RMSEP results for OPLS-SVR and OPLS-ANN are 2.19 and 1.50 respectively. Keywords: Agomelatine, SVR, ANN, OPLS, Spectrofluorimetry, Nonlinear
DEFF Research Database (Denmark)
Becciolini, Diego; Franzosi, Diogo Buarque; Foadi, Roshan
2015-01-01
We analyze the Large Hadron Collider (LHC) phenomenology of heavy vector resonances with a $SU(2)_L\\times SU(2)_R$ spectral global symmetry. This symmetry partially protects the electroweak S-parameter from large contributions of the vector resonances. The resulting custodial vector model spectrum...
A Support Vector Regression Approach for Investigating Multianticipative Driving Behavior
Directory of Open Access Journals (Sweden)
Bin Lu
2015-01-01
Full Text Available This paper presents a Support Vector Regression (SVR approach that can be applied to predict the multianticipative driving behavior using vehicle trajectory data. Building upon the SVR approach, a multianticipative car-following model is developed and enhanced in learning speed and predication accuracy. The model training and validation are conducted by using the field trajectory data extracted from the Next Generation Simulation (NGSIM project. During the model training and validation tests, the estimation results show that the SVR model performs as well as IDM model with respect to the model prediction accuracy. In addition, this paper performs a relative importance analysis to quantify the multianticipation in terms of the different stimuli to which drivers react in platoon car following. The analysis results confirm that drivers respond to the behavior of not only the immediate leading vehicle in front but also the second, third, and even fourth leading vehicles. Specifically, in congested traffic conditions, drivers are observed to be more sensitive to the relative speed than to the gap. These findings provide insight into multianticipative driving behavior and illustrate the necessity of taking into account multianticipative car-following model in microscopic traffic simulation.
Multivariate Lesion-Symptom Mapping Using Support Vector Regression
Zhang, Yongsheng; Kimberg, Daniel Y.; Coslett, H. Branch; Schwartz, Myrna F.; Wang, Ze
2014-01-01
Lesion analysis is a classic approach to study brain functions. Because brain function is a result of coherent activations of a collection of functionally related voxels, lesion-symptom relations are generally contributed by multiple voxels simultaneously. Although voxel-based lesion symptom mapping (VLSM) has made substantial contributions to the understanding of brain-behavior relationships, a better understanding of the brain-behavior relationship contributed by multiple brain regions needs a multivariate lesion symptom mapping (MLSM). The purpose of this paper was to develop an MLSM using a machine learning-based multivariate regression algorithm: support vector regression (SVR). In the proposed SVR-LSM, the symptom relation to the entire lesion map as opposed to each isolated voxel is modeled using a non-linear function, so the inter-voxel correlations are intrinsically considered, resulting in a potentially more sensitive way to examine lesion-symptom relationships. To explore the relative merits of VLSM and SVR-LSM we used both approaches in the analysis of a synthetic dataset. SVR-LSM showed much higher sensitivity and specificity for detecting the synthetic lesion-behavior relations than VLSM. When applied to lesion data and language measures from patients with brain damages, SVR-LSM reproduced the essential pattern of previous findings identified by VLSM and showed higher sensitivity than VLSM for identifying the lesion-behavior relations. Our data also showed the possibility of using lesion data to predict continuous behavior scores. PMID:25044213
Mixed kernel function support vector regression for global sensitivity analysis
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Amongst the wide sensitivity analyses in literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and Gaussian radial basis kernel function, thus the MKF possesses both the global characteristic advantage of the polynomials kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
Application of Hybrid Quantum Tabu Search with Support Vector Regression (SVR for Load Forecasting
Directory of Open Access Journals (Sweden)
Cheng-Wen Lee
2016-10-01
Full Text Available Hybridizing chaotic evolutionary algorithms with support vector regression (SVR to improve forecasting accuracy is a hot topic in electricity load forecasting. Trapping at local optima and premature convergence are critical shortcomings of the tabu search (TS algorithm. This paper investigates potential improvements of the TS algorithm by applying quantum computing mechanics to enhance the search information sharing mechanism (tabu memory to improve the forecasting accuracy. This article presents an SVR-based load forecasting model that integrates quantum behaviors and the TS algorithm with the support vector regression model (namely SVRQTS to obtain a more satisfactory forecasting accuracy. Numerical examples demonstrate that the proposed model outperforms the alternatives.
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect-great clarity.The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...
A Simpler Approach to Coefficient Regularized Support Vector Machines Regression
Directory of Open Access Journals (Sweden)
Hongzhi Tong
2014-01-01
Full Text Available We consider a kind of support vector machines regression (SVMR algorithms associated with lq (1≤q<∞ coefficient-based regularization and data-dependent hypothesis space. Compared with former literature, we provide here a simpler convergence analysis for those algorithms. The novelty of our analysis lies in the estimation of the hypothesis error, which is implemented by setting a stepping stone between the coefficient regularized SVMR and the classical SVMR. An explicit learning rate is then derived under very mild conditions.
Clifford support vector machines for classification, regression, and recurrence.
Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy
2010-11-01
This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...
Comparison of ν-support vector regression and logistic equation for ...
African Journals Online (AJOL)
Due to the complexity and high non-linearity of bioprocess, most simple mathematical models fail to describe the exact behavior of biochemistry systems. As a novel type of learning method, support vector regression (SVR) owns the powerful capability to characterize problems via small sample, nonlinearity, high dimension ...
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Directory of Open Access Journals (Sweden)
Suduan Chen
2014-01-01
Full Text Available As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, single equation regression models is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the auto correlation patterns of regression disturbance. It also includes six case studies.
Support vector regression for real-time flood stage forecasting
Yu, Pao-Shan; Chen, Shien-Tsung; Chang, I.-Fan
2006-09-01
SummaryFlood forecasting is an important non-structural approach for flood mitigation. The flood stage is chosen as the variable to be forecasted because it is practically useful in flood forecasting. The support vector machine, a novel artificial intelligence-based method developed from statistical learning theory, is adopted herein to establish a real-time stage forecasting model. The lags associated with the input variables are determined by applying the hydrological concept of the time of response, and a two-step grid search method is applied to find the optimal parameters, and thus overcome the difficulties in constructing the learning machine. Two structures of models used to perform multiple-hour-ahead stage forecasts are developed. Validation results from flood events in Lan-Yang River, Taiwan, revealed that the proposed models can effectively predict the flood stage forecasts one-to-six-hours ahead. Moreover, a sensitivity analysis was conducted on the lags associated with the input variables.
Ridge Regression for Interactive Models.
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
Multimodality in GARCH regression models
Ooms, M.; Doornik, J.A.
2008-01-01
It is shown empirically that mixed autoregressive moving average regression models with generalized autoregressive conditional heteroskedasticity (Reg-ARMA-GARCH models) can have multimodality in the likelihood that is caused by a dummy variable in the conditional mean. Maximum likelihood estimates
Support Vector Regression and Genetic Algorithm for HVAC Optimal Operation
Directory of Open Access Journals (Sweden)
Ching-Wei Chen
2016-01-01
Full Text Available This study covers records of various parameters affecting the power consumption of air-conditioning systems. Using the Support Vector Machine (SVM, the chiller power consumption model, secondary chilled water pump power consumption model, air handling unit fan power consumption model, and air handling unit load model were established. In addition, it was found that R2 of the models all reached 0.998, and the training time was far shorter than that of the neural network. Through genetic programming, a combination of operating parameters with the least power consumption of air conditioning operation was searched. Moreover, the air handling unit load in line with the air conditioning cooling load was predicted. The experimental results show that for the combination of operating parameters with the least power consumption in line with the cooling load obtained through genetic algorithm search, the power consumption of the air conditioning systems under said combination of operating parameters was reduced by 22% compared to the fixed operating parameters, thus indicating significant energy efficiency.
Inferential Models for Linear Regression
Directory of Open Access Journals (Sweden)
Zuoyi Zhang
2011-09-01
Full Text Available Linear regression is arguably one of the most widely used statistical methods in applications. However, important problems, especially variable selection, remain a challenge for classical modes of inference. This paper develops a recently proposed framework of inferential models (IMs in the linear regression context. In general, an IM is able to produce meaningful probabilistic summaries of the statistical evidence for and against assertions about the unknown parameter of interest and, moreover, these summaries are shown to be properly calibrated in a frequentist sense. Here we demonstrate, using simple examples, that the IM framework is promising for linear regression analysis --- including model checking, variable selection, and prediction --- and for uncertain inference in general.
Directory of Open Access Journals (Sweden)
Cheng-Wen Lee
2017-11-01
Full Text Available Accurate electricity forecasting is still the critical issue in many energy management fields. The applications of hybrid novel algorithms with support vector regression (SVR models to overcome the premature convergence problem and improve forecasting accuracy levels also deserve to be widely explored. This paper applies chaotic function and quantum computing concepts to address the embedded drawbacks including crossover and mutation operations of genetic algorithms. Then, this paper proposes a novel electricity load forecasting model by hybridizing chaotic function and quantum computing with GA in an SVR model (named SVRCQGA to achieve more satisfactory forecasting accuracy levels. Experimental examples demonstrate that the proposed SVRCQGA model is superior to other competitive models.
MANCOVA for one way classification with homogeneity of regression coefficient vectors
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
The MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector valued observations. The assumption of a Gaussian distribution has been replaced with the Multivariate Gaussian distribution for the vectors data and residual term variables in the statistical models of these techniques. The objective of MANCOVA is to determine if there are statistically reliable mean differences that can be demonstrated between groups later modifying the newly created variable. When randomization assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, an extension has been made to the MANCOVA technique with more number of covariates and homogeneity of regression coefficient vectors is also tested.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Directory of Open Access Journals (Sweden)
Shokri Saeid
2015-01-01
Full Text Available An accurate prediction of sulfur content is very important for the proper operation and product quality control in hydrodesulfurization (HDS process. For this purpose, a reliable data- driven soft sensors utilizing Support Vector Regression (SVR was developed and the effects of integrating Vector Quantization (VQ with Principle Component Analysis (PCA were studied on the assessment of this soft sensor. First, in pre-processing step the PCA and VQ techniques were used to reduce dimensions of the original input datasets. Then, the compressed datasets were used as input variables for the SVR model. Experimental data from the HDS setup were employed to validate the proposed integrated model. The integration of VQ/PCA techniques with SVR model was able to increase the prediction accuracy of SVR. The obtained results show that integrated technique (VQ-SVR was better than (PCA-SVR in prediction accuracy. Also, VQ decreased the sum of the training and test time of SVR model in comparison with PCA. For further evaluation, the performance of VQ-SVR model was also compared to that of SVR. The obtained results indicated that VQ-SVR model delivered the best satisfactory predicting performance (AARE= 0.0668 and R2= 0.995 in comparison with investigated models.
An Adaptive Support Vector Regression Machine for the State Prognosis of Mechanical Systems
Directory of Open Access Journals (Sweden)
Qing Zhang
2015-01-01
Full Text Available Due to the unsteady state evolution of mechanical systems, the time series of state indicators exhibits volatile behavior and staged characteristics. To model hidden trends and predict deterioration failure utilizing volatile state indicators, an adaptive support vector regression (ASVR machine is proposed. In ASVR, the width of an error-insensitive tube, which is a constant in the traditional support vector regression, is set as a variable determined by the transient distribution boundary of local regions in the training time series. Thus, the localized regions are obtained using a sliding time window, and their boundaries are defined by a robust measure known as the truncated range. Utilizing an adaptive error-insensitive tube, a stabilized tolerance level for noise is achieved, whether the time series occurs in low-volatility regions or in high-volatility regions. The proposed method is evaluated by vibrational data measured on descaling pumps. The results show that ASVR is capable of capturing the local trends of the volatile time series of state indicators and is superior to the standard support vector regression for state prediction.
A Novel Empirical Mode Decomposition With Support Vector Regression for Wind Speed Forecasting.
Ren, Ye; Suganthan, Ponnuthurai Nagaratnam; Srikanth, Narasimalu
2016-08-01
Wind energy is a clean and an abundant renewable energy source. Accurate wind speed forecasting is essential for power dispatch planning, unit commitment decision, maintenance scheduling, and regulation. However, wind is intermittent and wind speed is difficult to predict. This brief proposes a novel wind speed forecasting method by integrating empirical mode decomposition (EMD) and support vector regression (SVR) methods. The EMD is used to decompose the wind speed time series into several intrinsic mode functions (IMFs) and a residue. Subsequently, a vector combining one historical data from each IMF and the residue is generated to train the SVR. The proposed EMD-SVR model is evaluated with a wind speed data set. The proposed EMD-SVR model outperforms several recently reported methods with respect to accuracy or computational complexity.
Directory of Open Access Journals (Sweden)
N. Sujay Raghavendra
2015-12-01
Full Text Available This research demonstrates the state-of-the-art capability of Wavelet packet analysis in improving the forecasting efficiency of Support vector regression (SVR through the development of a novel hybrid Wavelet packet–Support vector regression (WP–SVR model for forecasting monthly groundwater level fluctuations observed in three shallow unconfined coastal aquifers. The Sequential Minimal Optimization Algorithm-based SVR model is also employed for comparative study with WP–SVR model. The input variables used for modeling were monthly time series of total rainfall, average temperature, mean tide level, and past groundwater level observations recorded during the period 1996–2006 at three observation wells located near Mangalore, India. The Radial Basis function is employed as a kernel function during SVR modeling. Model parameters are calibrated using the first seven years of data, and the remaining three years data are used for model validation using various input combinations. The performance of both the SVR and WP–SVR models is assessed using different statistical indices. From the comparative result analysis of the developed models, it can be seen that WP–SVR model outperforms the classic SVR model in predicting groundwater levels at all the three well locations (e.g. NRMSE(WP–SVR = 7.14, NRMSE(SVR = 12.27; NSE(WP–SVR = 0.91, NSE(SVR = 0.8 during the test phase with respect to well location at Surathkal. Therefore, using the WP–SVR model is highly acceptable for modeling and forecasting of groundwater level fluctuations.
Maximum likelihood optimal and robust Support Vector Regression with lncosh loss function.
Karal, Omer
2017-10-01
In this paper, a novel and continuously differentiable convex loss function based on natural logarithm of hyperbolic cosine function, namely lncosh loss, is introduced to obtain Support Vector Regression (SVR) models which are optimal in the maximum likelihood sense for the hyper-secant error distributions. Most of the current regression models assume that the distribution of error is Gaussian, which corresponds to the squared loss function and has helpful analytical properties such as easy computation and analysis. However, in many real world applications, most observations are subject to unknown noise distributions, so the Gaussian distribution may not be a useful choice. The developed SVR model with the parameterized lncosh loss provides a possibility of learning a loss function leading to a regression model which is maximum likelihood optimal for a specific input-output data. The SVR models obtained with different parameter choices of lncosh loss with ε-insensitiveness feature, possess most of the desirable characteristics of well-known loss functions such as Vapnik's loss, the Squared loss, and Huber's loss function as special cases. In other words, it is observed in the extensive simulations that the mentioned lncosh loss function is entirely controlled by a single adjustable λ parameter and as a result, it allows switching between different losses depending on the choice of λ. The effectiveness and feasibility of lncosh loss function are validated through a number of synthetic and real world benchmark data sets for various types of additive noise distributions. Copyright © 2017 Elsevier Ltd. All rights reserved.
SNPs selection using support vector regression and genetic algorithms in GWAS.
de Oliveira, Fabrízzio Condé; Borges, Carlos Cristiano Hasenclever; Almeida, Fernanda Nascimento; e Silva, Fabyano Fonseca; da Silva Verneque, Rui; da Silva, Marcos Vinicius G B; Arbex, Wagner
2014-01-01
This paper proposes a new methodology to simultaneously select the most relevant SNPs markers for the characterization of any measurable phenotype described by a continuous variable using Support Vector Regression with Pearson Universal kernel as fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute towards considering several markers simultaneously to explain the phenotype and is based jointly on statistical tools, machine learning and computational intelligence. The suggested method has shown potential in the simulated database 1, with additive effects only, and real database. In this simulated database, with a total of 1,000 markers, and 7 with major effect on the phenotype and the other 993 SNPs representing the noise, the method identified 21 markers. Of this total, 5 are relevant SNPs between the 7 but 16 are false positives. In real database, initially with 50,752 SNPs, we have reduced to 3,073 markers, increasing the accuracy of the model. In the simulated database 2, with additive effects and interactions (epistasis), the proposed method matched to the methodology most commonly used in GWAS. The method suggested in this paper demonstrates the effectiveness in explaining the real phenotype (PTA for milk), because with the application of the wrapper based on genetic algorithm and Support Vector Regression with Pearson Universal, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels.
Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge
Directory of Open Access Journals (Sweden)
Jiqiang Chen
2014-01-01
Full Text Available Prior knowledge, such as wind speed probability distribution based on historical data and the wind speed fluctuation between the maximal value and the minimal value in a certain period of time, provides much more information about the wind speed, so it is necessary to incorporate it into the wind speed prediction. First, a method of estimating wind speed probability distribution based on historical data is proposed based on Bernoulli’s law of large numbers. Second, in order to describe the wind speed fluctuation between the maximal value and the minimal value in a certain period of time, the probability distribution estimated by the proposed method is incorporated into the training data and the testing data. Third, a support vector regression model for wind speed prediction is proposed based on standard support vector regression. At last, experiments predicting the wind speed in a certain wind farm show that the proposed method is feasible and effective and the model’s running time and prediction errors can meet the needs of wind speed prediction.
Optimization of Filter by using Support Vector Regression Machine with Cuckoo Search Algorithm
Directory of Open Access Journals (Sweden)
M. İlarslan
2014-09-01
Full Text Available Herein, a new methodology using a 3D Electromagnetic (EM simulator-based Support Vector Regression Machine (SVRM models of base elements is presented for band-pass filter (BPF design. SVRM models of elements, which are as fast as analytical equations and as accurate as a 3D EM simulator, are employed in a simple and efficient Cuckoo Search Algorithm (CSA to optimize an ultra-wideband (UWB microstrip BPF. CSA performance is verified by comparing it with other Meta-Heuristics such as Genetic Algorithm (GA and Particle Swarm Optimization (PSO. As an example of the proposed design methodology, an UWB BPF that operates between the frequencies of 3.1 GHz and 10.6 GHz is designed, fabricated and measured. The simulation and measurement results indicate in conclusion the superior performance of this optimization methodology in terms of improved filter response characteristics like return loss, insertion loss, harmonic suppression and group delay.
Study on Parameter Optimization for Support Vector Regression in Solving the Inverse ECG Problem
Directory of Open Access Journals (Sweden)
Mingfeng Jiang
2013-01-01
Full Text Available The typical inverse ECG problem is to noninvasively reconstruct the transmembrane potentials (TMPs from body surface potentials (BSPs. In the study, the inverse ECG problem can be treated as a regression problem with multi-inputs (body surface potentials and multi-outputs (transmembrane potentials, which can be solved by the support vector regression (SVR method. In order to obtain an effective SVR model with optimal regression accuracy and generalization performance, the hyperparameters of SVR must be set carefully. Three different optimization methods, that is, genetic algorithm (GA, differential evolution (DE algorithm, and particle swarm optimization (PSO, are proposed to determine optimal hyperparameters of the SVR model. In this paper, we attempt to investigate which one is the most effective way in reconstructing the cardiac TMPs from BSPs, and a full comparison of their performances is also provided. The experimental results show that these three optimization methods are well performed in finding the proper parameters of SVR and can yield good generalization performance in solving the inverse ECG problem. Moreover, compared with DE and GA, PSO algorithm is more efficient in parameters optimization and performs better in solving the inverse ECG problem, leading to a more accurate reconstruction of the TMPs.
Study on parameter optimization for support vector regression in solving the inverse ECG problem.
Jiang, Mingfeng; Jiang, Shanshan; Zhu, Lingyan; Wang, Yaming; Huang, Wenqing; Zhang, Heng
2013-01-01
The typical inverse ECG problem is to noninvasively reconstruct the transmembrane potentials (TMPs) from body surface potentials (BSPs). In the study, the inverse ECG problem can be treated as a regression problem with multi-inputs (body surface potentials) and multi-outputs (transmembrane potentials), which can be solved by the support vector regression (SVR) method. In order to obtain an effective SVR model with optimal regression accuracy and generalization performance, the hyperparameters of SVR must be set carefully. Three different optimization methods, that is, genetic algorithm (GA), differential evolution (DE) algorithm, and particle swarm optimization (PSO), are proposed to determine optimal hyperparameters of the SVR model. In this paper, we attempt to investigate which one is the most effective way in reconstructing the cardiac TMPs from BSPs, and a full comparison of their performances is also provided. The experimental results show that these three optimization methods are well performed in finding the proper parameters of SVR and can yield good generalization performance in solving the inverse ECG problem. Moreover, compared with DE and GA, PSO algorithm is more efficient in parameters optimization and performs better in solving the inverse ECG problem, leading to a more accurate reconstruction of the TMPs.
Empirical Vector Autoregressive Modeling
M. Ooms (Marius)
1993-01-01
textabstractChapter 2 introduces the baseline version of the VAR model, with its basic statistical assumptions that we examine in the sequel. We first check whether the variables in the VAR can be transformed to meet these assumptions. We analyze the univariate characteristics of the series.
Al-Ghraibah, Amani
Solar flares release stored magnetic energy in the form of radiation and can have significant detrimental effects on earth including damage to technological infrastructure. Recent work has considered methods to predict future flare activity on the basis of quantitative measures of the solar magnetic field. Accurate advanced warning of solar flare occurrence is an area of increasing concern and much research is ongoing in this area. Our previous work 111] utilized standard pattern recognition and classification techniques to determine (classify) whether a region is expected to flare within a predictive time window, using a Relevance Vector Machine (RVM) classification method. We extracted 38 features which describing the complexity of the photospheric magnetic field, the result classification metrics will provide the baseline against which we compare our new work. We find a true positive rate (TPR) of 0.8, true negative rate (TNR) of 0.7, and true skill score (TSS) of 0.49. This dissertation proposes three basic topics; the first topic is an extension to our previous work [111, where we consider a feature selection method to determine an appropriate feature subset with cross validation classification based on a histogram analysis of selected features. Classification using the top five features resulting from this analysis yield better classification accuracies across a large unbalanced dataset. In particular, the feature subsets provide better discrimination of the many regions that flare where we find a TPR of 0.85, a TNR of 0.65 sightly lower than our previous work, and a TSS of 0.5 which has an improvement comparing with our previous work. In the second topic, we study the prediction of solar flare size and time-to-flare using support vector regression (SVR). When we consider flaring regions only, we find an average error in estimating flare size of approximately half a GOES class. When we additionally consider non-flaring regions, we find an increased average
Large N Expansion. Vector Models
Nissimov, Emil; Pacheva, Svetlana
2006-01-01
Preliminary version of a contribution to the "Quantum Field Theory. Non-Perturbative QFT" topical area of "Modern Encyclopedia of Mathematical Physics" (SELECTA), eds. Aref'eva I, and Sternheimer D, Springer (2007). Consists of two parts - "main article" (Large N Expansion. Vector Models) and a "brief article" (BPHZL Renormalization).
Gaussian Process Regression Model in Spatial Logistic Regression
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.
A Spreadsheet Model for Teaching Regression Analysis.
Wood, William C.; O'Hare, Sharon L.
1992-01-01
Presents a spreadsheet model that is useful in introducing students to regression analysis and the computation of regression coefficients. Includes spreadsheet layouts and formulas so that the spreadsheet can be implemented. (Author)
Directory of Open Access Journals (Sweden)
Zhang Sheng Bo
2016-01-01
Full Text Available A novel quality prediction method with mobile time window is proposed for small-batch producing process based on weighted least squares support vector regression (LS-SVR. The design steps and learning algorithm are also addressed. In the method, weighted LS-SVR is taken as the intelligent kernel, with which the small-batch learning is solved well and the nearer sample is set a larger weight, while the farther is set the smaller weight in the history data. A typical machining process of cutting bearing outer race is carried out and the real measured data are used to contrast experiment. The experimental results demonstrate that the prediction accuracy of the weighted LSSVR based model is only 20%-30% that of the standard LS-SVR based one in the same condition. It provides a better candidate for quality prediction of small-batch producing process.
Single Image Super-Resolution by Non-Linear Sparse Representation and Support Vector Regression
Directory of Open Access Journals (Sweden)
Yungang Zhang
2017-02-01
Full Text Available Sparse representations are widely used tools in image super-resolution (SR tasks. In the sparsity-based SR methods, linear sparse representations are often used for image description. However, the non-linear data distributions in images might not be well represented by linear sparse models. Moreover, many sparsity-based SR methods require the image patch self-similarity assumption; however, the assumption may not always hold. In this paper, we propose a novel method for single image super-resolution (SISR. Unlike most prior sparsity-based SR methods, the proposed method uses non-linear sparse representation to enhance the description of the non-linear information in images, and the proposed framework does not need to assume the self-similarity of image patches. Based on the minimum reconstruction errors, support vector regression (SVR is applied for predicting the SR image. The proposed method was evaluated on various benchmark images, and promising results were obtained.
Supplier Short Term Load Forecasting Using Support Vector Regression and Exogenous Input
Matijaš, Marin; Vukićcević, Milan; Krajcar, Slavko
2011-09-01
In power systems, task of load forecasting is important for keeping equilibrium between production and consumption. With liberalization of electricity markets, task of load forecasting changed because each market participant has to forecast their own load. Consumption of end-consumers is stochastic in nature. Due to competition, suppliers are not in a position to transfer their costs to end-consumers; therefore it is essential to keep forecasting error as low as possible. Numerous papers are investigating load forecasting from the perspective of the grid or production planning. We research forecasting models from the perspective of a supplier. In this paper, we investigate different combinations of exogenous input on the simulated supplier loads and show that using points of delivery as a feature for Support Vector Regression leads to lower forecasting error, while adding customer number in different datasets does the opposite.
Directory of Open Access Journals (Sweden)
S.K. Lahiri
2009-09-01
Full Text Available Soft sensors have been widely used in the industrial process control to improve the quality of the product and assure safety in the production. The core of a soft sensor is to construct a soft sensing model. This paper introduces support vector regression (SVR, a new powerful machine learning methodbased on a statistical learning theory (SLT into soft sensor modeling and proposes a new soft sensing modeling method based on SVR. This paper presents an artificial intelligence based hybrid soft sensormodeling and optimization strategies, namely support vector regression – genetic algorithm (SVR-GA for modeling and optimization of mono ethylene glycol (MEG quality variable in a commercial glycol plant. In the SVR-GA approach, a support vector regression model is constructed for correlating the process data comprising values of operating and performance variables. Next, model inputs describing the process operating variables are optimized using genetic algorithm with a view to maximize the process performance. The SVR-GA is a new strategy for soft sensor modeling and optimization. The major advantage of the strategies is that modeling and optimization can be conducted exclusively from the historic process data wherein the detailed knowledge of process phenomenology (reaction mechanism, kinetics etc. is not required. Using SVR-GA strategy, a number of sets of optimized operating conditions were found. The optimized solutions, when verified in an actual plant, resulted in a significant improvement in the quality.
Directory of Open Access Journals (Sweden)
Shahrbanoo Goli
2016-01-01
Full Text Available The Support Vector Regression (SVR model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed and SVR with different kernels and the traditional Cox model are trained. The models are compared based on different performance measures. We also select the best subset of features using three feature selection methods: combination of SVR and statistical tests, univariate feature selection based on concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and also a Breast Cancer (BC dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the best features associated to survival. Also, according to the obtained results, performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similar to or slightly better than other models. Also, SVR performs similar to or better than Cox when all features are included in model.
A Vector Approach to Regression Analysis and Its Implications to Heavy-Duty Diesel Emissions
Energy Technology Data Exchange (ETDEWEB)
McAdams, H.T.
2001-02-14
An alternative approach is presented for the regression of response data on predictor variables that are not logically or physically separable. The methodology is demonstrated by its application to a data set of heavy-duty diesel emissions. Because of the covariance of fuel properties, it is found advantageous to redefine the predictor variables as vectors, in which the original fuel properties are components, rather than as scalars each involving only a single fuel property. The fuel property vectors are defined in such a way that they are mathematically independent and statistically uncorrelated. Because the available data set does not allow definitive separation of vehicle and fuel effects, and because test fuels used in several of the studies may be unrealistically contrived to break the association of fuel variables, the data set is not considered adequate for development of a full-fledged emission model. Nevertheless, the data clearly show that only a few basic patterns of fuel-property variation affect emissions and that the number of these patterns is considerably less than the number of variables initially thought to be involved. These basic patterns, referred to as ''eigenfuels,'' may reflect blending practice in accordance with their relative weighting in specific circumstances. The methodology is believed to be widely applicable in a variety of contexts. It promises an end to the threat of collinearity and the frustration of attempting, often unrealistically, to separate variables that are inseparable.
Kropotov, D. A.
2011-08-01
Problems of classification and regression estimation in which objects are represented by multidimensional arrays of features are considered. Many practical statements can be reduced to such problems, for example, the popular approach to the description of images as a set of patches and a set of descriptors in each patch or the description of an object in the form of a set of distances from it to certain support objects selected based on a set of features. For solving problems concerning the objects thus described, a generalization of the relevance vector model is proposed. In this generalization, specific regularization coefficients are defined for each dimension of the multidimensional array of the object description; the resultant regularization coefficient for a given element in the multidimensional array is determined as a combination of the regularization coefficients for all the dimensions. The models with the sum and product used for such combinations are examined. Algorithms based on the variational approach are proposed for learning in these models. These algorithms enable one to find the so-called "sparse" solutions, that is, exclude from the consideration the irrelevant dimensions in the multidimensional array of the object description. Compared with the classical relevance vector model, the proposed approach makes it possible to reduce the number of adjustable parameters because a sum of all the dimensions is considered instead of their product. As a result, the method becomes more robust under overfitting in the case of small samples. This property and the sparseness of the resulting solutions in the proposed models are demonstrated experimentally, in particular, in the case of the known face identification database called Labeled Faces in the Wild.
Weng, Shizhuang; Yuan, Baohong; Zhu, Zede; Huang, Linsheng; Zhang, Dongyan; Zheng, Ling
2016-03-01
As a novel and ultrasensitive detection technology that had advantages of fingerprint effect, high speed and low cost, surface-enhanced Raman scattering (SERS) was used to develop the regression models for the fast quantitative detection of thiram by support vector machine regression (SVR) in the paper. Meanwhile, three parameter optimization methods, which were grid search (GS), genetic algorithm (GA) and particle swarm optimization (PSO), were employed to optimize the internal parameters of SVR. Furthermore, the influence of the spectral number, spectral wavenumber range and principal component analysis (PCA) on the quantitative detection was also discussed. Firstly, the experiments demonstrate the proposed method can realize the fast and quantitative detection of thiram, and the best result is obtained by GS-SVR with the spectra of the range of characteristic peak which are processed by PCA. And the effect of GS, GA, PSO on the parameter optimization is similar, but the analysis time has a great difference in which GS is the fastest. Considering the analysis accuracy and time simultaneously, the spectral number of samples over each concentration should be set to 50. Then, developing the quantitative model with the spectra of range of characteristic peak can reduce analysis time on the promise of ensuring the detection accuracy. Additionally, PCA can further reduce the detection error through reserving the main information of the spectra data and eliminating the noise.
Hybrid ARIMA and Support Vector Regression in Short‑term Electricity Price Forecasting
Directory of Open Access Journals (Sweden)
Jindřich Pokora
2017-01-01
Full Text Available The literature suggests that, in short‑term electricity‑price forecasting, a combination of ARIMA and support vector regression (SVR yields performance improvement over separate use of each method. The objective of the research is to investigate the circumstances under which these hybrid models are superior for day‑ahead hourly price forecasting. Analysis of the Nord Pool market with 16 interconnected areas and 6 investigated monthly periods allows not only for a considerable level of generalizability but also for assessment of the effect of transmission congestion since this causes differences in prices between the Nord Pool areas. The paper finds that SVR, SVRARIMA and ARIMASVR provide similar performance, at the same time, hybrid methods outperform single models in terms of RMSE in 98 % of investigated time series. Furthermore, it seems that higher flexibility of hybrid models improves modeling of price spikes at a slight cost of imprecision during steady periods. Lastly, superiority of hybrid models is pronounced under transmission congestions, measured as first and second moments of the electricity price.
Zhou, Pei-pei; Shan, Jin-feng; Jiang, Jian-lan
2015-12-01
To optimize the optimal microwave-assisted extraction method of curcuminoids from Curcuma longa. On the base of single factor experiment, the ethanol concentration, the ratio of liquid to solid and the microwave time were selected for further optimization. Support Vector Regression (SVR) and Central Composite Design-Response Surface Methodology (CCD) algorithm were utilized to design and establish models respectively, while Particle Swarm Optimization (PSO) was introduced to optimize the parameters of SVR models and to search optimal points of models. The evaluation indicator, the sum of curcumin, demethoxycurcumin and bisdemethoxycurcumin by HPLC, were used. The optimal parameters of microwave-assisted extraction were as follows: ethanol concentration of 69%, ratio of liquid to solid of 21 : 1, microwave time of 55 s. On those conditions, the sum of three curcuminoids was 28.97 mg/g (per gram of rhizomes powder). Both the CCD model and the SVR model were credible, for they have predicted the similar process condition and the deviation of yield were less than 1.2%.
Estimation of Electrically-Evoked Knee Torque from Mechanomyography Using Support Vector Regression.
Ibitoye, Morufu Olusola; Hamzaid, Nur Azah; Abdul Wahab, Ahmad Khairi; Hasnan, Nazirah; Olatunji, Sunday Olusanya; Davis, Glen M
2016-07-19
The difficulty of real-time muscle force or joint torque estimation during neuromuscular electrical stimulation (NMES) in physical therapy and exercise science has motivated recent research interest in torque estimation from other muscle characteristics. This study investigated the accuracy of a computational intelligence technique for estimating NMES-evoked knee extension torque based on the Mechanomyographic signals (MMG) of contracting muscles that were recorded from eight healthy males. Simulation of the knee torque was modelled via Support Vector Regression (SVR) due to its good generalization ability in related fields. Inputs to the proposed model were MMG amplitude characteristics, the level of electrical stimulation or contraction intensity, and knee angle. Gaussian kernel function, as well as its optimal parameters were identified with the best performance measure and were applied as the SVR kernel function to build an effective knee torque estimation model. To train and test the model, the data were partitioned into training (70%) and testing (30%) subsets, respectively. The SVR estimation accuracy, based on the coefficient of determination (R²) between the actual and the estimated torque values was up to 94% and 89% during the training and testing cases, with root mean square errors (RMSE) of 9.48 and 12.95, respectively. The knee torque estimations obtained using SVR modelling agreed well with the experimental data from an isokinetic dynamometer. These findings support the realization of a closed-loop NMES system for functional tasks using MMG as the feedback signal source and an SVR algorithm for joint torque estimation.
Jeffrey T. Walton
2008-01-01
Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
Support vector regression methodology for estimating global solar radiation in Algeria
Guermoui, Mawloud; Rabehi, Abdelaziz; Gairaa, Kacem; Benkaciali, Said
2018-01-01
Accurate estimation of Daily Global Solar Radiation (DGSR) has been a major goal for solar energy applications. In this paper we show the possibility of developing a simple model based on the Support Vector Regression (SVM-R), which could be used to estimate DGSR on the horizontal surface in Algeria based only on sunshine ratio as input. The SVM model has been developed and tested using a data set recorded over three years (2005-2007). The data was collected at the Applied Research Unit for Renewable Energies (URAER) in Ghardaïa city. The data collected between 2005-2006 are used to train the model while the 2007 data are used to test the performance of the selected model. The measured and the estimated values of DGSR were compared during the testing phase statistically using the Root Mean Square Error (RMSE), Relative Square Error (rRMSE), and correlation coefficient (r2), which amount to 1.59(MJ/m2), 8.46 and 97,4%, respectively. The obtained results show that the SVM-R is highly qualified for DGSR estimation using only sunshine ratio.
Directory of Open Access Journals (Sweden)
Jianzhou Wang
2015-01-01
Full Text Available This paper develops an effectively intelligent model to forecast short-term wind speed series. A hybrid forecasting technique is proposed based on recurrence plot (RP and optimized support vector regression (SVR. Wind caused by the interaction of meteorological systems makes itself extremely unsteady and difficult to forecast. To understand the wind system, the wind speed series is analyzed using RP. Then, the SVR model is employed to forecast wind speed, in which the input variables are selected by RP, and two crucial parameters, including the penalties factor and gamma of the kernel function RBF, are optimized by various optimization algorithms. Those optimized algorithms are genetic algorithm (GA, particle swarm optimization algorithm (PSO, and cuckoo optimization algorithm (COA. Finally, the optimized SVR models, including COA-SVR, PSO-SVR, and GA-SVR, are evaluated based on some criteria and a hypothesis test. The experimental results show that (1 analysis of RP reveals that wind speed has short-term predictability on a short-term time scale, (2 the performance of the COA-SVR model is superior to that of the PSO-SVR and GA-SVR methods, especially for the jumping samplings, and (3 the COA-SVR method is statistically robust in multi-step-ahead prediction and can be applied to practical wind farm applications.
Variable importance in latent variable regression models
Kvalheim, O.M.; Arneberg, R.; Bleie, O.; Rajalahti, T.; Smilde, A.K.; Westerhuis, J.A.
2014-01-01
The quality and practical usefulness of a regression model are a function of both interpretability and prediction performance. This work presents some new graphical tools for improved interpretation of latent variable regression models that can also assist in improved algorithms for variable
STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...
African Journals Online (AJOL)
STREAMFLOW AND WATER QUALITY REGRESSION MODELING OF IMO RIVER SYSTEM: A CASE STUDY. ... Journal of Modeling, Design and Management of Engineering Systems ... Possible sources of contamination of Imo-river system within Nekede and Obigbo hydrological stations watershed were traced.
Predictive based monitoring of nuclear plant component degradation using support vector regression
Energy Technology Data Exchange (ETDEWEB)
Agarwal, Vivek [Idaho National Lab. (INL), Idaho Falls, ID (United States). Dept. of Human Factors, Controls, Statistics; Alamaniotis, Miltiadis [Purdue Univ., West Lafayette, IN (United States). School of Nuclear Engineering; Tsoukalas, Lefteri H. [Purdue Univ., West Lafayette, IN (United States). School of Nuclear Engineering
2015-02-01
Nuclear power plants (NPPs) are large installations comprised of many active and passive assets. Degradation monitoring of all these assets is expensive (labor cost) and highly demanding task. In this paper a framework based on Support Vector Regression (SVR) for online surveillance of critical parameter degradation of NPP components is proposed. In this case, on time replacement or maintenance of components will prevent potential plant malfunctions, and reduce the overall operational cost. In the current work, we apply SVR equipped with a Gaussian kernel function to monitor components. Monitoring includes the one-step-ahead prediction of the component’s respective operational quantity using the SVR model, while the SVR model is trained using a set of previous recorded degradation histories of similar components. Predictive capability of the model is evaluated upon arrival of a sensor measurement, which is compared to the component failure threshold. A maintenance decision is based on a fuzzy inference system that utilizes three parameters: (i) prediction evaluation in the previous steps, (ii) predicted value of the current step, (iii) and difference of current predicted value with components failure thresholds. The proposed framework will be tested on turbine blade degradation data.
Directory of Open Access Journals (Sweden)
Mustakim Mustakim
2016-02-01
Full Text Available The largest region that produces oil palm in Indonesia has an important role in improving the welfare of society and economy. Oil palm has increased significantly in Riau Province in every period, to determine the production development for the next few years with the functions and benefits of oil palm carried prediction production results that were seen from time series data last 8 years (2005-2013. In its prediction implementation, it was done by comparing the performance of Support Vector Regression (SVR method and Artificial Neural Network (ANN. From the experiment, SVR produced the best model compared with ANN. It is indicated by the correlation coefficient of 95% and 6% for MSE in the kernel Radial Basis Function (RBF, whereas ANN produced only 74% for R2 and 9% for MSE on the 8th experiment with hiden neuron 20 and learning rate 0,1. SVR model generates predictions for next 3 years which increased between 3% - 6% from actual data and RBF model predictions.
Brown, Joshua D; Summers, Michael F; Johnson, Bruce A
2015-09-01
The Biological Magnetic Resonance Data Bank (BMRB) contains NMR chemical shift depositions for over 200 RNAs and RNA-containing complexes. We have analyzed the (1)H NMR and (13)C chemical shifts reported for non-exchangeable protons of 187 of these RNAs. Software was developed that downloads BMRB datasets and corresponding PDB structure files, and then generates residue-specific attributes based on the calculated secondary structure. Attributes represent properties present in each sequential stretch of five adjacent residues and include variables such as nucleotide type, base-pair presence and type, and tetraloop types. Attributes and (1)H and (13)C NMR chemical shifts of the central nucleotide are then used as input to train a predictive model using support vector regression. These models can then be used to predict shifts for new sequences. The new software tools, available as stand-alone scripts or integrated into the NMR visualization and analysis program NMRViewJ, should facilitate NMR assignment and/or validation of RNA (1)H and (13)C chemical shifts. In addition, our findings enabled the re-calibration a ring-current shift model using published NMR chemical shifts and high-resolution X-ray structural data as guides.
Regression Models for Market-Shares
DEFF Research Database (Denmark)
Birch, Kristina; Olsen, Jørgen Kai; Tjur, Tue
2005-01-01
On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put on the interpretat......On the background of a data set of weekly sales and prices for three brands of coffee, this paper discusses various regression models and their relation to the multiplicative competitive-interaction model (the MCI model, see Cooper 1988, 1993) for market-shares. Emphasis is put...... on the interpretation of the parameters in relation to models for the total sales based on discrete choice models.Key words and phrases. MCI model, discrete choice model, market-shares, price elasitcity, regression model....
Bias-corrected quantile regression estimation of censored regression models
Cizek, Pavel; Sadikoglu, Serhan
2018-01-01
In this paper, an extension of the indirect inference methodology to semiparametric estimation is explored in the context of censored regression. Motivated by weak small-sample performance of the censored regression quantile estimator proposed by Powell (J Econom 32:143–155, 1986a), two- and
Directory of Open Access Journals (Sweden)
Kabiru O. Akande
2016-01-01
Full Text Available Hybrid computational intelligence is defined as a combination of multiple intelligent algorithms such that the resulting model has superior performance to the individual algorithms. Therefore, the importance of fusing two or more intelligent algorithms to achieve better performance cannot be overemphasized. In this work, a novel homogenous hybridization scheme is proposed for the improvement of the generalization and predictive ability of support vector machines regression (SVR. The proposed and developed hybrid SVR (HSVR works by considering the initial SVR prediction as a feature extraction process and then employs the SVR output, which is the extracted feature, as its sole descriptor. The developed hybrid model is applied to the prediction of reservoir permeability and the predicted permeability is compared to core permeability which is regarded as standard in petroleum industry. The results show that the proposed hybrid scheme (HSVR performed better than the existing SVR in both generalization and prediction ability. The outcome of this research will assist petroleum engineers to effectively predict permeability of carbonate reservoirs with higher degree of accuracy and will invariably lead to better reservoir. Furthermore, the encouraging performance of this hybrid will serve as impetus for further exploring homogenous hybrid system.
Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Directory of Open Access Journals (Sweden)
Wensheng Dai
2014-01-01
Full Text Available Sales forecasting is one of the most important issues in managing information technology (IT chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR, is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA, temporal ICA (tICA, and spatiotemporal ICA (stICA to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Dai, Wensheng
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
Directory of Open Access Journals (Sweden)
Zhongwei Li
Full Text Available Welan gum is a kind of novel microbial polysaccharide, which is widely produced during the process of microbial growth and metabolism in different external conditions. Welan gum can be used as the thickener, suspending agent, emulsifier, stabilizer, lubricant, film-forming agent and adhesive usage in agriculture. In recent years, finding optimal experimental conditions to maximize the production is paid growing attentions. In this work, a hybrid computational method is proposed to optimize experimental conditions for producing Welan gum with data collected from experiments records. Support Vector Regression (SVR is used to model the relationship between Welan gum production and experimental conditions, and then adaptive Genetic Algorithm (AGA, for short is applied to search optimized experimental conditions. As results, a mathematic model of predicting production of Welan gum from experimental conditions is obtained, which achieves accuracy rate 88.36%. As well, a class of optimized experimental conditions is predicted for producing Welan gum 31.65g/L. Comparing the best result in chemical experiment 30.63g/L, the predicted production improves it by 3.3%. The results provide potential optimal experimental conditions to improve the production of Welan gum.
Li, Zhongwei; Yuan, Xiang; Cui, Xuerong; Liu, Xin; Wang, Leiquan; Zhang, Weishan; Lu, Qinghua; Zhu, Hu
2017-01-01
Welan gum is a kind of novel microbial polysaccharide, which is widely produced during the process of microbial growth and metabolism in different external conditions. Welan gum can be used as the thickener, suspending agent, emulsifier, stabilizer, lubricant, film-forming agent and adhesive usage in agriculture. In recent years, finding optimal experimental conditions to maximize the production is paid growing attentions. In this work, a hybrid computational method is proposed to optimize experimental conditions for producing Welan gum with data collected from experiments records. Support Vector Regression (SVR) is used to model the relationship between Welan gum production and experimental conditions, and then adaptive Genetic Algorithm (AGA, for short) is applied to search optimized experimental conditions. As results, a mathematic model of predicting production of Welan gum from experimental conditions is obtained, which achieves accuracy rate 88.36%. As well, a class of optimized experimental conditions is predicted for producing Welan gum 31.65g/L. Comparing the best result in chemical experiment 30.63g/L, the predicted production improves it by 3.3%. The results provide potential optimal experimental conditions to improve the production of Welan gum.
Model checking for ROC regression analysis.
Cai, Tianxi; Zheng, Yingye
2007-03-01
The receiver operating characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may render invalid and misleading results. To date, practical model-checking techniques suitable for validating existing ROC regression models are not yet available. In this article, we develop cumulative residual-based procedures to graphically and numerically assess the goodness of fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual processes and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the cystic fibrosis registry.
Estimation of Electrically-Evoked Knee Torque from Mechanomyography Using Support Vector Regression
Directory of Open Access Journals (Sweden)
Morufu Olusola Ibitoye
2016-07-01
Full Text Available The difficulty of real-time muscle force or joint torque estimation during neuromuscular electrical stimulation (NMES in physical therapy and exercise science has motivated recent research interest in torque estimation from other muscle characteristics. This study investigated the accuracy of a computational intelligence technique for estimating NMES-evoked knee extension torque based on the Mechanomyographic signals (MMG of contracting muscles that were recorded from eight healthy males. Simulation of the knee torque was modelled via Support Vector Regression (SVR due to its good generalization ability in related fields. Inputs to the proposed model were MMG amplitude characteristics, the level of electrical stimulation or contraction intensity, and knee angle. Gaussian kernel function, as well as its optimal parameters were identified with the best performance measure and were applied as the SVR kernel function to build an effective knee torque estimation model. To train and test the model, the data were partitioned into training (70% and testing (30% subsets, respectively. The SVR estimation accuracy, based on the coefficient of determination (R2 between the actual and the estimated torque values was up to 94% and 89% during the training and testing cases, with root mean square errors (RMSE of 9.48 and 12.95, respectively. The knee torque estimations obtained using SVR modelling agreed well with the experimental data from an isokinetic dynamometer. These findings support the realization of a closed-loop NMES system for functional tasks using MMG as the feedback signal source and an SVR algorithm for joint torque estimation.
Directory of Open Access Journals (Sweden)
Mattia Callegari
2015-05-01
Full Text Available In this contribution we analyze the performance of a monthly river discharge forecasting model with a Support Vector Regression (SVR technique in a European alpine area. We considered as predictors the discharges of the antecedent months, snow-covered area (SCA, and meteorological and climatic variables for 14 catchments in South Tyrol (Northern Italy, as well as the long-term average discharge of the month of prediction, also regarded as a benchmark. Forecasts at a six-month lead time tend to perform no better than the benchmark, with an average 33% relative root mean square error (RMSE% on test samples. However, at one month lead time, RMSE% was 22%, a non-negligible improvement over the benchmark; moreover, the SVR model reduces the frequency of higher errors associated with anomalous months. Predictions with a lead time of three months show an intermediate performance between those at one and six months lead time. Among the considered predictors, SCA alone reduces RMSE% to 6% and 5% compared to using monthly discharges only, for a lead time equal to one and three months, respectively, whereas meteorological parameters bring only minor improvements. The model also outperformed a simpler linear autoregressive model, and yielded the lowest volume error in forecasting with one month lead time, while at longer lead times the differences compared to the benchmarks are negligible. Our results suggest that although an SVR model may deliver better forecasts than its simpler linear alternatives, long lead-time hydrological forecasting in Alpine catchments remains a challenge. Catchment state variables may play a bigger role than catchment input variables; hence a focus on characterizing seasonal catchment storage—Rather than seasonal weather forecasting—Could be key for improving our predictive capacity.
Implicit Social Trust Dan Support Vector Regression Untuk Sistem Rekomendasi Berita
Directory of Open Access Journals (Sweden)
Melita Widya Ningrum
2018-01-01
Full Text Available Situs berita merupakan salah satu situs yang sering diakses masyarakat karena kemampuannya dalam menyajikan informasi terkini dari berbagai topik seperti olahraga, bisnis, politik, teknologi, kesehatan dan hiburan. Masyarakat dapat mencari dan melihat berita yang sedang populer dari seluruh dunia. Di sisi lain, melimpahnya artikel berita yang tersedia dapat menyulitkan pengguna dalam menemukan artikel berita yang sesuai dengan ketertarikannya. Pemilihan artikel berita yang ditampilkan ke halaman utama pengguna menjadi penting karena dapat meningkatkan minat pengguna untuk membaca artikel berita dari situs tersebut. Selain itu, pemilihan artikel berita yang sesuai dapat meminimalisir terjadinya banjir informasi yang tidak relevan. Dalam pemilihan artikel berita dibutuhkan sistem rekomendasi yang memiliki pengetahuan mengenai ketertarikan atau relevansi pengguna akan topik berita tertentu. Pada penelitian ini, peneliti membuat sistem rekomendasi artikel berita pada New York Times berbasis implicit social trust. Social trust dihasilkan dari interaksi antara pengguna dengan teman-temannya dan bobot kepercayaan teman pengguna pada media sosial Twitter. Data yang diambil merupakan data pengguna Twitter, teman dan jumlah interaksi antar pengguna berupa retweet. Sistem ini memanfaatkan algoritma Support Vector Regression untuk memberikan estimasi penilaian pengguna terhadap suatu topik tertentu. Hasil pengolahan data dengan Support Vector Regression menunjukkan tingkat akurasi dengan MAPE sebesar 0,8243075902233644%. Keywords : Twitter, Rekomendasi Berita, Social Trust, Support Vector Regression
Wu, Peilin; Zhang, Qunying; Fei, Chunjiao; Fang, Guangyou
2017-04-01
Aeromagnetic gradients are typically measured by optically pumped magnetometers mounted on an aircraft. Any aircraft, particularly helicopters, produces significant levels of magnetic interference. Therefore, aeromagnetic compensation is essential, and least square (LS) is the conventional method used for reducing interference levels. However, the LSs approach to solving the aeromagnetic interference model has a few difficulties, one of which is in handling multicollinearity. Therefore, we propose an aeromagnetic gradient compensation method, specifically targeted for helicopter use but applicable on any airborne platform, which is based on the ɛ-support vector regression algorithm. The structural risk minimization criterion intrinsic to the method avoids multicollinearity altogether. Local aeromagnetic anomalies can be retained, and platform-generated fields are suppressed simultaneously by constructing an appropriate loss function and kernel function. The method was tested using an unmanned helicopter and obtained improvement ratios of 12.7 and 3.5 in the vertical and horizontal gradient data, respectively. Both of these values are probably better than those that would have been obtained from the conventional method applied to the same data, had it been possible to do so in a suitable comparative context. The validity of the proposed method is demonstrated by the experimental result.
Directory of Open Access Journals (Sweden)
Quoc-Huy Phan
2013-01-01
Full Text Available Multipath mitigation is a long-standing problem in global positioning system (GPS research and is essential for improving the accuracy and precision of positioning solutions. In this work, we consider multipath error estimation as a regression problem and propose a unified framework for both code and carrier-phase multipath mitigation for ground fixed GPS stations. We use the kernel support vector machine to predict multipath errors, since it is known to potentially offer better-performance traditional models, such as neural networks. The predicted multipath error is then used to correct GPS measurements. We empirically show that the proposed method can reduce the code multipath error standard deviation up to 79% on average, which significantly outperforms other approaches in the literature. A comparative analysis of reduction of double-differential carrier-phase multipath error reveals that a 57% reduction is also achieved. Furthermore, by simulation, we also show that this method is robust to coexisting signals of phenomena (e.g., seismic signals we wish to preserve.
Nonparametric and semiparametric dynamic additive regression models
DEFF Research Database (Denmark)
Scheike, Thomas Harder; Martinussen, Torben
Dynamic additive regression models provide a flexible class of models for analysis of longitudinal data. The approach suggested in this work is suited for measurements obtained at random time points and aims at estimating time-varying effects. Both fully nonparametric and semiparametric models can...
Energy Technology Data Exchange (ETDEWEB)
Riaz, Nadeem; Wiersma, Rodney; Mao Weihua; Xing Lei [Department of Radiation Oncology, Stanford University, 875 Blake Wilbur Drive, Stanford, CA 94305-5847 (United States); Shanker, Piyush; Gudmundsson, Olafur; Widrow, Bernard [Department of Electrical Engineering, Stanford University, Stanford, CA 94305 (United States)], E-mail: nriaz@stanford.edu
2009-10-07
Intra-fraction tumor tracking methods can improve radiation delivery during radiotherapy sessions. Image acquisition for tumor tracking and subsequent adjustment of the treatment beam with gating or beam tracking introduces time latency and necessitates predicting the future position of the tumor. This study evaluates the use of multi-dimensional linear adaptive filters and support vector regression to predict the motion of lung tumors tracked at 30 Hz. We expand on the prior work of other groups who have looked at adaptive filters by using a general framework of a multiple-input single-output (MISO) adaptive system that uses multiple correlated signals to predict the motion of a tumor. We compare the performance of these two novel methods to conventional methods like linear regression and single-input, single-output adaptive filters. At 400 ms latency the average root-mean-square-errors (RMSEs) for the 14 treatment sessions studied using no prediction, linear regression, single-output adaptive filter, MISO and support vector regression are 2.58, 1.60, 1.58, 1.71 and 1.26 mm, respectively. At 1 s, the RMSEs are 4.40, 2.61, 3.34, 2.66 and 1.93 mm, respectively. We find that support vector regression most accurately predicts the future tumor position of the methods studied and can provide a RMSE of less than 2 mm at 1 s latency. Also, a multi-dimensional adaptive filter framework provides improved performance over single-dimension adaptive filters. Work is underway to combine these two frameworks to improve performance.
Santos, Frédéric; Guyomarc'h, Pierre; Bruzek, Jaroslav
2014-12-01
Accuracy of identification tools in forensic anthropology primarily rely upon the variations inherent in the data upon which they are built. Sex determination methods based on craniometrics are widely used and known to be specific to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample was set as a training model on which the other samples were applied in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, including the notion of indeterminate result); and some used an unbalanced sex-ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure an acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models on the Thai sample (that displayed a lower degree of dimorphism) was unsuccessful. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Applied Regression Modeling A Business Approach
Pardoe, Iain
2012-01-01
An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculusRegression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Directory of Open Access Journals (Sweden)
Abustan Abustan
2009-06-01
Full Text Available Vector Auto Regression (VAR is an analysis or statistic method which can be used to predict time series variable and to analyst dynamic impact of disturbance factor in the variable system. In addition, VAR analysis is very useful to assess the interrelationship between economic variables. This research through the following test phases: unit root test, test of hypothesis, Granger causality test, and form a vector autoregresion model (VAR. The data used in this research is the GDP data and budget data of South Sulawesi in the period 1985-2004. The research aims to analyze the interrelationship between public expenditure and economic growth in South Sulawesi. The result showed statistically significant in economic growth (PDRB influence public expenditure (APBD, however, not vice versa. Otherwise, for the need of APBD prediction, the used of lag 4 was the optimum model based on the causal relationship to PDRB.
Regression models for predicting anthropometric measurements of ...
African Journals Online (AJOL)
... System (ANFIS) was employed to select the two most influential of the five input measurements. This search was separately conducted for each of the output measurements. Regression models were developed from the collected anthropometric data. Also, the predictive performance of these models was examined using ...
A Logistic Regression Model for Personnel Selection.
Raju, Nambury S.; And Others
1991-01-01
A two-parameter logistic regression model for personnel selection is proposed. The model was tested with a database of 84,808 military enlistees. The probability of job success was related directly to trait levels, addressing such topics as selection, validity generalization, employee classification, selection bias, and utility-based fair…
Semantic Vector Space Model: Implementation and Evaluation.
Liu, Geoffrey Z.
1997-01-01
Presents the Semantic Vector Space Model, a text representation and searching technique based on the combination of Vector Space Model with heuristic syntax parsing and distributed representation of semantic case structures. In this model, both documents and queries are represented as semantic matrices, and retrieval is achieved by computing…
Multiple regression modeling of nonlinear data sets
Kravtsov, S.; Kondrashov, D.; Ghil, M.
2003-04-01
Application of multiple polynomial regression modeling to observational and model generated data sets is discussed. Here the form of classical multiple linear regression is generalized to a model that is still linear in its parameters, but includes general multivariate polynomials of predictor variables as the basis functions. The system's low-frequency evolution is assumed to be the result of deterministic, possibly nonlinear, dynamics excited by a temporally white, but geographically coherent and normally distributed white noise. In determining the appropriate structure of the latter, the multi-level generalization of multiple polynomial regression, where the residual stochastic forcing at a given level is subsequently modeled as a function of variables at this, and all preceding levels, has turned out to be useful. The number of levels is determined so that lag-0 covariance of the residual forcing converges to a constant matrix, while its lag-1 covariance vanishes. The method has been applied to the output from a three-layer quasi-geostrophic model, to the analysis of the Northern Hemisphere wintertime geopotential height anomalies, and to global sea-surface temperature (SST) data. In the former two cases, the nonlinear multi-regime structure of probability density function (PDF) constructed in the phase subspace of a few leading empirical orthogonal functions (EOFs), as well as the detailed spectrum of the data's temporal evolution, have been well reproduced by the regression simulations. We have given a simple dynamical interpretation of these results in terms of synoptic-eddy feedback on the system's low-frequency variability. In modeling of SST data, a simple way to include the seasonal cycle into the regression model has been developed. The regression simulation in this case produces ENSO events with maximum amplitude in December/January, while the positive events generally tend to have a larger amplitude than the negative events -- a feature that cannot be
Mixed-effects regression models in linguistics
Heylen, Kris; Geeraerts, Dirk
2018-01-01
When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addres...
DOA Finding with Support Vector Regression Based Forward-Backward Linear Prediction.
Pan, Jingjing; Wang, Yide; Le Bastard, Cédric; Wang, Tianzhen
2017-05-27
Direction-of-arrival (DOA) estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward-backward linear prediction (FBLP) is able to directly deal with coherent signals. Support vector regression (SVR) is robust with small samples. This paper proposes the combination of the advantages of FBLP and SVR in the estimation of DOAs of coherent incoming signals with low snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs). Simulation results show the effectiveness of the proposed method.
DOA Finding with Support Vector Regression Based Forward–Backward Linear Prediction
Directory of Open Access Journals (Sweden)
Jingjing Pan
2017-05-01
Full Text Available Direction-of-arrival (DOA estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward–backward linear prediction (FBLP is able to directly deal with coherent signals. Support vector regression (SVR is robust with small samples. This paper proposes the combination of the advantages of FBLP and SVR in the estimation of DOAs of coherent incoming signals with low snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs. Simulation results show the effectiveness of the proposed method.
A Vector AutoRegressive (VAR) Approach to the Credit Channel for ...
African Journals Online (AJOL)
This paper is an attempt to determine the presence and empirical significance of monetary policy and the bank lending view of the credit channel for Mauritius, which is particularly relevant at these times. A vector autoregressive (VAR) model of order three is used to examine the monetary transmission mechanism using ...
Linear Regression Models for Estimating True Subsurface ...
Indian Academy of Sciences (India)
47
For the fact that subsurface resistivity is nonlinear, the datasets were first. 14 transformed into logarithmic scale to satisfy the basic regression assumptions. Three. 15 models, one each for the three array types, are thus developed based on simple linear. 16 relationships between the dependent and independent variables.
A Skew-Normal Mixture Regression Model
Liu, Min; Lin, Tsung-I
2014-01-01
A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…
OPTIMAL DESIGNS FOR SPLINE WAVELET REGRESSION MODELS.
Maronge, Jacob M; Zhai, Yi; Wiens, Douglas P; Fang, Zhide
2017-05-01
In this article we investigate the optimal design problem for some wavelet regression models. Wavelets are very flexible in modeling complex relations, and optimal designs are appealing as a means of increasing the experimental precision. In contrast to the designs for the Haar wavelet regression model (Herzberg and Traves 1994; Oyet and Wiens 2000), the I-optimal designs we construct are different from the D-optimal designs. We also obtain c-optimal designs. Optimal (D- and I-) quadratic spline wavelet designs are constructed, both analytically and numerically. A case study shows that a significant saving of resources may be realized by employing an optimal design. We also construct model robust designs, to address response misspecification arising from fitting an incomplete set of wavelets.
Linear and support vector regressions based on geometrical correlation of data
Directory of Open Access Journals (Sweden)
Kaijun Wang
2007-10-01
Full Text Available Linear regression (LR and support vector regression (SVR are widely used in data analysis. Geometrical correlation learning (GcLearn was proposed recently to improve the predictive ability of LR and SVR through mining and using correlations between data of a variable (inner correlation. This paper theoretically analyzes prediction performance of the GcLearn method and proves that GcLearn LR and SVR will have better prediction performance than traditional LR and SVR for prediction tasks when good inner correlations are obtained and predictions by traditional LR and SVR are far away from their neighbor training data under inner correlation. This gives the applicable condition of GcLearn method.
Multiple Imputations for Linear Regression Models
Brownstone, David
1991-01-01
Rubin (1987) has proposed multiple imputations as a general method for estimation in the presence of missing data. Rubinâ€™s results only strictly apply to Bayesian models, but Schenker and Welsh (1988) directly prove the consistency Â multiple imputations inference~ when there are missing values of the dependent variable in linear regression models. This paper extends and modifies Schenker and Welshâ€™s theorems to give conditions where multiple imputations yield consistent inferences for bo...
Influence diagnostics in meta-regression model.
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
2017-09-01
This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
Directory of Open Access Journals (Sweden)
Young Do Koo
2017-06-01
Full Text Available Residual stress is a critical element in determining the integrity of parts and the lifetime of welded structures. It is necessary to estimate the residual stress of a welding zone because residual stress is a major reason for the generation of primary water stress corrosion cracking in nuclear power plants. That is, it is necessary to estimate the distribution of the residual stress in welding of dissimilar metals under manifold welding conditions. In this study, a cascaded support vector regression (CSVR model was presented to estimate the residual stress of a welding zone. The CSVR model was serially and consecutively structured in terms of SVR modules. Using numerical data obtained from finite element analysis by a subtractive clustering method, learning data that explained the characteristic behavior of the residual stress of a welding zone were selected to optimize the proposed model. The results suggest that the CSVR model yielded a better estimation performance when compared with a classic SVR model.
Directory of Open Access Journals (Sweden)
Jatin Alreja
2015-06-01
Full Text Available This paper uses Multivariate Adaptive Regression Spline (MARS and Least Squares Support Vector Machines (LSSVMs to predict hysteretic energy demand in steel moment resisting frames. These models are used to establish a relation between the hysteretic energy demand and several effective parameters such as earthquake intensity, number of stories, soil type, period, strength index, and the energy imparted to the structure. A total of 27 datasets (input–output pairs are used, 23 of which are used to train the model and 4 are used to test the models. The data-sets used in this study are derived from experimental results. The performance and validity of the model are further tested on different steel moment resisting structures. The developed models have been compared with Genetic-based simulated annealing method (GSA and accurate results portray the strong potential of MARS and LSSVM as reliable tools to predict the hysteretic energy demand.
Directory of Open Access Journals (Sweden)
Hongjian Wang
2014-01-01
Full Text Available We present a support vector regression-based adaptive divided difference filter (SVRADDF algorithm for improving the low state estimation accuracy of nonlinear systems, which are typically affected by large initial estimation errors and imprecise prior knowledge of process and measurement noises. The derivative-free SVRADDF algorithm is significantly simpler to compute than other methods and is implemented using only functional evaluations. The SVRADDF algorithm involves the use of the theoretical and actual covariance of the innovation sequence. Support vector regression (SVR is employed to generate the adaptive factor to tune the noise covariance at each sampling instant when the measurement update step executes, which improves the algorithm’s robustness. The performance of the proposed algorithm is evaluated by estimating states for (i an underwater nonmaneuvering target bearing-only tracking system and (ii maneuvering target bearing-only tracking in an air-traffic control system. The simulation results show that the proposed SVRADDF algorithm exhibits better performance when compared with a traditional DDF algorithm.
Yang, Chien-Chun; Nagarajan, Mahesh B.; Huber, Markus B.; Carballido-Gamio, Julio; Bauer, Jan S.; Baum, Thomas; Eckstein, Felix; Lochmüller, Eva-Maria; Link, Thomas M.; Wismüller, Axel
2014-03-01
Regional trabecular bone quality estimation for purposes of femoral bone strength prediction is important for improving the clinical assessment of osteoporotic fracture risk. In this study, we explore the ability of 3D Minkowski Functionals derived from multi-detector computed tomography (MDCT) images of proximal femur specimens in predicting their corresponding biomechanical strength. MDCT scans were acquired for 50 proximal femur specimens harvested from human cadavers. An automated volume of interest (VOI)-fitting algorithm was used to define a consistent volume in the femoral head of each specimen. In these VOIs, the trabecular bone micro-architecture was characterized by statistical moments of its BMD distribution and by topological features derived from Minkowski Functionals. A linear multiregression analysis and a support vector regression (SVR) algorithm with a linear kernel were used to predict the failure load (FL) from the feature sets; the predicted FL was compared to the true FL determined through biomechanical testing. The prediction performance was measured by the root mean square error (RMSE) for each feature set. The best prediction result was obtained from the Minkowski Functional surface used in combination with SVR, which had the lowest prediction error (RMSE = 0.939 ± 0.345) and which was significantly lower than mean BMD (RMSE = 1.075 ± 0.279, pfemur specimens with Minkowski Functionals extracted from on MDCT images used in conjunction with support vector regression.
Eisavi, Vahid; Homayouni, Saeid
2016-10-01
Information on land use and land cover changes is considered as a foremost requirement for monitoring environmental change. Developing change detection methodology in the remote sensing community is an active research topic. However, to the best of our knowledge, no research has been conducted so far on the application of random forest regression (RFR) and support vector regression (SVR) for natural hazard change detection from high-resolution optical remote sensing observations. Hence, the objective of this study is to examine the use of RFR and SVR to discriminate between changed and unchanged areas after a tsunami. For this study, RFR and SVR were applied to two different pilot coastlines in Indonesia and Japan. Two different remotely sensed data sets acquired by Quickbird and Ikonos sensors were used for efficient evaluation of the proposed methodology. The results demonstrated better performance of SVM compared to random forest (RF) with an overall accuracy higher by 3% to 4% and kappa coefficient by 0.05 to 0.07. Using McNemar's test, statistically significant differences (Z≥1.96), at the 5% significance level, between the confusion matrices of the RF classifier and the support vector classifier were observed in both study areas. The high accuracy of change detection obtained in this study confirms that these methods have the potential to be used for detecting changes due to natural hazards.
Geographically weighted regression model on poverty indicator
Slamet, I.; Nugroho, N. F. T. A.; Muslich
2017-12-01
In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.
Regularized multivariate regression models with skew-t error distributions
Chen, Lianfu
2014-06-01
We consider regularization of the parameters in multivariate linear regression models with the errors having a multivariate skew-t distribution. An iterative penalized likelihood procedure is proposed for constructing sparse estimators of both the regression coefficient and inverse scale matrices simultaneously. The sparsity is introduced through penalizing the negative log-likelihood by adding L1-penalties on the entries of the two matrices. Taking advantage of the hierarchical representation of skew-t distributions, and using the expectation conditional maximization (ECM) algorithm, we reduce the problem to penalized normal likelihood and develop a procedure to minimize the ensuing objective function. Using a simulation study the performance of the method is assessed, and the methodology is illustrated using a real data set with a 24-dimensional response vector. © 2014 Elsevier B.V.
Ordinal regression by a generalized force-based model.
Fernandez-Navarro, Francisco; Riccardi, Annalisa; Carloni, Sante
2015-04-01
This paper introduces a new instance-based algorithm for multiclass classification problems where the classes have a natural order. The proposed algorithm extends the state-of-the-art gravitational models by generalizing the scaling behavior of the class-pattern interaction force. Like the other gravitational models, the proposed algorithm classifies new patterns by comparing the magnitude of the force that each class exerts on a given pattern. To address ordinal problems, the algorithm assumes that, given a pattern, the forces associated to each class follow a unimodal distribution. For this reason, a weight matrix that allows to modify the metric in the attributes space and a vector of parameters that allows to modify the force law for each class have been introduced in the model definition. Furthermore, a probabilistic formulation of the error function allows the estimation of the model parameters using global and local optimization procedures toward minimization of the errors and penalization of the non unimodal outputs. One of the strengths of the model is its competitive grade of interpretability which is a requisite in most of real applications. The proposed algorithm is compared to other well-known ordinal regression algorithms on discretized regression datasets and real ordinal regression datasets. Experimental results demonstrate that the proposed algorithm can achieve competitive generalization performance and it is validated using nonparametric statistical tests.
Adaptive regression for modeling nonlinear relationships
Knafl, George J
2016-01-01
This book presents methods for investigating whether relationships are linear or nonlinear and for adaptively fitting appropriate models when they are nonlinear. Data analysts will learn how to incorporate nonlinearity in one or more predictor variables into regression models for different types of outcome variables. Such nonlinear dependence is often not considered in applied research, yet nonlinear relationships are common and so need to be addressed. A standard linear analysis can produce misleading conclusions, while a nonlinear analysis can provide novel insights into data, not otherwise possible. A variety of examples of the benefits of modeling nonlinear relationships are presented throughout the book. Methods are covered using what are called fractional polynomials based on real-valued power transformations of primary predictor variables combined with model selection based on likelihood cross-validation. The book covers how to formulate and conduct such adaptive fractional polynomial modeling in the s...
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Full Text Available Recently, the regularized coding-based classification methods (e.g. SRC and CRC show a great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR for classification. GRR not only has advantages of CRC, but also takes full use of the prior information (e.g. the correlations between representation residuals and representation coefficients and the specific information (weight matrix of image pixels to enhance the classification performance. GRR uses the generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel weights of the test sample. With the proposed model as a platform, we design two classifiers: basic general regression and representation classifier (B-GRR and robust general regression and representation classifier (R-GRR. The experimental results demonstrate the performance advantages of proposed methods over state-of-the-art algorithms.
Bayesian Inference of a Multivariate Regression Model
Directory of Open Access Journals (Sweden)
Marick S. Sinay
2014-01-01
Full Text Available We explore Bayesian inference of a multivariate linear regression model with use of a flexible prior for the covariance structure. The commonly adopted Bayesian setup involves the conjugate prior, multivariate normal distribution for the regression coefficients and inverse Wishart specification for the covariance matrix. Here we depart from this approach and propose a novel Bayesian estimator for the covariance. A multivariate normal prior for the unique elements of the matrix logarithm of the covariance matrix is considered. Such structure allows for a richer class of prior distributions for the covariance, with respect to strength of beliefs in prior location hyperparameters, as well as the added ability, to model potential correlation amongst the covariance structure. The posterior moments of all relevant parameters of interest are calculated based upon numerical results via a Markov chain Monte Carlo procedure. The Metropolis-Hastings-within-Gibbs algorithm is invoked to account for the construction of a proposal density that closely matches the shape of the target posterior distribution. As an application of the proposed technique, we investigate a multiple regression based upon the 1980 High School and Beyond Survey.
Applying support vector regression analysis on grip force level-related corticomuscular coherence
DEFF Research Database (Denmark)
Rong, Yao; Han, Xixuan; Hao, Dongmei
2014-01-01
Voluntary motor performance is the result of cortical commands driving muscle actions. Corticomuscular coherence can be used to examine the functional coupling or communication between human brain and muscles. To investigate the effects of grip force level on corticomuscular coherence in an acces......Voluntary motor performance is the result of cortical commands driving muscle actions. Corticomuscular coherence can be used to examine the functional coupling or communication between human brain and muscles. To investigate the effects of grip force level on corticomuscular coherence...... in an accessory muscle, this study proposed an expanded support vector regression (ESVR) algorithm to quantify the coherence between electroencephalogram (EEG) from sensorimotor cortex and surface electromyogram (EMG) from brachioradialis in upper limb. A measure called coherence proportion was introduced...... is more sensitive to grip force level than coherence area. The significantly higher corticomuscular coherence occurred in the alpha (pcontrol the activity...
Regression Models For Saffron Yields in Iran
S. H, Sanaeinejad; S. N, Hosseini
Saffron is an important crop in social and economical aspects in Khorassan Province (Northeast of Iran). In this research wetried to evaluate trends of saffron yield in recent years and to study the relationship between saffron yield and the climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data in Birjand, Ghaen and Ferdows cities.Climatologically data for the same periods was provided by database of Khorassan Climatology Center. Climatologically data includedtemperature, rainfall, relative humidity and sunshine hours for ModelI, and temperature and rainfall for Model II. The results showed the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81 respectively. Also coefficients of determination for the same cities for model II were 0.53, 0.50 and 0.72 respectively. Multiple regression analysisindicated that among weather variables, temperature was the key parameter for variation ofsaffron yield. It was concluded that increasing temperature at spring was the main cause of declined saffron yield during recent years across the province. Finally, yield trend was predicted for the last 5 years using time series analysis.
Directory of Open Access Journals (Sweden)
Jun Shuai
2017-01-01
Full Text Available Numerous studies on fault diagnosis have been conducted in recent years because the timely and correct detection of machine fault effectively minimizes the damage resulting in the unexpected breakdown of machineries. The mathematical morphological analysis has been performed to denoise raw signal. However, the improper choice of the length of the structure element (SE will substantially influence the effectiveness of fault feature extraction. Moreover, the classification of fault type is a significant step in intelligent fault diagnosis, and many techniques have already been developed, such as support vector machine (SVM. This study proposes an intelligent fault diagnosis strategy that combines the extraction of morphological feature and support vector regression (SVR classifier. The vibration signal is first processed using various scales of morphological analysis, where the length of SE is determined adaptively. Thereafter, nine statistical features are extracted from the processed signal. Lastly, an SVR classifier is used to identify the health condition of the machinery. The effectiveness of the proposed scheme is validated using the data set from a bearing test rig. Results show the high accuracy of the proposed method despite the influence of noise.
Balfer, Jenny; Bajorath, Jürgen
2015-01-01
Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has become the probably most popular approach for modeling of non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.
Directory of Open Access Journals (Sweden)
Jenny Balfer
Full Text Available Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has become the probably most popular approach for modeling of non-linear structure-activity relationships (SARs and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.
Multitask Quantile Regression under the Transnormal Model.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2016-01-01
We consider estimating multi-task quantile regression under the transnormal model, with focus on high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits sampling properties of the unprojected one. Our work combines strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality for high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides the provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application to analyze the protein mass spectroscopy data.
2014-01-01
Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738
Directory of Open Access Journals (Sweden)
Chi-Jie Lu
2014-01-01
Full Text Available Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA with K-means clustering and support vector regression (SVR. The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.
Energy Technology Data Exchange (ETDEWEB)
Feng Wu; Hao Zhou; Tao Ren; Ligang Zheng; Kefa Cen [Zhejiang University, Hangzhou (China). State Key Laboratory of Clean Energy Utilization
2009-10-15
Support vector regression (SVR) was employed to establish mathematical models for the NOx emissions and carbon burnout of a 300 MW coal-fired utility boiler. Combined with the SVR models, the cellular genetic algorithm for multi-objective optimization (MOCell) was used for multi-objective optimization of the boiler combustion. Meanwhile, the comparison between MOCell and the improved non-dominated sorting genetic algorithm (NSGA-II) shows that MOCell has superior performance to NSGA-II regarding the problem. The field experiments were carried out to verify the accuracy of the results obtained by MOCell, the results were in good agreement with the measurement data. The proposed approach provides an effective tool for multi-objective optimization of coal combustion performance, whose feasibility and validity are experimental validated. A time period of less than 4 s was required for a run of optimization under a PC system, which is suitable for the online application. 19 refs., 8 figs., 2 tabs.
Inferring gene regression networks with model trees
Directory of Open Access Journals (Sweden)
Aguilar-Ruiz Jesus S
2010-10-01
Full Text Available Abstract Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear
Cox regression model with doubly truncated data.
Rennert, Lior; Xie, Sharon X
2017-10-26
Truncation is a well-known phenomenon that may be present in observational studies of time-to-event data. While many methods exist to adjust for either left or right truncation, there are very few methods that adjust for simultaneous left and right truncation, also known as double truncation. We propose a Cox regression model to adjust for this double truncation using a weighted estimating equation approach, where the weights are estimated from the data both parametrically and nonparametrically, and are inversely proportional to the probability that a subject is observed. The resulting weighted estimators of the hazard ratio are consistent. The parametric weighted estimator is asymptotically normal and a consistent estimator of the asymptotic variance is provided. For the nonparametric weighted estimator, we apply the bootstrap technique to estimate the variance and confidence intervals. We demonstrate through extensive simulations that the proposed estimators greatly reduce the bias compared to the unweighted Cox regression estimator which ignores truncation. We illustrate our approach in an analysis of autopsy-confirmed Alzheimer's disease patients to assess the effect of education on survival. © 2017, The International Biometric Society.
Directory of Open Access Journals (Sweden)
A. Faridi
2013-11-01
Full Text Available Support vector regression (SVR is used in this study to develop models to estimate apparent metabolizable energy (AME, AME corrected for nitrogen (AMEn, true metabolizable energy (TME, and TME corrected for nitrogen (TMEn contents of corn fed to ducks based on its chemical composition. Performance of the SVR models was assessed by comparing their results with those of artificial neural network (ANN and multiple linear regression (MLR models. The input variables to estimate metabolizable energy content (MJ kg-1 of corn were crude protein, ether extract, crude fibre, and ash (g kg-1. Goodness of fit of the models was examined using R2, mean square error, and bias. Based on these indices, the predictive performance of the SVR, ANN, and MLR models was acceptable. Comparison of models indicated that performance of SVR (in terms of R2 on the full data set (0.937 for AME, 0.954 for AMEn, 0.860 for TME, and 0.937 for TMEn was better than that of ANN (0.907 for AME, 0.922 for AMEn, 0.744 for TME, and 0.920 for TMEn and MLR (0.887 for AME, 0.903 for AMEn, 0.704 for TME, and 0.902 for TMEn. Similar findings were observed with the calibration and testing data sets. These results suggest SVR models are a promising tool for modelling the relationship between chemical composition and metabolizable energy of feedstuffs for poultry. Although from the present results the application of SVR models seems encouraging, the use of such models in other areas of animal nutrition needs to be evaluated.
Entrepreneurial intention modeling using hierarchical multiple regression
Directory of Open Access Journals (Sweden)
Marina Jeger
2014-12-01
Full Text Available The goal of this study is to identify the contribution of effectuation dimensions to the predictive power of the entrepreneurial intention model over and above that which can be accounted for by other predictors selected and confirmed in previous studies. As is often the case in social and behavioral studies, some variables are likely to be highly correlated with each other. Therefore, the relative amount of variance in the criterion variable explained by each of the predictors depends on several factors such as the order of variable entry and sample specifics. The results show the modest predictive power of two dimensions of effectuation prior to the introduction of the theory of planned behavior elements. The article highlights the main advantages of applying hierarchical regression in social sciences as well as in the specific context of entrepreneurial intention formation, and addresses some of the potential pitfalls that this type of analysis entails.
An Additive-Multiplicative Cox-Aalen Regression Model
DEFF Research Database (Denmark)
Scheike, Thomas H.; Zhang, Mei-Jie
2002-01-01
Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects......Aalen model; additive risk model; counting processes; Cox regression; survival analysis; time-varying effects...
Chen, Hai-Feng
2009-08-01
Oil/water partition coefficient (log P) is one of the key points for lead compound to be drug. In silico log P models based solely on chemical structures have become an important part of modern drug discovery. Here, we report support vector machines, radial basis function neural networks, and multiple linear regression methods to investigate the correlation between partition coefficient and physico-chemical descriptors for a large data set of compounds. The correlation coefficient r(2) between experimental and predicted log P for training and test sets by support vector machines, radial basis function neural networks, and multiple linear regression is 0.92, 0.90, and 0.88, respectively. The results show that non-linear support vector machines derives statistical models that have better prediction ability than those of radial basis function neural networks and multiple linear regression methods. This indicates that support vector machines can be used as an alternative modeling tool for quantitative structure-property/activity relationships studies.
Boosted Regression Tree Models to Explain Watershed ...
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
A vector model for error propagation
Energy Technology Data Exchange (ETDEWEB)
Smith, D.L.; Geraldo, L.P.
1989-03-01
A simple vector model for error propagation, which is entirely equivalent to the conventional statistical approach, is discussed. It offers considerable insight into the nature of error propagation while, at the same time, readily demonstrating the significance of uncertainty correlations. This model is well suited to the analysis of error for sets of neutron-induced reaction cross sections. 7 refs., 1 fig.
Directory of Open Access Journals (Sweden)
Naradasu Kumar Ravi
2013-01-01
Full Text Available Diesel engine designers are constantly on the look-out for performance enhancement through efficient control of operating parameters. In this paper, the concept of an intelligent engine control system is proposed that seeks to ensure optimized performance under varying operating conditions. The concept is based on arriving at the optimum engine operating parameters to ensure the desired output in terms of efficiency. In addition, a Support Vector Machines based prediction model has been developed to predict the engine performance under varying operating conditions. Experiments were carried out at varying loads, compression ratios and amounts of exhaust gas recirculation using a variable compression ratio diesel engine for data acquisition. It was observed that the SVM model was able to predict the engine performance accurately.
Directory of Open Access Journals (Sweden)
Xian-Xia Zhang
2013-01-01
Full Text Available This paper presents a reference function based 3D FLC design methodology using support vector regression (SVR learning. The concept of reference function is introduced to 3D FLC for the generation of 3D membership functions (MF, which enhance the capability of the 3D FLC to cope with more kinds of MFs. The nonlinear mathematical expression of the reference function based 3D FLC is derived, and spatial fuzzy basis functions are defined. Via relating spatial fuzzy basis functions of a 3D FLC to kernel functions of an SVR, an equivalence relationship between a 3D FLC and an SVR is established. Therefore, a 3D FLC can be constructed using the learned results of an SVR. Furthermore, the universal approximation capability of the proposed 3D fuzzy system is proven in terms of the finite covering theorem. Finally, the proposed method is applied to a catalytic packed-bed reactor and simulation results have verified its effectiveness.
Directory of Open Access Journals (Sweden)
Jaehyun Yoo
2015-05-01
Full Text Available Machine learning has been successfully used for target localization in wireless sensor networks (WSNs due to its accurate and robust estimation against highly nonlinear and noisy sensor measurement. For efficient and adaptive learning, this paper introduces online semi-supervised support vector regression (OSS-SVR. The first advantage of the proposed algorithm is that, based on semi-supervised learning framework, it can reduce the requirement on the amount of the labeled training data, maintaining accurate estimation. Second, with an extension to online learning, the proposed OSS-SVR automatically tracks changes of the system to be learned, such as varied noise characteristics. We compare the proposed algorithm with semi-supervised manifold learning, an online Gaussian process and online semi-supervised colocalization. The algorithms are evaluated for estimating the unknown location of a mobile robot in a WSN. The experimental results show that the proposed algorithm is more accurate under the smaller amount of labeled training data and is robust to varying noise. Moreover, the suggested algorithm performs fast computation, maintaining the best localization performance in comparison with the other methods.
Estimation of the laser cutting operating cost by support vector regression methodology
Jović, Srđan; Radović, Aleksandar; Šarkoćević, Živče; Petković, Dalibor; Alizamir, Meysam
2016-09-01
Laser cutting is a popular manufacturing process utilized to cut various types of materials economically. The operating cost is affected by laser power, cutting speed, assist gas pressure, nozzle diameter and focus point position as well as the workpiece material. In this article, the process factors investigated were: laser power, cutting speed, air pressure and focal point position. The aim of this work is to relate the operating cost to the process parameters mentioned above. CO2 laser cutting of stainless steel of medical grade AISI316L has been investigated. The main goal was to analyze the operating cost through the laser power, cutting speed, air pressure, focal point position and material thickness. Since the laser operating cost is a complex, non-linear task, soft computing optimization algorithms can be used. Intelligent soft computing scheme support vector regression (SVR) was implemented. The performance of the proposed estimator was confirmed with the simulation results. The SVR results are then compared with artificial neural network and genetic programing. According to the results, a greater improvement in estimation accuracy can be achieved through the SVR compared to other soft computing methodologies. The new optimization methods benefit from the soft computing capabilities of global optimization and multiobjective optimization rather than choosing a starting point by trial and error and combining multiple criteria into a single criterion.
Spatial Support Vector Regression to Detect Silent Errors in the Exascale Era
Energy Technology Data Exchange (ETDEWEB)
Subasi, Omer; Di, Sheng; Bautista-Gomez, Leonardo; Balaprakash, Prasanna; Unsal, Osman; Labarta, Jesus; Cristal, Adrian; Cappello, Franck
2016-01-01
As the exascale era approaches, the increasing capacity of high-performance computing (HPC) systems with targeted power and energy budget goals introduces significant challenges in reliability. Silent data corruptions (SDCs) or silent errors are one of the major sources that corrupt the executionresults of HPC applications without being detected. In this work, we explore a low-memory-overhead SDC detector, by leveraging epsilon-insensitive support vector machine regression, to detect SDCs that occur in HPC applications that can be characterized by an impact error bound. The key contributions are three fold. (1) Our design takes spatialfeatures (i.e., neighbouring data values for each data point in a snapshot) into training data, such that little memory overhead (less than 1%) is introduced. (2) We provide an in-depth study on the detection ability and performance with different parameters, and we optimize the detection range carefully. (3) Experiments with eight real-world HPC applications show thatour detector can achieve the detection sensitivity (i.e., recall) up to 99% yet suffer a less than 1% of false positive rate for most cases. Our detector incurs low performance overhead, 5% on average, for all benchmarks studied in the paper. Compared with other state-of-the-art techniques, our detector exhibits the best tradeoff considering the detection ability and overheads.
Twin support vector machines models, extensions and applications
Jayadeva; Chandra, Suresh
2017-01-01
This book provides a systematic and focused study of the various aspects of twin support vector machines (TWSVM) and related developments for classification and regression. In addition to presenting most of the basic models of TWSVM and twin support vector regression (TWSVR) available in the literature, it also discusses the important and challenging applications of this new machine learning methodology. A chapter on “Additional Topics” has been included to discuss kernel optimization and support tensor machine topics, which are comparatively new but have great potential in applications. It is primarily written for graduate students and researchers in the area of machine learning and related topics in computer science, mathematics, electrical engineering, management science and finance.
Directory of Open Access Journals (Sweden)
Ping Jiang
2015-01-01
Full Text Available Wind speed/power has received increasing attention around the earth due to its renewable nature as well as environmental friendliness. With the global installed wind power capacity rapidly increasing, wind industry is growing into a large-scale business. Reliable short-term wind speed forecasts play a practical and crucial role in wind energy conversion systems, such as the dynamic control of wind turbines and power system scheduling. In this paper, an intelligent hybrid model for short-term wind speed prediction is examined; the model is based on cross correlation (CC analysis and a support vector regression (SVR model that is coupled with brainstorm optimization (BSO and cuckoo search (CS algorithms, which are successfully utilized for parameter determination. The proposed hybrid models were used to forecast short-term wind speeds collected from four wind turbines located on a wind farm in China. The forecasting results demonstrate that the intelligent hybrid models outperform single models for short-term wind speed forecasting, which mainly results from the superiority of BSO and CS for parameter optimization.
DEFF Research Database (Denmark)
Graversen, C; Frokjaer, J B; Brock, Christina
2012-01-01
patients were discriminated from the HV by a support vector machine (SVM) applied in regression mode. For the optimal DWT, the discriminative features were extracted and the SVM regression value representing the overall alteration of the EP was correlated to the clinical scores. A classification...... approach to study central mechanisms in diabetes mellitus, and may provide a future application for a clinical tool to optimize treatment in individual patients....
Zhou, Yang; Fu, Xiaping; Ying, Yibin; Fang, Zhenhuan
2015-06-23
A fiber-optic probe system was developed to estimate the optical properties of turbid media based on spatially resolved diffuse reflectance. Because of the limitations in numerical calculation of radiative transfer equation (RTE), diffusion approximation (DA) and Monte Carlo simulations (MC), support vector regression (SVR) was introduced to model the relationship between diffuse reflectance values and optical properties. The SVR models of four collection fibers were trained by phantoms in calibration set with a wide range of optical properties which represented products of different applications, then the optical properties of phantoms in prediction set were predicted after an optimal searching on SVR models. The results indicated that the SVR model was capable of describing the relationship with little deviation in forward validation. The correlation coefficient (R) of reduced scattering coefficient μ'(s) and absorption coefficient μ(a) in the prediction set were 0.9907 and 0.9980, respectively. The root mean square errors of prediction (RMSEP) of μ'(s) and μ(a) in inverse validation were 0.411 cm(-1) and 0.338 cm(-1), respectively. The results indicated that the integrated fiber-optic probe system combined with SVR model were suitable for fast and accurate estimation of optical properties of turbid media based on spatially resolved diffuse reflectance. Copyright © 2015 Elsevier B.V. All rights reserved.
Directory of Open Access Journals (Sweden)
Fereshteh Shiri
2010-08-01
Full Text Available In the present work, support vector machines (SVMs and multiple linear regression (MLR techniques were used for quantitative structure–property relationship (QSPR studies of retention time (tR in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLRand SVMs methods were employed to build QSPR models. The robustness of the QSPR models was characterized by the statistical validation and applicability domain (AD. The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measure by r2 and q2 are 0.931 and 0.932, repectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using William’s plot. The effects of different descriptors on the retention times are described.
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
chiral vector is made up of the unit vectors a1 a2 and the angle θ determines the tube type of either zig zag , chiral or armchair. Recreated from [4...observed. Reprinted from [45] with permission from the Nature Publishing Group. . . . . . . . . . . . . . . . . . . 22 2.14 Armchair, zig zag and...of the unit vectors a1 a2 and the angle θ determines the tube type of either zig zag , chiral or armchair. Recreated from [4, 39]. the angle between
Vector difference calculus for physical lattice models
Schwalm, W.; Moritz, B.; Giona, M.; Schwalm, M.
1999-01-01
A vector difference calculus is developed for physical models defined on a general triangulating graph G, which may be a regular or an extremely irregular lattice, using discrete field quantities roughly analogous to differential forms. The role of the space Λp of p-forms at a point is taken on by the linear space generated at a graph vertex by the geometrical p-simplices which contain it. The vector operations divergence, gradient, and curl are developed using the boundary ∂ and coboundary d. Dot, cross, and scalar products are defined in such a way that discrete analogs of the vector integral theorems, including theorems of Gauss-Ostrogradski, Stokes, and Green, as well as most standard vector identities hold exactly, not as approximations to a continuum limit. Physical conservation laws for the models become theorems satisfied by the discrete fields themselves. Three discrete lattice models are constructed as examples, namely a discrete version of the Maxwell equations, the Navier-Stokes equation for incompressible flow, and the Navier linearized model for a homogeneous, isotropic elastic medium. Weight factors needed for obtaining quantitative agreement with continuum calculations are derived for the special case of a regular triangular lattice. Green functions are developed using a generalized Helmholtz decomposition of the fields.
"A regression error specification test (RESET) for generalized linear models".
Sunil Sapra
2005-01-01
Generalized linear models (GLMs) are generalizations of linear regression models, which allow fitting regression models to response data that follow a general exponential family. GLMs are used widely in social sciences for fitting regression models to count data, qualitative response data and duration data. While a variety of specification tests have been developed for the linear regression model and are routinely applied for testing for misspecification of functional form, omitted variables,...
Model performance analysis and model validation in logistic regression
Directory of Open Access Journals (Sweden)
Rosa Arboretti Giancristofaro
2007-10-01
Full Text Available In this paper a new model validation procedure for a logistic regression model is presented. At first, we illustrate a brief review of different techniques of model validation. Next, we define a number of properties required for a model to be considered "good", and a number of quantitative performance measures. Lastly, we describe a methodology for the assessment of the performance of a given model by using an example taken from a management study.
Castelletti, Davide; Demir, Begüm; Bruzzone, Lorenzo
2014-10-01
This paper presents a novel semisupervised learning (SSL) technique defined in the context of ɛ-insensitive support vector regression (SVR) to estimate biophysical parameters from remotely sensed images. The proposed SSL method aims to mitigate the problems of small-sized biased training sets without collecting any additional samples with reference measures. This is achieved on the basis of two consecutive steps. The first step is devoted to inject additional priors information in the learning phase of the SVR in order to adapt the importance of each training sample according to distribution of the unlabeled samples. To this end, a weight is initially associated to each training sample based on a novel strategy that defines higher weights for the samples located in the high density regions of the feature space while giving reduced weights to those that fall into the low density regions of the feature space. Then, in order to exploit different weights for training samples in the learning phase of the SVR, we introduce a weighted SVR (WSVR) algorithm. The second step is devoted to jointly exploit labeled and informative unlabeled samples for further improving the definition of the WSVR learning function. To this end, the most informative unlabeled samples that have an expected accurate target values are initially selected according to a novel strategy that relies on the distribution of the unlabeled samples in the feature space and on the WSVR function estimated at the first step. Then, we introduce a restructured WSVR algorithm that jointly uses labeled and unlabeled samples in the learning phase of the WSVR algorithm and tunes their importance by different values of regularization parameters. Experimental results obtained for the estimation of single-tree stem volume show the effectiveness of the proposed SSL method.
Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao
2018-02-01
An efficient method to design the broadband gain-flattened Raman fiber amplifier with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equation. The proposed approach contains two stages: offline training stage and online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be directly and accurately obtained when inputting any combination of the pump wavelength and power to the well-trained model. During the online stage, we incorporate the LS-SVR model into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of the pump parameter optimization for Raman fiber amplifier design.
Directory of Open Access Journals (Sweden)
Mehmet Das
2018-01-01
Full Text Available In this study, an air heated solar collector (AHSC dryer was designed to determine the drying characteristics of the pear. Flat pear slices of 10 mm thickness were used in the experiments. The pears were dried both in the AHSC dryer and under the sun. Panel glass temperature, panel floor temperature, panel inlet temperature, panel outlet temperature, drying cabinet inlet temperature, drying cabinet outlet temperature, drying cabinet temperature, drying cabinet moisture, solar radiation, pear internal temperature, air velocity and mass loss of pear were measured at 30 min intervals. Experiments were carried out during the periods of June 2017 in Elazig, Turkey. The experiments started at 8:00 a.m. and continued till 18:00. The experiments were continued until the weight changes in the pear slices stopped. Wet basis moisture content (MCw, dry basis moisture content (MCd, adjustable moisture ratio (MR, drying rate (DR, and convective heat transfer coefficient (hc were calculated with both in the AHSC dryer and the open sun drying experiment data. It was found that the values of hc in both drying systems with a range 12.4 and 20.8 W/m2 °C. Three different kernel models were used in the support vector machine (SVM regression to construct the predictive model of the calculated hc values for both systems. The mean absolute error (MAE, root mean squared error (RMSE, relative absolute error (RAE and root relative absolute error (RRAE analysis were performed to indicate the predictive model’s accuracy. As a result, the rate of drying of the pear was examined for both systems and it was observed that the pear had dried earlier in the AHSC drying system. A predictive model was obtained using the SVM regression for the calculated hc values for the pear in the AHSC drying system. The normalized polynomial kernel was determined as the best kernel model in SVM for estimating the hc values.
Directory of Open Access Journals (Sweden)
N. Zahir
2015-12-01
Full Text Available Lake Urmia is one of the most important ecosystems of the country which is on the verge of elimination. Many factors contribute to this crisis among them is the precipitation, paly important roll. Precipitation has many forms one of them is in the form of snow. The snow on Sahand Mountain is one of the main and important sources of the Lake Urmia’s water. Snow Depth (SD is vital parameters for estimating water balance for future year. In this regards, this study is focused on SD parameter using Special Sensor Microwave/Imager (SSM/I instruments on board the Defence Meteorological Satellite Program (DMSP F16. The usual statistical methods for retrieving SD include linear and non-linear ones. These methods used least square procedure to estimate SD model. Recently, kernel base methods widely used for modelling statistical problem. From these methods, the support vector regression (SVR is achieved the high performance for modelling the statistical problem. Examination of the obtained data shows the existence of outlier in them. For omitting these outliers, wavelet denoising method is applied. After the omission of the outliers it is needed to select the optimum bands and parameters for SVR. To overcome these issues, feature selection methods have shown a direct effect on improving the regression performance. We used genetic algorithm (GA for selecting suitable features of the SSMI bands in order to estimate SD model. The results for the training and testing data in Sahand mountain is [R²_TEST=0.9049 and RMSE= 6.9654] that show the high SVR performance.
Data analysis using regression and multilevel/hierarchical models
National Research Council Canada - National Science Library
Gelman, Andrew; Hill, Jennifer
2007-01-01
"Data Analysis Using Regression and Multilevel/Hierarchical Models is a comprehensive manual for the applied researcher who wants to perform data analysis using linear and nonlinear regression and multilevel models...
Vector quarks in the Higgs triplet model
Bahrami, Sahar; Frank, Mariana
2014-08-01
We analyze the effects of introducing vector fermions in the Higgs triplet model. In this scenario, the model contains, in addition to the Standard Model particle content, one triplet Higgs representation and a variety of vectorlike fermion states, including singlet, doublet, and triplet states. We investigate the electroweak precision variables and impose restrictions on model parameters. We show that, for some representations, introducing vector quarks significantly alters the constraints on the mass of the doubly charged Higgs boson, bringing it in closer agreement with present experimental constraints. We also study the effects of introducing the vectorlike fermions on neutral Higgs phenomenology, in particular on the loop-dominated decays H→γγ and H→Zγ, and the restrictions they impose on the parameter space.
Energy Technology Data Exchange (ETDEWEB)
Jiang, Huaiguang [National Renewable Energy Laboratory (NREL), Golden, CO (United States)
2017-08-25
This work proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system.
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore, raising model reproducibility and comparison issues. Cheminformatics and bioinformatics are extensively using predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespectively of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produce unified reports, and offer the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending on the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR
Model reduction methods for vector autoregressive processes
Brüggemann, Ralf
2004-01-01
1. 1 Objective of the Study Vector autoregressive (VAR) models have become one of the dominant research tools in the analysis of macroeconomic time series during the last two decades. The great success of this modeling class started with Sims' (1980) critique of the traditional simultaneous equation models (SEM). Sims criticized the use of 'too many incredible restrictions' based on 'supposed a priori knowledge' in large scale macroeconometric models which were popular at that time. Therefore, he advo cated largely unrestricted reduced form multivariate time series models, unrestricted VAR models in particular. Ever since his influential paper these models have been employed extensively to characterize the underlying dynamics in systems of time series. In particular, tools to summarize the dynamic interaction between the system variables, such as impulse response analysis or forecast error variance decompo sitions, have been developed over the years. The econometrics of VAR models and related quantities i...
Comparison of ν-support vector regression and logistic equation for ...
African Journals Online (AJOL)
Jane
2011-07-04
Jul 4, 2011 ... Prediction of key state variables using support vector machines in bioprocess. Chem. Eng. Technol. 29: 313-319. Lin, W.Z., Xiao, X., and Chou, K.C., 2009. GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis. Protein Eng Des Sel 22, 699-705.
Water demand prediction using artificial neural networks and support vector regression
CSIR Research Space (South Africa)
Msiza, IS
2008-11-01
Full Text Available comparison are Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs). In this study it was observed that ANNs perform significantly better than SVMs. This performance is measured against the generalization ability of the two techniques in water...
STREAMFLOW AND WATER QUALITY REGRESSION MODELING ...
African Journals Online (AJOL)
The upper reaches of Imo-river system between Nekede and Obigbo hydrological stations (a stretch of 24km) have been studied for the purpose of water quality and streamflow modeling. Model's applications on water supply to Nekede and Obigbo communities were equally explored with the development of mass curves.
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.
The cointegrated vector autoregressive model with general deterministic terms
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Morten Ørregaard
In the cointegrated vector autoregression (CVAR) literature, deterministic terms have until now been analyzed on a case-by-case, or as-needed basis. We give a comprehensive unified treatment of deterministic terms in the additive model X(t)= Z(t) + Y(t), where Z(t) belongs to a large class...... of deterministic regressors and Y(t) is a zero-mean CVAR. We suggest an extended model that can be estimated by reduced rank regression and give a condition for when the additive and extended models are asymptotically equivalent, as well as an algorithm for deriving the additive model parameters from the extended...... model parameters. We derive asymptotic properties of the maximum likelihood estimators and discuss tests for rank and tests on the deterministic terms. In particular, we give conditions under which the estimators are asymptotically (mixed) Gaussian, such that associated tests are khi squared distributed....
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
Use of a mixture statistical model in studying malaria vectors density.
Directory of Open Access Journals (Sweden)
Olayidé Boussari
Full Text Available Vector control is a major step in the process of malaria control and elimination. This requires vector counts and appropriate statistical analyses of these counts. However, vector counts are often overdispersed. A non-parametric mixture of Poisson model (NPMP is proposed to allow for overdispersion and better describe vector distribution. Mosquito collections using the Human Landing Catches as well as collection of environmental and climatic data were carried out from January to December 2009 in 28 villages in Southern Benin. A NPMP regression model with "village" as random effect is used to test statistical correlations between malaria vectors density and environmental and climatic factors. Furthermore, the villages were ranked using the latent classes derived from the NPMP model. Based on this classification of the villages, the impacts of four vector control strategies implemented in the villages were compared. Vector counts were highly variable and overdispersed with important proportion of zeros (75%. The NPMP model had a good aptitude to predict the observed values and showed that: i proximity to freshwater body, market gardening, and high levels of rain were associated with high vector density; ii water conveyance, cattle breeding, vegetation index were associated with low vector density. The 28 villages could then be ranked according to the mean vector number as estimated by the random part of the model after adjustment on all covariates. The NPMP model made it possible to describe the distribution of the vector across the study area. The villages were ranked according to the mean vector density after taking into account the most important covariates. This study demonstrates the necessity and possibility of adapting methods of vector counting and sampling to each setting.
Solgi, Abazar; Pourhaghi, Amir; Bahmani, Ramin; Zarei, Heidar
2017-07-01
An accurate estimation of flow using different models is an issue for water resource researchers. In this study, support vector regression (SVR) and gene expression programming (GEP) models in daily and monthly scale were used in order to simulate Gamasiyab River flow in Nahavand, Iran. The results showed that although the performance of models in daily scale was acceptable and the result of SVR model was a little better, their performance in the daily scale was really better than the monthly scale. Therefore, wavelet transform was used and the main signal of every input was decomposed. Then, by using principal component analysis method, important sub-signals were recognized and used as inputs for the SVR and GEP models to produce wavelet-support vector regression (WSVR) and wavelet-gene expression programming. The results showed that the performance of WSVR was better than the SVR in such a way that the combination of SVR with wavelet could improve the determination coefficient of the model up to 3% and 18% for daily and monthly scales, respectively. Totally, it can be said that the combination of wavelet with SVR is a suitable tool for the prediction of Gamasiyab River flow in both daily and monthly scales.
Bayesian extreme quantile regression for hidden Markov models
Koutsourelis, Antonios
2012-01-01
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University The main contribution of this thesis is the introduction of Bayesian quantile regression for hidden Markov models, especially when we have to deal with extreme quantile regression analysis, as there is a limited research to inference conditional quantiles for hidden Markov models, under a Bayesian approach. The first objective is to compare Bayesian extreme quantile regression and th...
Melo-Cardenas, J; Urquiza, M; Kipps, T J; Castro, J E
2012-05-01
Ad-ISF35 is an adenovirus (Ad) vector that encodes a mouse-human chimeric CD154. Ad-ISF35 induces activation of chronic lymphocytic leukemia (CLL) cells converting them into CLL cells capable of promoting immune recognition and anti-leukemia T-cell activation. Clinical trials in humans treated with Ad-ISF35-transduced leukemia cells or intranodal injection of Ad-ISF35 have shown objective clinical responses. To better understand the biology of Ad-ISF35 and to contribute to its clinical development, we preformed studies to evaluate biodistribution, persistence and toxicity of repeat dose intratumoral administration of Ad-ISF35 in a mouse model. Ad-ISF35 intratumoral administration induced tumor regression in more than 80% of mice bearing A20 tumors. There were no abnormalities in the serum chemistry. Mice receiving Ad-ISF35 presented severe extramedullary hematopoiesis and follicular hyperplasia in the spleen and extramedullary hematopoiesis with lymphoid hyperplasia in lymph nodes. After Ad-ISF35 injection, the vector was found primarily in the injected tumors with a biodistribution pattern that showed a rapid clearance with no evidence of Ad-ISF35 accumulation or persistence in the injected tumor or peripheral organs. Furthermore, pre-existing antibodies against Ad-5 did not abrogate Ad-ISF35 anti-tumor activity. In conclusion, intratumoral administration of Ad-ISF35 induced tumor regression in A20 tumor bearing mice without toxicities and with no evidence of vector accumulation or persistence.
Ogutu, Joseph O; Schulz-Streeck, Torben; Piepho, Hans-Peter
2012-05-21
Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). The elastic net, lasso, adaptive lasso and the adaptive elastic net all had
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
Impact of multicollinearity on small sample hydrologic regression models
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
Methods of Detecting Outliers in A Regression Analysis Model ...
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... Abstract. This study detects outliers in a univariate and bivariate data by using both Rosner's and. Grubb's test in a regression analysis model. The study shows how an observation that causes the least square point estimate of a Regression model to be substantially different from what it would be if the ...
Methods of Detecting Outliers in A Regression Analysis Model. | Ogu ...
African Journals Online (AJOL)
This study detects outliers in a univariate and bivariate data by using both Rosner's and Grubb's test in a regression analysis model. The study shows how an observation that causes the least square point estimate of a Regression model to be substantially different from what it would be if the observation were removed from ...
A test for the parameters of multiple linear regression models ...
African Journals Online (AJOL)
A test for the parameters of multiple linear regression models is developed for conducting tests simultaneously on all the parameters of multiple linear regression models. The test is robust relative to the assumptions of homogeneity of variances and absence of serial correlation of the classical F-test. Under certain null and ...
Mixed Frequency Data Sampling Regression Models: The R Package midasr
Directory of Open Access Journals (Sweden)
Eric Ghysels
2016-08-01
Full Text Available When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002. In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on a information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Application of support vector regression (SVR) for stream flow prediction on the Amazon basin
CSIR Research Space (South Africa)
Du Toit, Melise
2016-10-01
Full Text Available regression technique is used in this study to analyse historical stream flow occurrences and predict stream flow values for the Amazon basin. Up to twelve month predictions are made and the coefficient of determination and root-mean-square error are used...
Precessing Asteroids ftrom Radius Vector Models?
Drummond, Jack D.
2014-11-01
Examining a sample of asteroids (the first 99) for which radius vector models have been constructed from mostly lightcurves, located on a web site where such models are listed (http://astro.troja.mff.cuni.cz/projects/damit ; see Durech et al. (2010), DAMIT: a database of asteroid models, A&A, 513, A46), we fit their surfaces as triaxial ellipsoids and provide their three dimensions. In the process we also derive an Euler angular offset θ between each model's spin axis and its axis of maximum moment of inertia assuming a uniform distribution of mass. Most θ's conform to a chi-squared distribution having a maximum at 3° and a mean at 5°, and with the square root of the variance being 3°. However, seven models produce θ>20°, which we interpret as indicating possibly strong precessors, tumblers, or due to incorrect models: asteroids (68), (89), (125), (162), (167), (222), and (230). Nine others produce an excess over the distribution at 12°probability of an impact sufficient to change the angular momentum of the asteroid implied by θ during the damping time to return to rotation about the small axis is vanishingly small (less than 1 in 10000) for the 8 out of 16 asteroids with absolute dimensions. The most likely resolution, then, is that the rotational pole for the 16 asteroid models with high θ needs to be adjusted by θ degrees.
A generalized multivariate regression model for modelling ocean wave heights
Wang, X. L.; Feng, Y.; Swail, V. R.
2012-04-01
In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.
Directory of Open Access Journals (Sweden)
ANDRÉS M. ÁLVAREZ MEZA
2012-01-01
Full Text Available RESUMEN: En este trabajo, se propone una metodología para la selección automática de los parámetros libres de la técnica de regresión basada en mínimos cuadrados máquinas de vectores de soporte (LS-SVM, a partir de un análisis de validación cruzada generalizada multidimensional sobre el conjunto de ecuaciones lineales de LS-SVM. La técnica desarrollada no requiere de un conocimiento a priori por parte del usuario acerca de la influencia de los parámetros libres en los resultados. Se realizan experimentos sobre dos bases de datos artificiales y dos bases de datos reales. De acuerdo a los resultados obtenidos, se concluye que el algoritmo desarrollado calcula regresiones apropiadas con errores relativos competentes.
Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng
2013-08-01
Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine combined with a new variable selection approach called random-frog recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All the three methods could obtain an acceptable model, and the optimal results by support vector machine based on a small number of informative descriptors with the square of correlation coefficient for cross validation, values of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
DEFF Research Database (Denmark)
Boeriis, Morten; van Leeuwen, Theo
2017-01-01
This article revisits the concept of vectors, which, in Kress and van Leeuwen’s Reading Images (2006), plays a crucial role in distinguishing between ‘narrative’, action-oriented processes and ‘conceptual’, state-oriented processes. The use of this concept in image analysis has usually focused...... on the most salient vectors, and this works well, but many images contain a plethora of vectors, which makes their structure quite different from the linguistic transitivity structures with which Kress and van Leeuwen have compared ‘narrative’ images. It can also be asked whether facial expression vectors...... should be taken into account in discussing ‘reactions’, which Kress and van Leeuwen link only to eyeline vectors. Finally, the question can be raised as to whether actions are always realized by vectors. Drawing on a re-reading of Rudolf Arnheim’s account of vectors, these issues are outlined...
Directory of Open Access Journals (Sweden)
Yingguo Cheng
2011-01-01
Full Text Available This paper describes the design and implementation of a wireless electronic nose (WEN system which can online detect the combustible gases methane and hydrogen (CH4/H2 and estimate their concentrations, either singly or in mixtures. The system is composed of two wireless sensor nodes—a slave node and a master node. The former comprises a Fe2O3 gas sensing array for the combustible gas detection, a digital signal processor (DSP system for real-time sampling and processing the sensor array data and a wireless transceiver unit (WTU by which the detection results can be transmitted to the master node connected with a computer. A type of Fe2O3 gas sensor insensitive to humidity is developed for resistance to environmental influences. A threshold-based least square support vector regression (LS-SVR estimator is implemented on a DSP for classification and concentration measurements. Experimental results confirm that LS-SVR produces higher accuracy compared with artificial neural networks (ANNs and a faster convergence rate than the standard support vector regression (SVR. The designed WEN system effectively achieves gas mixture analysis in a real-time process.
Song, Kai; Wang, Qi; Liu, Qi; Zhang, Hongquan; Cheng, Yingguo
2011-01-01
This paper describes the design and implementation of a wireless electronic nose (WEN) system which can online detect the combustible gases methane and hydrogen (CH(4)/H(2)) and estimate their concentrations, either singly or in mixtures. The system is composed of two wireless sensor nodes--a slave node and a master node. The former comprises a Fe(2)O(3) gas sensing array for the combustible gas detection, a digital signal processor (DSP) system for real-time sampling and processing the sensor array data and a wireless transceiver unit (WTU) by which the detection results can be transmitted to the master node connected with a computer. A type of Fe(2)O(3) gas sensor insensitive to humidity is developed for resistance to environmental influences. A threshold-based least square support vector regression (LS-SVR)estimator is implemented on a DSP for classification and concentration measurements. Experimental results confirm that LS-SVR produces higher accuracy compared with artificial neural networks (ANNs) and a faster convergence rate than the standard support vector regression (SVR). The designed WEN system effectively achieves gas mixture analysis in a real-time process.
Correlation-regression model for physico-chemical quality of ...
African Journals Online (AJOL)
abusaad
Multiple regression models can predict EC at 5% level of significance. Nitrate, chlorides, TDS and ... Key words: Groundwater, water quality, bore well, water supply, correlation, regression. INTRODUCTION. Groundwater is the prime .... reservoir located 10 to 25 km away from the city and through more than 1850 bore wells ...
Weighted Quantile Regression for AR model with Infinite Variance Errors.
Chen, Zhao; Li, Runze; Wu, Yaohua
2012-09-01
Autoregressive (AR) models with finite variance errors have been well studied. This paper is concerned with AR models with heavy-tailed errors, which is useful in various scientific research areas. Statistical estimation for AR models with infinite variance errors is very different from those for AR models with finite variance errors. In this paper, we consider a weighted quantile regression for AR models to deal with infinite variance errors. We further propose an induced smoothing method to deal with computational challenges in weighted quantile regression. We show that the difference between weighted quantile regression estimate and its smoothed version is negligible. We further propose a test for linear hypothesis on the regression coefficients. We conduct Monte Carlo simulation study to assess the finite sample performance of the proposed procedures. We illustrate the proposed methodology by an empirical analysis of a real-life data set.
Detection of epistatic effects with logic regression and a classical linear regression model.
Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata
2014-02-01
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.
Using the Regression Model in multivariate data analysis
Directory of Open Access Journals (Sweden)
Constantin Cristinel
2017-07-01
Full Text Available This paper is about an instrumental research regarding the using of Linear Regression Model for data analysis. The research uses a model based on real data and stress the necessity of a correct utilisation of such models in order to obtain accurate information for the decision makers. The main scope is to help practitioners and researchers in their efforts to build prediction models based on linear regressions. The conclusion reveals the necessity to use quantitative data for a correct model specification and to validate the model according to the assumptions of the least squares method.
Tutorial on Using Regression Models with Count Outcomes Using R
Directory of Open Access Journals (Sweden)
A. Alexander Beaujean
2016-02-01
Full Text Available Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used run the example analyses are included in the Appendix.
2012-01-01
Background Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. Methods We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). Results The elastic net, lasso, adaptive lasso and the
Vector bilinear autoregressive time series model and its superiority ...
African Journals Online (AJOL)
In this research, a vector bilinear autoregressive time series model was proposed and used to model three revenue series (X1, X2, X3) . The “orders” of the three series were identified on the basis of the distribution of autocorrelation and partial autocorrelation functions and were used to construct the vector bilinear models.
BayesX: Analyzing Bayesian Structural Additive Regression Models
Directory of Open Access Journals (Sweden)
Andreas Brezger
2005-09-01
Full Text Available There has been much recent interest in Bayesian inference for generalized additive and related models. The increasing popularity of Bayesian methods for these and other model classes is mainly caused by the introduction of Markov chain Monte Carlo (MCMC simulation techniques which allow realistic modeling of complex problems. This paper describes the capabilities of the free software package BayesX for estimating regression models with structured additive predictor based on MCMC inference. The program extends the capabilities of existing software for semiparametric regression included in S-PLUS, SAS, R or Stata. Many model classes well known from the literature are special cases of the models supported by BayesX. Examples are generalized additive (mixed models, dynamic models, varying coefficient models, geoadditive models, geographically weighted regression and models for space-time regression. BayesX supports the most common distributions for the response variable. For univariate responses these are Gaussian, Binomial, Poisson, Gamma, negative Binomial, zero inflated Poisson and zero inflated negative binomial. For multicategorical responses, both multinomial logit and probit models for unordered categories of the response as well as cumulative threshold models for ordered categories can be estimated. Moreover, BayesX allows the estimation of complex continuous time survival and hazard rate models.
Adaptive Regression and Classification Models with Applications in Insurance
Directory of Open Access Journals (Sweden)
Jekabsons Gints
2014-07-01
Full Text Available Nowadays, in the insurance industry the use of predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on the ability to perform such tasks as credibility estimation, determination of insurance premiums, estimation of probability of claim, detecting insurance fraud, managing insurance risk. This paper discusses regression and classification modeling for such types of prediction problems using the method of Adaptive Basis Function Construction
Regression models for the quantification of Parkinsonian bradykinesia.
Kim, Ji-Won; Kwon, Yuri; Yun, Ju-Seok; Heo, Jae-Hoon; Eom, Gwang-Moon; Tack, Gye-Rae; Lim, Tae-Hong; Koh, Seong-Beom
2015-01-01
The aim of this study was to develop regression models for the quantification of parkinsonian bradykinesia. Forty patients with Parkinson's disease participated in this study. Angular velocity was measured using gyro sensor during finger tapping, forearm-rotation, and toe tapping tasks and the severity of bradykinesia was rated by two independent neurologists. Various characteristic variables were derived from the sensor signal. Stepwise multiple linear regression analysis was performed to develop models predicting the bradykinesia score with the characteristic variables as input. To evaluate the ability of the regression models to discriminate different bradykinesia scores, ANOVA and post hoc test were performed. Major determinants of the bradykinesia score differed among clinical tasks and between raters. The regression models were better than any single characteristic variable in terms of the ability to differentiate bradykinesia scores. Specifically, the regression models could differentiate all pairs of the bradykinesia scores (pmultiple regression models reflecting these differences would be beneficial for the quantification of bradykinesia because the cardinal features included in the determination of bradykinesia score differ among tasks as well as among the raters.
Wavelet regression model in forecasting crude oil price
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
Synthesis analysis of regression models with a continuous outcome.
Zhou, Xiao-Hua; Hu, Nan; Hu, Guizhou; Root, Martin
2009-05-15
To estimate the multivariate regression model from multiple individual studies, it would be challenging to obtain results if the input from individual studies only provide univariate or incomplete multivariate regression information. Samsa et al. (J. Biomed. Biotechnol. 2005; 2:113-123) proposed a simple method to combine coefficients from univariate linear regression models into a multivariate linear regression model, a method known as synthesis analysis. However, the validity of this method relies on the normality assumption of the data, and it does not provide variance estimates. In this paper we propose a new synthesis method that improves on the existing synthesis method by eliminating the normality assumption, reducing bias, and allowing for the variance estimation of the estimated parameters. (c) 2009 John Wiley & Sons, Ltd.
Regression Model Optimization for the Analysis of Experimental Data
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
Real estate value prediction using multivariate regression models
Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav
2017-11-01
The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.
Alternative regression models to assess increase in childhood BMI
Directory of Open Access Journals (Sweden)
Mansmann Ulrich
2008-09-01
Full Text Available Abstract Background Body mass index (BMI data usually have skewed distributions, for which common statistical modeling approaches such as simple linear or logistic regression have limitations. Methods Different regression approaches to predict childhood BMI by goodness-of-fit measures and means of interpretation were compared including generalized linear models (GLMs, quantile regression and Generalized Additive Models for Location, Scale and Shape (GAMLSS. We analyzed data of 4967 children participating in the school entry health examination in Bavaria, Germany, from 2001 to 2002. TV watching, meal frequency, breastfeeding, smoking in pregnancy, maternal obesity, parental social class and weight gain in the first 2 years of life were considered as risk factors for obesity. Results GAMLSS showed a much better fit regarding the estimation of risk factors effects on transformed and untransformed BMI data than common GLMs with respect to the generalized Akaike information criterion. In comparison with GAMLSS, quantile regression allowed for additional interpretation of prespecified distribution quantiles, such as quantiles referring to overweight or obesity. The variables TV watching, maternal BMI and weight gain in the first 2 years were directly, and meal frequency was inversely significantly associated with body composition in any model type examined. In contrast, smoking in pregnancy was not directly, and breastfeeding and parental social class were not inversely significantly associated with body composition in GLM models, but in GAMLSS and partly in quantile regression models. Risk factor specific BMI percentile curves could be estimated from GAMLSS and quantile regression models. Conclusion GAMLSS and quantile regression seem to be more appropriate than common GLMs for risk factor modeling of BMI data.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.
Zhu, Dazhou; Ji, Baoping; Meng, Chaoying; Shi, Bolin; Tu, Zhenhua; Qing, Zhaoshen
2007-08-29
The nu-support vector regression (nu-SVR) was used to construct the calibration model between soluble solids content (SSC) of apples and acousto-optic tunable filter near-infrared (AOTF-NIR) spectra. The performance of nu-SVR was compared with the partial least square regression (PLSR) and the back-propagation artificial neural networks (BP-ANN). The influence of SVR parameters on the predictive ability of model was investigated. The results indicated that the parameter nu had a rather wide optimal area (between 0.35 and 1 for the apple data). Therefore, we could determine the value of nu beforehand and focus on the selection of other SVR parameters. For analyzing SSC of apple, nu-SVR was superior to PLSR and BP-ANN, especially in the case of fewer samples and treating the noise polluted spectra. Proper spectra pretreatment methods, such as scaling, mean center, standard normal variate (SNV) and the wavelength selection methods (stepwise multiple linear regression and genetic algorithm with PLS as its objective function), could improve the quality of nu-SVR model greatly.
Directory of Open Access Journals (Sweden)
Liyang Wang
2016-01-01
Full Text Available Time-varying external disturbances cause instability of humanoid robots or even tip robots over. In this work, a trapezoidal fuzzy least squares support vector regression- (TF-LSSVR- based control system is proposed to learn the external disturbances and increase the zero-moment-point (ZMP stability margin of humanoid robots. First, the humanoid states and the corresponding control torques of the joints for training the controller are collected by implementing simulation experiments. Secondly, a TF-LSSVR with a time-related trapezoidal fuzzy membership function (TFMF is proposed to train the controller using the simulated data. Thirdly, the parameters of the proposed TF-LSSVR are updated using a cubature Kalman filter (CKF. Simulation results are provided. The proposed method is shown to be effective in learning and adapting occasional external disturbances and ensuring the stability margin of the robot.
Regression models for public health surveillance data: a simulation study.
Kim, H; Kriebel, D
2009-11-01
Poisson regression is now widely used in epidemiology, but researchers do not always evaluate the potential for bias in this method when the data are overdispersed. This study used simulated data to evaluate sources of overdispersion in public health surveillance data and compare alternative statistical models for analysing such data. If count data are overdispersed, Poisson regression will not correctly estimate the variance. A model called negative binomial 2 (NB2) can correct for overdispersion, and may be preferred for analysis of count data. This paper compared the performance of Poisson and NB2 regression with simulated overdispersed injury surveillance data. Monte Carlo simulation was used to assess the utility of the NB2 regression model as an alternative to Poisson regression for data which had several different sources of overdispersion. Simulated injury surveillance datasets were created in which an important predictor variable was omitted, as well as with an incorrect offset (denominator). The simulations evaluated the ability of Poisson regression and NB2 to correctly estimate the true determinants of injury and their confidence intervals. The NB2 model was effective in reducing overdispersion, but it could not reduce bias in point estimates which resulted from omitting a covariate which was a confounder, nor could it reduce bias from using an incorrect offset. One advantage of NB2 over Poisson for overdispersed data was that the confidence interval for a covariate was considerably wider with the former, providing an indication that the Poisson model did not fit well. When overdispersion is detected in a Poisson regression model, the NB2 model should be fit as an alternative. If there is no longer overdispersion, then the NB2 results may be preferred. However, it is important to remember that NB2 cannot correct for bias from omitted covariates or from using an incorrect offset.
Oguntunde, Philip G; Lischeid, Gunnar; Dietrich, Ottfried
2017-10-14
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.
Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried
2017-10-01
This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used was for a period of 36 years between 1980 and 2015. Similar to the observed decrease (P 1 and explained 83.1% of the total variance of predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in rice yield data and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and sometimes short-term variance similar to that of rice yield. Short-term fluctuations of the scores of the PC1 are closely coupled to those of rice yield during the 1986-1993 and the 2006-2013 periods thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable of highest influence on rice yield, and the influence was especially strong during monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain filling stages in the study area. The outcome is expected to provide more in-depth regional-specific climate-rice linkage for screening of better cultivars that can positively respond to future climate fluctuations as well as providing information that may help optimized planting dates for improved radiation use efficiency in the study area.
Wang, Xiu-feng; Zhang, Lei; Wu, Qing-hua; Min, Jian-Xin; Ma, Na; Luo, Lai-Cheng
2015-01-01
Psychological stress has become a common and important cause of premature ovarian failure (POF). Therefore, it is very important to explore the mechanisms of POF resulting from psychological stress. Sixty SD rats were randomly divided into control and model groups. Biomolecules associated with POF (β-EP, IL-1, NOS, NO, GnRH, CRH, FSH, LH, E2, P, ACTH, and CORT) were measured in the control and psychologically stressed rats. The regulation relationships of the biomolecules were explored in the...
Zongyuan Cai; Zhi Wang; Aixia Yan
2008-01-01
QSAR (Quantitative Structure Activity Relationships) models for the prediction of human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, Cerius2 and a combination of them. A dataset of 552 compounds covering a wide range of current drugs with experimental HIA values was investigated. A Genetic Algorithm feature selection method was applied to select proper descriptors. A Kohonen's self-organizing Neural Network (KohNN) map was used to split the who...
Robust brain ROI segmentation by deformation regression and deformable shape model.
Wu, Zhengwang; Guo, Yanrong; Park, Sang Hyun; Gao, Yaozong; Dong, Pei; Lee, Seong-Whan; Shen, Dinggang
2018-01-01
We propose a robust and efficient learning-based deformable model for segmenting regions of interest (ROIs) from structural MR brain images. Different from the conventional deformable-model-based methods that deform a shape model locally around the initialization location, we learn an image-based regressor to guide the deformable model to fit for the target ROI. Specifically, given any voxel in a new image, the image-based regressor can predict the displacement vector from this voxel towards the boundary of target ROI, which can be used to guide the deformable segmentation. By predicting the displacement vector maps for the whole image, our deformable model is able to use multiple non-boundary predictions to jointly determine and iteratively converge the initial shape model to the target ROI boundary, which is more robust to the local prediction error and initialization. In addition, by introducing the prior shape model, our segmentation avoids the isolated segmentations as often occurred in the previous multi-atlas-based methods. In order to learn an image-based regressor for displacement vector prediction, we adopt the following novel strategies in the learning procedure: (1) a joint classification and regression random forest is proposed to learn an image-based regressor together with an ROI classifier in a multi-task manner; (2) high-level context features are extracted from intermediate (estimated) displacement vector and classification maps to enforce the relationship between predicted displacement vectors at neighboring voxels. To validate our method, we compare it with the state-of-the-art multi-atlas-based methods and other learning-based methods on three public brain MR datasets. The results consistently show that our method is better in terms of both segmentation accuracy and computational efficiency. Copyright © 2017 Elsevier B.V. All rights reserved.
Joint regression analysis and AMMI model applied to oat improvement
Oliveira, A.; Oliveira, T. A.; Mejza, S.
2012-09-01
In our work we present an application of some biometrical methods useful in genotype stability evaluation, namely AMMI model, Joint Regression Analysis (JRA) and multiple comparison tests. A genotype stability analysis of oat (Avena Sativa L.) grain yield was carried out using data of the Portuguese Plant Breeding Board, sample of the 22 different genotypes during the years 2002, 2003 and 2004 in six locations. In Ferreira et al. (2006) the authors state the relevance of the regression models and of the Additive Main Effects and Multiplicative Interactions (AMMI) model, to study and to estimate phenotypic stability effects. As computational techniques we use the Zigzag algorithm to estimate the regression coefficients and the agricolae-package available in R software for AMMI model analysis.
Regression modeling of consumption or exposure variables classified by type.
Dorfman, A; Kimball, A W; Friedman, L A
1985-12-01
Consumption or exposure variables, as potential risk factors, are commonly measured and related to health effects. The measurements may be continuous or discrete, may be grouped into categories and may, in addition, be classified by type. Data analyses utilizing regression methods for the assessment of these risk factors present many problems of modeling and interpretation. Various models are proposed and evaluated, and recommendations are made. Use of the models is illustrated with Cox regression analyses of coronary heart disease mortality after 24 years of follow-up of subjects in the Framingham Study, with the focus being on alcohol consumption among these subjects.
Optimization of Regression Models of Experimental Data Using Confirmation Points
Ulbrich, N.
2010-01-01
A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression model independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
Buffalos milk yield analysis using random regression models
Directory of Open Access Journals (Sweden)
A.S. Schierholt
2010-02-01
Full Text Available Data comprising 1,719 milk yield records from 357 females (predominantly Murrah breed, daughters of 110 sires, with births from 1974 to 2004, obtained from the Programa de Melhoramento Genético de Bubalinos (PROMEBUL and from records of EMBRAPA Amazônia Oriental - EAO herd, located in Belém, Pará, Brazil, were used to compare random regression models for estimating variance components and predicting breeding values of the sires. The data were analyzed by different models using the Legendre’s polynomial functions from second to fourth orders. The random regression models included the effects of herd-year, month of parity date of the control; regression coefficients for age of females (in order to describe the fixed part of the lactation curve and random regression coefficients related to the direct genetic and permanent environment effects. The comparisons among the models were based on the Akaike Infromation Criterion. The random effects regression model using third order Legendre’s polynomials with four classes of the environmental effect were the one that best described the additive genetic variation in milk yield. The heritability estimates varied from 0.08 to 0.40. The genetic correlation between milk yields in younger ages was close to the unit, but in older ages it was low.
Tree-based model checking for logistic regression.
Su, Xiaogang
2007-05-10
A tree procedure is proposed to check the adequacy of a fitted logistic regression model. The proposed method not only makes natural assessment for the logistic model, but also provides clues to amend its lack-of-fit. The resulting tree-augmented logistic model facilitates a refined model with meaningful interpretation. We demonstrate its use via simulation studies and an application to the Pima Indians diabetes data. Copyright 2006 John Wiley & Sons, Ltd.
Spontaneous symmetry breaking in the composite-vector-boson model
Energy Technology Data Exchange (ETDEWEB)
Garavaglia, T.
1986-11-15
Spontaneous symmetry breaking is discussed in the Abelian, QED-like, composite-vector-boson model. When the auxiliary vector field has a nonzero vacuum expectation value, a global symmetry, Lorentz invariance, is broken. It is shown that the regularization of the saddle-point conditions for the quantum fluctuation generating functional is consistent only with a spacelike vacuum expectation value for the auxiliary vector field.
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector ?? that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation ????*, where ?? is an interpolation matrix and ??* is a stochastic vector of parameters. Vector ??* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(????*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of ??, ??,f(??), which is assumed to be nearly exact. The difference f(??) - f(????*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate ??* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(??) and f(????*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is
CICAAR - Convolutive ICA with an Auto-Regressive Inverse Model
DEFF Research Database (Denmark)
Dyrholm, Mads; Hansen, Lars Kai
2004-01-01
We invoke an auto-regressive IIR inverse model for convolutive ICA and derive expressions for the likelihood and its gradient. We argue that optimization will give a stable inverse. When there are more sensors than sources the mixing model parameters are estimated in a second step by least squares...
Current and Predicted Fertility using Poisson Regression Model ...
African Journals Online (AJOL)
AJRH Managing Editor
Our study which formulates a model to predict future fertility in Nigeria was basically conceived to fill the gap. Understanding population, its determinants, growth ...... extreme cases. In conclusion, it is evident from this study that. Poisson regression model is an applicable tool for predicting number of children a woman is.
Linear regression models for quantitative assessment of left ...
African Journals Online (AJOL)
Changes in left ventricular structures and function have been reported in cardiomyopathies. No prediction models have been established in this environment. This study established regression models for prediction of left ventricular structures in normal subjects. A sample of normal subjects was drawn from a large urban ...
Uncertainties in spatially aggregated predictions from a logistic regression model
Horssen, P.W. van; Pebesma, E.J.; Schot, P.P.
2002-01-01
This paper presents a method to assess the uncertainty of an ecological spatial prediction model which is based on logistic regression models, using data from the interpolation of explanatory predictor variables. The spatial predictions are presented as approximate 95% prediction intervals. The
Geographically Weighted Logistic Regression Applied to Credit Scoring Models
Directory of Open Access Journals (Sweden)
Pedro Henrique Melo Albuquerque
Full Text Available Abstract This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC, granted to clients residing in the Distrito Federal (DF, to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters, with it being concluded that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results, in terms of predictive power and financial losses for the institution, and the study demonstrated viability in using the GWLR technique to develop credit scoring models for the target population in the study.
Oil Price Volatility and Economic Growth in Nigeria: a Vector Auto-Regression (VAR Approach
Directory of Open Access Journals (Sweden)
Edesiri Godsday Okoro
2014-02-01
Full Text Available The study examined oil price volatility and economic growth in Nigeria linking oil price volatility, crude oil prices, oil revenue and Gross Domestic Product. Using quarterly data sourced from the Central Bank of Nigeria (CBN Statistical Bulletin and World Bank Indicators (various issues spanning 1980-2010, a non‐linear model of oil price volatility and economic growth was estimated using the VAR technique. The study revealed that oil price volatility has significantly influenced the level of economic growth in Nigeria although; the result additionally indicated a negative relationship between the oil price volatility and the level of economic growth. Furthermore, the result also showed that the Nigerian economy survived on crude oil, to such extent that the country‘s budget is tied to particular price of crude oil. This is not a good sign for a developing economy, more so that the country relies almost entirely on revenue of the oil sector as a source of foreign exchange earnings. This therefore portends some dangers for the economic survival of Nigeria. It was recommended amongst others that there should be a strong need for policy makers to focus on policy that will strengthen/stabilize the economy with specific focus on alternative sources of government revenue. Finally, there should be reduction in monetization of crude oil receipts (fiscal discipline, aggressive saving of proceeds from oil booms in future in order to withstand vicissitudes of oil price volatility in future.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Calibration of stormwater quality regression models: a random process?
Dembélé, A; Bertrand-Krajewski, J-L; Barillon, B
2010-01-01
Regression models are among the most frequently used models to estimate pollutants event mean concentrations (EMC) in wet weather discharges in urban catchments. Two main questions dealing with the calibration of EMC regression models are investigated: i) the sensitivity of models to the size and the content of data sets used for their calibration, ii) the change of modelling results when models are re-calibrated when data sets grow and change with time when new experimental data are collected. Based on an experimental data set of 64 rain events monitored in a densely urbanised catchment, four TSS EMC regression models (two log-linear and two linear models) with two or three explanatory variables have been derived and analysed. Model calibration with the iterative re-weighted least squares method is less sensitive and leads to more robust results than the ordinary least squares method. Three calibration options have been investigated: two options accounting for the chronological order of the observations, one option using random samples of events from the whole available data set. Results obtained with the best performing non linear model clearly indicate that the model is highly sensitive to the size and the content of the data set used for its calibration.
Vector space model for document representation in information retrieval
Directory of Open Access Journals (Sweden)
Dan MUNTEANU
2007-12-01
Full Text Available This paper presents the basics of information retrieval: the vector space model for document representation with Boolean and term weighted models, ranking methods based on the cosine factor and evaluation measures: recall, precision and combined measure.
Modeling non-Gaussian time-varying vector autoregressive process
National Aeronautics and Space Administration — We present a novel and general methodology for modeling time-varying vector autoregressive processes which are widely used in many areas such as modeling of chemical...
A bivariate cumulative probit regression model for ordered categorical data.
Kim, K
1995-06-30
This paper proposes a latent variable regression model for bivariate ordered categorical data and develops the necessary numerical procedure for parameter estimation. The proposed model is an extension of the standard bivariate probit model for dichotomous data to ordered categorical data with more than two categories for each margin. In addition, the proposed model allows for different covariates for the margins, which is characteristic of data from typical ophthalmological studies. It utilizes the stochastic ordering implicit in the data and the correlation coefficient of the bivariate normal distribution in expressing intra-subject dependency. Illustration of the proposed model uses data from the Wisconsin Epidemiologic Study of Diabetic Retinopathy for identifying risk factors for diabetic retinopathy among younger-onset diabetics. The proposed regression model also applies to other clinical or epidemiological studies that involve paired organs.
The art of regression modeling in road safety
Hauer, Ezra
2015-01-01
This unique book explains how to fashion useful regression models from commonly available data to erect models essential for evidence-based road safety management and research. Composed from techniques and best practices presented over many years of lectures and workshops, The Art of Regression Modeling in Road Safety illustrates that fruitful modeling cannot be done without substantive knowledge about the modeled phenomenon. Class-tested in courses and workshops across North America, the book is ideal for professionals, researchers, university professors, and graduate students with an interest in, or responsibilities related to, road safety. This book also: · Presents for the first time a powerful analytical tool for road safety researchers and practitioners · Includes problems and solutions in each chapter as well as data and spreadsheets for running models and PowerPoint presentation slides · Features pedagogy well-suited for graduate courses and workshops including problems, solutions, and PowerPoint p...
Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model
Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.
2017-03-01
This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses binary logistic regression method with time series data of normalized difference vegetation index as input. The process is divided into two steps: training and classification. The purpose of training step is to identify the best parameter of the regression model using gradient descent algorithm. The best fit of the model can be utilized to classify sugarcane and non-sugarcane area. The experiment shows high accuracy and successfully maps the sugarcane plantation area which obtained best result of Cohen’s Kappa value 0.7833 (strong) with 89.167% accuracy.
A fitter use of Monte Carlo simulations in regression models
Directory of Open Access Journals (Sweden)
Alessandro Ferrarini
2011-12-01
Full Text Available In this article, I focus on the use of Monte Carlo simulations (MCS within regression models, being this application very frequent in biology, ecology and economy as well. I'm interested in enhancing a typical fault in this application of MCS, i.e. the inner correlations among independent variables are not used when generating random numbers that fit their distributions. By means of an illustrative example, I provide proof that the misuse of MCS in regression models produces misleading results. Furthermore, I also provide a solution for this topic.
Harrell , Jr , Frank E
2015-01-01
This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy, and the harm of categorizing continuous predictors or outcomes. This text realistically...
Kiala, Zolo; Odindi, John; Mutanga, Onisimo; Peerbhay, Kabir
2016-07-01
Leaf area index (LAI) is a key biophysical parameter commonly used to determine vegetation status, productivity, and health in tropical grasslands. Accurate LAI estimates are useful in supporting sustainable rangeland management by providing information related to grassland condition and associated goods and services. The performance of support vector regression (SVR) was compared to partial least square regression (PLSR) on selected optimal hyperspectral bands to detect LAI in heterogeneous grassland. Results show that PLSR performed better than SVR at the beginning and end of summer. At the peak of the growing season (mid-summer), during reflectance saturation, SVR models yielded higher accuracies (R2=0.902 and RMSE=0.371 m2 m-2) than PLSR models (R2=0.886 and RMSE=0.379 m2 m-2). For the combined dataset (all of summer), SVR models were slightly more accurate (R2=0.74 and RMSE=0.578 m2 m-2) than PLSR models (R2=0.732 and RMSE=0.58 m2 m-2). Variable importance on the projection scores show that most of the bands were located in the near-infrared and shortwave regions of the electromagnetic spectrum, thus providing a basis to investigate the potential of sensors on aerial and satellite platforms for large-scale grassland LAI prediction.
Imputation by PLS regression for generalized linear mixed models
Guyon, Emilie; Pommeret, Denys
2011-01-01
The problem of handling missing data in generalized linear mixed models with correlated covariates is considered when the missing mechanism concerns both the response variable and the covariates. An imputation algorithm combining multiple imputation and Partial Least Squares (PLS) regression is proposed. The method relies on two steps. In a first step, using a linearization technique, the generalized linear mixed model is approximated by a linear mixed model. A latent variable is introduced a...
Flexible competing risks regression modeling and goodness-of-fit
DEFF Research Database (Denmark)
Scheike, Thomas; Zhang, Mei-Jie
2008-01-01
In this paper we consider different approaches for estimation and assessment of covariate effects for the cumulative incidence curve in the competing risks model. The classic approach is to model all cause-specific hazards and then estimate the cumulative incidence curve based on these cause...... of the flexible regression models to analyze competing risks data when non-proportionality is present in the data....
Logistic Regression Model on Antenna Control Unit Autotracking Mode
2015-10-20
y is the logarithm of odds, or log-odds, also known as the logit of probability. Our model derives the logit of probabilities as the linear...partitioned over the control set predictors. This linearity of the logit vs. predictor is an assumption essential to our model . Not only can we...412TW-PA-15240 Logistic Regression Model on Antenna Control Unit Autotracking Mode DANIEL T. LAIRD AIR FORCE TEST CENTER EDWARDS AFB, CA
Detecting influential observations in nonlinear regression modeling of groundwater flow
Yager, R.M.
1998-01-01
Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
Spatial stochastic regression modelling of urban land use
Arshad, S. H. M.; Jaafar, J.; Abiden, M. Z. Z.; Latif, Z. A.; Rasam, A. R. A.
2014-02-01
Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits of the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and decreasing standard of living. Regulation and observation of urban development activities is crucial. The understanding of urban systems that promotes urban growth are also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques namely the Ordinary Least Square (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared to and it is found that, GWR seems to be a more significant stochastic regression model compared to OLS, it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable.
Direction of Effects in Multiple Linear Regression Models.
Wiedermann, Wolfgang; von Eye, Alexander
2015-01-01
Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
Applications of some discrete regression models for count data
Directory of Open Access Journals (Sweden)
B. M. Golam Kibria
2006-01-01
Full Text Available In this paper we have considered several regression models to fit the count data that encounter in the field of Biometrical, Environmental, Social Sciences and Transportation Engineering. We have fitted Poisson (PO, Negative Binomial (NB, Zero-Inflated Poisson (ZIP and Zero-Inflated Negative Binomial (ZINB regression models to run-off-road (ROR crash data which collected on arterial roads in south region (rural of Florida State. To compare the performance of these models, we analyzed data with moderate to high percentage of zero counts. Because the variances were almost three times greater than the means, it appeared that both NB and ZINB models performed better than PO and ZIP models for the zero inflated and over dispersed count data.
Vector meson photoproduction — model independent aspects
Kloet, W. M.; Tabakin, F.
2000-07-01
The rich spin structure of vector meson photoproduction allows for a systematic analysis of the angular and energy dependence of the spin observables in the photon-nucleon c.m. frame. Constraints for spin observables based on positivity of the spin density matrix, are discussed and should be part of any future analysis of experimental data.
Vector meson photoproduction - model independent aspects
Energy Technology Data Exchange (ETDEWEB)
Kloet, W. M.; Tabakin, F
2000-07-31
The rich spin structure of vector meson photoproduction allows for a systematic analysis of the angular and energy dependence of the spin observables in the photon-nucleon c.m. frame. Constraints for spin observables based on positivity of the spin density matrix, are discussed and should be part of any future analysis of experimental data.
Many regression algorithms, one unified model: A review.
Stulp, Freek; Sigaud, Olivier
2015-09-01
Regression is the process of learning relationships between inputs and continuous outputs from example data, which enables predictions for novel inputs. The history of regression is closely related to the history of artificial neural networks since the seminal work of Rosenblatt (1958). The aims of this paper are to provide an overview of many regression algorithms, and to demonstrate how the function representation whose parameters they regress fall into two classes: a weighted sum of basis functions, or a mixture of linear models. Furthermore, we show that the former is a special case of the latter. Our ambition is thus to provide a deep understanding of the relationship between these algorithms, that, despite being derived from very different principles, use a function representation that can be captured within one unified model. Finally, step-by-step derivations of the algorithms from first principles and visualizations of their inner workings allow this article to be used as a tutorial for those new to regression. Copyright © 2015 Elsevier Ltd. All rights reserved.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Illustrating bayesian evaluation of informative hypotheses for regression models.
Kluytmans, A.; Schoot, R. van de; Mulder, J.; Hoijtink, H.
2012-01-01
In the present article we illustrate a Bayesian method of evaluating informative hypotheses for regression models. Our main aim is to make this method accessible to psychological researchers without a mathematical or Bayesian background. The use of informative hypotheses is illustrated using two
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2010-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2009-01-01
In this paper two kernel-based nonparametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a viable alternative to the method of De Gooijer and Zerom (2003). By
Efficient estimation of an additive quantile regression model
Cheng, Y.; de Gooijer, J.G.; Zerom, D.
2011-01-01
In this paper, two non-parametric estimators are proposed for estimating the components of an additive quantile regression model. The first estimator is a computationally convenient approach which can be viewed as a more viable alternative to existing kernel-based approaches. The second estimator
Likelihood Inference for a Fractionally Cointegrated Vector Autoregressive Model
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Morten Ørregaard
We consider model based inference in a fractionally cointegrated (or cofractional) vector autoregressive model based on the conditional Gaussian likelihood. The model allows the process X(t) to be fractional of order d and cofractional of order d-b; that is, there exist vectors ß for which ß...... the asymptotic distribution of the likelihood ratio test for cointegration rank, which is a functional of fractional Brownian motion of type II....
Likelihood inference for a fractionally cointegrated vector autoregressive model
DEFF Research Database (Denmark)
Johansen, Søren; Nielsen, Morten Ørregaard
We consider model based inference in a fractionally cointegrated (or cofractional) vector autoregressive model based on the conditional Gaussian likelihood. The model allows the process X_{t} to be fractional of order d and cofractional of order d-b; that is, there exist vectors β for which β...... also find the asymptotic distribution of the likelihood ratio test for cointegration rank, which is a functional of fractional Brownian motion of type II....
Modeling energy expenditure in children and adolescents using quantile regression
Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R.; Vohra, Firoz A.; Butte, Nancy F.; Zakeri, Issa F.
2013-01-01
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child ...
Regression and multivariate models for predicting particulate matter concentration level.
Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S
2018-01-01
The devastating health effects of particulate matter (PM 10 ) exposure by susceptible populace has made it necessary to evaluate PM 10 pollution. Meteorological parameters and seasonal variation increases PM 10 concentration levels, especially in areas that have multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM 10 concentration levels. The analyses were carried out using daily average PM 10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data was from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had less influence on PM 10 concentration levels having coefficient of determination (R 2 ) result from 23 to 29% based on seasoned and unseasoned analysis. While, the result of the prediction analysis showed that PCR models had a better R 2 result than MLR methods. The results for the analyses based on both seasoned and unseasoned data established that MLR models had R 2 result from 0.50 to 0.60. While, PCR models had R 2 result from 0.66 to 0.89. In addition, the validation analysis using 2016 data also recognised that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis having the best result. These analyses will aid in achieving sustainable air quality management strategies.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
Directory of Open Access Journals (Sweden)
Minh Vu Trieu
2017-03-01
Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS, Brazilian tensile strength (BTS, rock brittleness index (BI, the distance between planes of weakness (DPW, and the alpha angle (Alpha between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP. Four (4 statistical regression models (two linear and two nonlinear are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2 of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.
On concurvity in nonlinear and nonparametric regression models
Directory of Open Access Journals (Sweden)
Sonia Amodio
2014-12-01
Full Text Available When data are affected by multicollinearity in the linear regression framework, then concurvity will be present in fitting a generalized additive model (GAM. The term concurvity describes nonlinear dependencies among the predictor variables. As collinearity results in inflated variance of the estimated regression coefficients in the linear regression model, the result of the presence of concurvity leads to instability of the estimated coefficients in GAMs. Even if the backfitting algorithm will always converge to a solution, in case of concurvity the final solution of the backfitting procedure in fitting a GAM is influenced by the starting functions. While exact concurvity is highly unlikely, approximate concurvity, the analogue of multicollinearity, is of practical concern as it can lead to upwardly biased estimates of the parameters and to underestimation of their standard errors, increasing the risk of committing type I error. We compare the existing approaches to detect concurvity, pointing out their advantages and drawbacks, using simulated and real data sets. As a result, this paper will provide a general criterion to detect concurvity in nonlinear and non parametric regression models.
Qin, Zijian; Wang, Maolin; Yan, Aixia
2017-07-01
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by using a multiple linear regression (MLR) and a support vector machine (SVM) method. 512 HCV NS3/4A protease inhibitors and their IC 50 values which were determined by the same FRET assay were collected from the reported literature to build a dataset. All the inhibitors were represented with selected nine global and 12 2D property-weighted autocorrelation descriptors calculated from the program CORINA Symphony. The dataset was divided into a training set and a test set by a random and a Kohonen's self-organizing map (SOM) method. The correlation coefficients (r 2 ) of training sets and test sets were 0.75 and 0.72 for the best MLR model, 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were also developed. The performances of all the best sub-dataset models were better than those of the whole dataset models. We believe that the combination of the best sub- and whole dataset SVM models can be used as reliable lead designing tools for new NS3/4A protease inhibitors scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-05-11
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.
Empirical Bayes estimation for additive hazards regression models.
Sinha, Debajyoti; McHenry, M Brent; Lipsitz, Stuart R; Ghosh, Malay
2009-09-01
We develop a novel empirical Bayesian framework for the semiparametric additive hazards regression model. The integrated likelihood, obtained by integration over the unknown prior of the nonparametric baseline cumulative hazard, can be maximized using standard statistical software. Unlike the corresponding full Bayes method, our empirical Bayes estimators of regression parameters, survival curves and their corresponding standard errors have easily computed closed-form expressions and require no elicitation of hyperparameters of the prior. The method guarantees a monotone estimator of the survival function and accommodates time-varying regression coefficients and covariates. To facilitate frequentist-type inference based on large-sample approximation, we present the asymptotic properties of the semiparametric empirical Bayes estimates. We illustrate the implementation and advantages of our methodology with a reanalysis of a survival dataset and a simulation study.
Leone, Robert Matthew
A search for vector-like quarks (VLQs) decaying to a Z boson using multi-stage machine learning was compared to a search using a standard square cuts search strategy. VLQs are predicted by several new theories beyond the Standard Model. The searches used 20.3 inverse femtobarns of proton-proton collisions at a center-of-mass energy of 8 TeV collected with the ATLAS detector in 2012 at the CERN Large Hadron Collider. CLs upper limits on production cross sections of vector-like top and bottom quarks were computed for VLQs produced singly or in pairs, Tsingle, Bsingle, Tpair, and Bpair. The two stage machine learning classification search strategy did not provide any improvement over the standard square cuts strategy, but for Tpair, Bpair, and Tsingle, a third stage of machine learning regression was able to lower the upper limits of high signal masses by as much as 50%. Additionally, new test statistics were developed for use in the Neyman construction of confidence regions in order to address deficiencies in c...
Hierarchical Neural Regression Models for Customer Churn Prediction
Directory of Open Access Journals (Sweden)
Golshan Mohammadi
2013-01-01
Full Text Available As customers are the main assets of each industry, customer churn prediction is becoming a major task for companies to remain in competition with competitors. In the literature, the better applicability and efficiency of hierarchical data mining techniques has been reported. This paper considers three hierarchical models by combining four different data mining techniques for churn prediction, which are backpropagation artificial neural networks (ANN, self-organizing maps (SOM, alpha-cut fuzzy c-means (α-FCM, and Cox proportional hazards regression model. The hierarchical models are ANN + ANN + Cox, SOM + ANN + Cox, and α-FCM + ANN + Cox. In particular, the first component of the models aims to cluster data in two churner and nonchurner groups and also filter out unrepresentative data or outliers. Then, the clustered data as the outputs are used to assign customers to churner and nonchurner groups by the second technique. Finally, the correctly classified data are used to create Cox proportional hazards model. To evaluate the performance of the hierarchical models, an Iranian mobile dataset is considered. The experimental results show that the hierarchical models outperform the single Cox regression baseline model in terms of prediction accuracy, Types I and II errors, RMSE, and MAD metrics. In addition, the α-FCM + ANN + Cox model significantly performs better than the two other hierarchical models.
Wang, Chih-Ping; Kim, Hee-Jeong; Yue, Chao; Weygand, James M.; Hsu, Tung-Shin; Chu, Xiangning
2017-04-01
To investigate whether ultralow-frequency (ULF) fluctuations from 0.5 to 8.3 mHz in the solar wind and interplanetary magnetic field (IMF) can affect the plasma sheet electron temperature (Te) near geosynchronous distances, we use a support vector regression machine technique to decouple the effects from different solar wind parameters and their ULF fluctuation power. Te in this region varies from 0.1 to 10 keV with a median of 1.3 keV. We find that when the solar wind ULF power is weak, Te increases with increasing southward IMF Bz and solar wind speed, while it varies weakly with solar wind density. As the ULF power becomes stronger during weak IMF Bz ( 0) or northward IMF, Te becomes significantly enhanced, by a factor of up to 10. We also find that mesoscale disturbances in a time scale of a few to tens of minutes as indicated by AE during substorm expansion and recovery phases are more enhanced when the ULF power is stronger. The effect of ULF powers may be explained by stronger inward radial diffusion resulting from stronger mesoscale disturbances under higher ULF powers, which can bring high-energy plasma sheet electrons further toward geosynchronous distance. This effect of ULF powers is particularly important during weak southward IMF or northward IMF when convection electric drift is weak.
A Model of Medium Depolarisation Effects on Polarisation Vector Position
Directory of Open Access Journals (Sweden)
J. Moc
1994-07-01
Full Text Available The paper presents the results of modelling the influence of real atmosphere with precipitation clouds on the radar signal polarisation status. The relationship between the signal vector depolarisation shift value and the initial wave polarisation is examined.
Directory of Open Access Journals (Sweden)
Bing-Chun Liu
Full Text Available Today, China is facing a very serious issue of Air Pollution due to its dreadful impact on the human health as well as the environment. The urban cities in China are the most affected due to their rapid industrial and economic growth. Therefore, it is of extreme importance to come up with new, better and more reliable forecasting models to accurately predict the air quality. This paper selected Beijing, Tianjin and Shijiazhuang as three cities from the Jingjinji Region for the study to come up with a new model of collaborative forecasting using Support Vector Regression (SVR for Urban Air Quality Index (AQI prediction in China. The present study is aimed to improve the forecasting results by minimizing the prediction error of present machine learning algorithms by taking into account multiple city multi-dimensional air quality information and weather conditions as input. The results show that there is a decrease in MAPE in case of multiple city multi-dimensional regression when there is a strong interaction and correlation of the air quality characteristic attributes with AQI. Also, the geographical location is found to play a significant role in Beijing, Tianjin and Shijiazhuang AQI prediction.
Liu, Bing-Chun; Binaykia, Arihant; Chang, Pei-Chann; Tiwari, Manoj Kumar; Tsao, Cheng-Chin
2017-01-01
Today, China is facing a very serious issue of Air Pollution due to its dreadful impact on the human health as well as the environment. The urban cities in China are the most affected due to their rapid industrial and economic growth. Therefore, it is of extreme importance to come up with new, better and more reliable forecasting models to accurately predict the air quality. This paper selected Beijing, Tianjin and Shijiazhuang as three cities from the Jingjinji Region for the study to come up with a new model of collaborative forecasting using Support Vector Regression (SVR) for Urban Air Quality Index (AQI) prediction in China. The present study is aimed to improve the forecasting results by minimizing the prediction error of present machine learning algorithms by taking into account multiple city multi-dimensional air quality information and weather conditions as input. The results show that there is a decrease in MAPE in case of multiple city multi-dimensional regression when there is a strong interaction and correlation of the air quality characteristic attributes with AQI. Also, the geographical location is found to play a significant role in Beijing, Tianjin and Shijiazhuang AQI prediction.
Liu, Bing-Chun; Binaykia, Arihant; Chang, Pei-Chann; Tiwari, Manoj Kumar; Tsao, Cheng-Chin
2017-01-01
Today, China is facing a very serious issue of Air Pollution due to its dreadful impact on the human health as well as the environment. The urban cities in China are the most affected due to their rapid industrial and economic growth. Therefore, it is of extreme importance to come up with new, better and more reliable forecasting models to accurately predict the air quality. This paper selected Beijing, Tianjin and Shijiazhuang as three cities from the Jingjinji Region for the study to come up with a new model of collaborative forecasting using Support Vector Regression (SVR) for Urban Air Quality Index (AQI) prediction in China. The present study is aimed to improve the forecasting results by minimizing the prediction error of present machine learning algorithms by taking into account multiple city multi-dimensional air quality information and weather conditions as input. The results show that there is a decrease in MAPE in case of multiple city multi-dimensional regression when there is a strong interaction and correlation of the air quality characteristic attributes with AQI. Also, the geographical location is found to play a significant role in Beijing, Tianjin and Shijiazhuang AQI prediction. PMID:28708836
Scattering vector mesons in D4/D8 model
Energy Technology Data Exchange (ETDEWEB)
Ballon Bayona, C.A., E-mail: ballon@cbpf.b [Centro Brasileiro de Pesquisas Fisicas, Rua Dr. Xavier Sigaud 150, Urca, 22290-180 Rio de Janeiro, RJ (Brazil); Boschi-Filho, Henrique, E-mail: boschi@if.ufrj.b [Instituto de Fisica, Universidade Federal do Rio de Janeiro, Caixa Postal 68528, 21941-972 Rio de Janeiro, RJ (Brazil); Braga, Nelson R.F., E-mail: braga@if.ufrj.b [Instituto de Fisica, Universidade Federal do Rio de Janeiro, Caixa Postal 68528, 21941-972 Rio de Janeiro, RJ (Brazil); Torres, Marcus A.C., E-mail: mtorres@if.ufrj.b [Instituto de Fisica, Universidade Federal do Rio de Janeiro, Caixa Postal 68528, 21941-972 Rio de Janeiro, RJ (Brazil)
2010-02-15
We review in this proceedings some recent results for vector meson form factors obtained using the holographic D4-D8 brane model. The D4-D8 brane model, proposed by Sakai and Sugimoto, is a holographic dual of a semi-realistic strongly coupled large N{sub c} QCD since it breaks supersymmetry and incorporates chiral symmetry breaking. We analyze the vector meson wave functions and Regge trajectories as well.
Data correction for seven activity trackers based on regression models.
Andalibi, Vafa; Honko, Harri; Christophe, Francois; Viik, Jari
2015-08-01
Using an activity tracker for measuring activity-related parameters, e.g. steps and energy expenditure (EE), can be very helpful in assisting a person's fitness improvement. Unlike the measuring of number of steps, an accurate EE estimation requires additional personal information as well as accurate velocity of movement, which is hard to achieve due to inaccuracy of sensors. In this paper, we have evaluated regression-based models to improve the precision for both steps and EE estimation. For this purpose, data of seven activity trackers and two reference devices was collected from 20 young adult volunteers wearing all devices at once in three different tests, namely 60-minute office work, 6-hour overall activity and 60-minute walking. Reference data is used to create regression models for each device and relative percentage errors of adjusted values are then statistically compared to that of original values. The effectiveness of regression models are determined based on the result of a statistical test. During a walking period, EE measurement was improved in all devices. The step measurement was also improved in five of them. The results show that improvement of EE estimation is possible only with low-cost implementation of fitting model over the collected data e.g. in the app or in corresponding service back-end.
Learning Supervised Topic Models for Classification and Regression from Crowds
DEFF Research Database (Denmark)
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete
2017-01-01
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most...... annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression...
Analysis of Nonlinear Regression Models: A Cautionary Note
Peddada, Shyamal D.; Haseman, Joseph K.
2005-01-01
Regression models are routinely used in many applied sciences for describing the relationship between a response variable and an independent variable. Statistical inferences on the regression parameters are often performed using the maximum likelihood estimators (MLE). In the case of nonlinear models the standard errors of MLE are often obtained by linearizing the nonlinear function around the true parameter and by appealing to large sample theory. In this article we demonstrate, through computer simulations, that the resulting asymptotic Wald confidence intervals cannot be trusted to achieve the desired confidence levels. Sometimes they could underestimate the true nominal level and are thus liberal. Hence one needs to be cautious in using the usual linearized standard errors of MLE and the associated confidence intervals. PMID:18648618
Online Statistical Modeling (Regression Analysis) for Independent Responses
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical analmodelling) are among statistical methods which are frequently needed in analyzing quantitative data, especially to model relationship between response and explanatory variables. Nowadays, statistical models have been developed into various directions to model various type and complex relationship of data. Rich varieties of advanced and recent statistical modelling are mostly available on open source software (one of them is R). However, these advanced statistical modelling, are not very friendly to novice R users, since they are based on programming script or command line interface. Our research aims to developed web interface (based on R and shiny), so that most recent and advanced statistical modelling are readily available, accessible and applicable on web. We have previously made interface in the form of e-tutorial for several modern and advanced statistical modelling on R especially for independent responses (including linear models/LM, generalized linier models/GLM, generalized additive model/GAM and generalized additive model for location scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including model using Computer Intensive Statistics (Bootstrap and Markov Chain Monte Carlo/ MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web (interface) make the statistical modeling becomes easier to apply and easier to compare them in order to find the most appropriate model for the data.
General score tests for regression models incorporating 'robust' variance estimates
David Clayton; Joanna Howson
2002-01-01
Stata incorporates commands for carrying out two of the three general approaches to asymptotic significance testing in regression models, namely likelihood ratio (lrtest) and Wald tests (testparms). However, the third approach, using "score" tests, has no such general implementation. This omission is particularly serious when dealing with "clustered" data using the Huber-White approach. Here the likelihood ratio test is lost, leaving only the Wald test. This has relatively poor asymptotic pro...
Non-Parametric Identification and Estimation of Truncated Regression Models
Songnian Chen
2010-01-01
In this paper, we consider non-parametric identification and estimation of truncated regression models in both cross-sectional and panel data settings. For the cross-sectional case, Lewbel and Linton (2002) considered non-parametric identification and estimation through continuous variation under a log-concavity condition on the error distribution. We obtain non-parametric identification under weaker conditions. In particular, we obtain non-parametric identification through discrete variation...
Modeling energy expenditure in children and adolescents using quantile regression.
Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F
2013-07-15
Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models will be used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions are estimated in 109 children, aged 5-18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P PA and HR with EE were stronger for the obese than nonobese population (P EE compared with conventional OLS regression, especially at the tails of the distribution, and revealed substantially different covariate effects of weight, PA, and HR on EE in nonobese and obese children.
Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu
2016-09-01
We used the support vector regression (SVR) approach to predict and unravel reduction/promotion effect of characteristic flavonoids on the acrylamide formation under a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects by flavonoids at addition levels of 1-10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promote rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633-0.900). Flavonols exhibit stronger effects on the acrylamide formation than flavones and isoflavones as well as their O-glycosides derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926-0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent predicting work. These observations demonstrated that the SVR models are competent for predicting our understanding on the future use of natural antioxidants for decreasing the acrylamide formation.
Nordic climate change: data for modeling vector borne diseases
DEFF Research Database (Denmark)
Kristensen, Birgit; Bødker, Rene
The distribution of vector species is generally restricted by a range of different climatic and geographical factors, while the development and spread of the vector-borne diseases (veterinary and zoonotic) is often primarily temperature driven. Thus temperature and its derivatives are key factors...... derivatives were calculated in order to assess the geographical and seasonal variation in the area. In order to evaluate the response of vector borne diseases to possible future climate changes and the subsequent potential spread into new areas, daily temperature predictions (mean, min and max) for three 20...... in the modelling of vector-borne diseases. This puts a high demand on the quality and accuracy of the temperature data to be used as input in such models. In order to best capture the local temporal and spatial variation in the temperature surfaces, accurate daily temperature data were used in the present project...
Preference learning with evolutionary Multivariate Adaptive Regression Spline model
DEFF Research Database (Denmark)
Abou-Zleikha, Mohamed; Shaker, Noor; Christensen, Mads Græsbøll
2015-01-01
This paper introduces a novel approach for pairwise preference learning through combining an evolutionary method with Multivariate Adaptive Regression Spline (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches as this method is more appealing...... for human decision making. Learning models from pairwise preference data is however an NP-hard problem. Therefore, constructing models that can effectively learn such data is a challenging task. Models are usually constructed with accuracy being the most important factor. Another vitally important aspect...... that is usually given less attention is expressiveness, i.e. how easy it is to explain the relationship between the model input and output. Most machine learning techniques are focused either on performance or on expressiveness. This paper employ MARS models which have the advantage of being a powerful method...
Extended cox regression model: The choice of timefunction
Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu
2017-07-01
Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.
Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V
2014-01-01
Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.
Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.
Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana
2012-02-01
In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.
A New Approach in Regression Analysis for Modeling Adsorption Isotherms
Onjia, Antonije E.
2014-01-01
Numerous regression approaches to isotherm parameters estimation appear in the literature. The real insight into the proper modeling pattern can be achieved only by testing methods on a very big number of cases. Experimentally, it cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, for the homoscedastic case, the winning error function is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Margart's percent standard deviation is suggested. It was found that in case when experiments are repeated three times the simple method of weighted least squares performed as well as more complicated orthogonal distance regression method. PMID:24672394
Labour motivation : an axiomatic vector model
Kotliarov, Ivan
2008-01-01
En el presente artículo se da una lista de axiomas necesarios para la construcción de una teoría matemática de la motivación humana. Se propone un modelo matemático de la motivación en el trabajo. La motivación se representa como un vector resultante de la motivación parcial generada por grupos específicos de necesidades. El modelo de Vroom se incluye en el modelo propuesto como ejemplo de motivación. Se establece una correlación entre los gastos de motivación, el nivel de motivación y el niv...
Bayes Estimation of Two-Phase Linear Regression Model
Directory of Open Access Journals (Sweden)
Mayuri Pandya
2011-01-01
Full Text Available Let the regression model be Yi=β1Xi+εi, where εi are i. i. d. N (0,σ2 random errors with variance σ2>0 but later it was found that there was a change in the system at some point of time m and it is reflected in the sequence after Xm by change in slope, regression parameter β2. The problem of study is when and where this change has started occurring. This is called change point inference problem. The estimators of m, β1,β2 are derived under asymmetric loss functions, namely, Linex loss & General Entropy loss functions. The effects of correct and wrong prior information on the Bayes estimates are studied.
Modeling the number of car theft using Poisson regression
Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura
2016-10-01
Regression analysis is the most popular statistical methods used to express the relationship between the variables of response with the covariates. The aim of this paper is to evaluate the factors that influence the number of car theft using Poisson regression model. This paper will focus on the number of car thefts that occurred in districts in Peninsular Malaysia. There are two groups of factor that have been considered, namely district descriptive factors and socio and demographic factors. The result of the study showed that Bumiputera composition, Chinese composition, Other ethnic composition, foreign migration, number of residence with the age between 25 to 64, number of employed person and number of unemployed person are the most influence factors that affect the car theft cases. These information are very useful for the law enforcement department, insurance company and car owners in order to reduce and limiting the car theft cases in Peninsular Malaysia.
Multidimensional Vector Model of Stimulus-Response Compatibility
Yamaguchi, Motonori; Proctor, Robert W.
2012-01-01
The present study proposes and examines the multidimensional vector (MDV) model framework as a modeling schema for choice response times. MDV extends the Thurstonian model, as well as signal detection theory, to classification tasks by taking into account the influence of response properties on stimulus discrimination. It is capable of accounting…
A Bayesian Infinite Hidden Markov Vector Autoregressive Model
D. Nibbering (Didier); R. Paap (Richard); M. van der Wel (Michel)
2016-01-01
textabstractWe propose a Bayesian infinite hidden Markov model to estimate time-varying parameters in a vector autoregressive model. The Markov structure allows for heterogeneity over time while accounting for state-persistence. By modelling the transition distribution as a Dirichlet process mixture
Soyoung Park; Se-Yeong Hamm; Hang-Tak Jeon; Jinsoo Kim
2017-01-01
This study mapped and analyzed groundwater potential using two different models, logistic regression (LR) and multivariate adaptive regression splines (MARS), and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70%) were used for model training, whereas the other 365 locations (30%) were used for model validation. We analyzed 16...
Dynamic logistic regression and dynamic model averaging for binary classification.
McCormick, Tyler H; Raftery, Adrian E; Madigan, David; Burd, Randall S
2012-03-01
We propose an online binary classification procedure for cases when there is uncertainty about the model to use and parameters within a model change over time. We account for model uncertainty through dynamic model averaging, a dynamic extension of Bayesian model averaging in which posterior model probabilities may also change with time. We apply a state-space model to the parameters of each model and we allow the data-generating model to change over time according to a Markov chain. Calibrating a "forgetting" factor accommodates different levels of change in the data-generating mechanism. We propose an algorithm that adjusts the level of forgetting in an online fashion using the posterior predictive distribution, and so accommodates various levels of change at different times. We apply our method to data from children with appendicitis who receive either a traditional (open) appendectomy or a laparoscopic procedure. Factors associated with which children receive a particular type of procedure changed substantially over the 7 years of data collection, a feature that is not captured using standard regression modeling. Because our procedure can be implemented completely online, future data collection for similar studies would require storing sensitive patient information only temporarily, reducing the risk of a breach of confidentiality. © 2011, The International Biometric Society.
Learning Supervised Topic Models for Classification and Regression from Crowds.
Rodrigues, Filipe; Lourenco, Mariana; Ribeiro, Bernardete; Pereira, Francisco C
2017-12-01
The growing need to analyze large collections of documents has led to great developments in topic modeling. Since documents are frequently associated with other related variables, such as labels or ratings, much interest has been placed on supervised topic models. However, the nature of most annotation tasks, prone to ambiguity and noise, often with high volumes of documents, deem learning under a single-annotator assumption unrealistic or unpractical for most real-world applications. In this article, we propose two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds. We develop an efficient stochastic variational inference algorithm that is able to scale to very large datasets, and we empirically demonstrate the advantages of the proposed model over state-of-the-art approaches.
Unified regression model of binding equilibria in crowded environments
Lee, Byoungkoo; LeDuc, Philip R.; Schwartz, Russell
2011-01-01
Molecular crowding is a critical feature distinguishing intracellular environments from idealized solution-based environments and is essential to understanding numerous biochemical reactions, from protein folding to signal transduction. Many biochemical reactions are dramatically altered by crowding, yet it is extremely difficult to predict how crowding will quantitatively affect any particular reaction systems. We previously developed a novel stochastic off-lattice model to efficiently simulate binding reactions across wide parameter ranges in various crowded conditions. We now show that a polynomial regression model can incorporate several interrelated parameters influencing chemistry under crowded conditions. The unified model of binding equilibria accurately reproduces the results of particle simulations over a broad range of variation of six physical parameters that collectively yield a complicated, non-linear crowding effect. The work represents an important step toward the long-term goal of computationally tractable predictive models of reaction chemistry in the cellular environment. PMID:22355615
Development and Application of Nonlinear Land-Use Regression Models
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. family of kriging models or conditional stochastic simulations). Recently, there were important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of (10-100). It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most of LUR models currently used are linear models. In the present research the nonlinear LUR models are developed and applied for Geneva city. Mainly two nonlinear data-driven models were elaborated: multilayer perceptron and random forest. An important part of the research deals also with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention. It has effects on human health and on plants; NO2 contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxides on plants are the reduction of the growth, production and pesticide resistance. And finally, the effects on materials: nitrogen dioxide increases the corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
Prediction of Mind-Wandering with Electroencephalogram and Non-linear Regression Modeling.
Kawashima, Issaku; Kumano, Hiroaki
2017-01-01
Mind-wandering (MW), task-unrelated thought, has been examined by researchers in an increasing number of articles using models to predict whether subjects are in MW, using numerous physiological variables. However, these models are not applicable in general situations. Moreover, they output only binary classification. The current study suggests that the combination of electroencephalogram (EEG) variables and non-linear regression modeling can be a good indicator of MW intensity. We recorded EEGs of 50 subjects during the performance of a Sustained Attention to Response Task, including a thought sampling probe that inquired the focus of attention. We calculated the power and coherence value and prepared 35 patterns of variable combinations and applied Support Vector machine Regression (SVR) to them. Finally, we chose four SVR models: two of them non-linear models and the others linear models; two of the four models are composed of a limited number of electrodes to satisfy model usefulness. Examination using the held-out data indicated that all models had robust predictive precision and provided significantly better estimations than a linear regression model using single electrode EEG variables. Furthermore, in limited electrode condition, non-linear SVR model showed significantly better precision than linear SVR model. The method proposed in this study helps investigations into MW in various little-examined situations. Further, by measuring MW with a high temporal resolution EEG, unclear aspects of MW, such as time series variation, are expected to be revealed. Furthermore, our suggestion that a few electrodes can also predict MW contributes to the development of neuro-feedback studies.
Genetic evaluation of European quails by random regression models
Directory of Open Access Journals (Sweden)
Flaviana Miranda Gonçalves
2012-09-01
Full Text Available The objective of this study was to compare different random regression models, defined from different classes of heterogeneity of variance combined with different Legendre polynomial orders for the estimate of (covariance of quails. The data came from 28,076 observations of 4,507 female meat quails of the LF1 lineage. Quail body weights were determined at birth and 1, 14, 21, 28, 35 and 42 days of age. Six different classes of residual variance were fitted to Legendre polynomial functions (orders ranging from 2 to 6 to determine which model had the best fit to describe the (covariance structures as a function of time. According to the evaluated criteria (AIC, BIC and LRT, the model with six classes of residual variances and of sixth-order Legendre polynomial was the best fit. The estimated additive genetic variance increased from birth to 28 days of age, and dropped slightly from 35 to 42 days. The heritability estimates decreased along the growth curve and changed from 0.51 (1 day to 0.16 (42 days. Animal genetic and permanent environmental correlation estimates between weights and age classes were always high and positive, except for birth weight. The sixth order Legendre polynomial, along with the residual variance divided into six classes was the best fit for the growth rate curve of meat quails; therefore, they should be considered for breeding evaluation processes by random regression models.
DEFF Research Database (Denmark)
Klein, John P.; Andersen, Per Kragh
2005-01-01
Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models......Bone marrow transplantation; Generalized estimating equations; Jackknife statistics; Regression models...
Interpreting parameters in the logistic regression model with random effects
DEFF Research Database (Denmark)
Larsen, Klaus; Petersen, Jørgen Holm; Budtz-Jørgensen, Esben
2000-01-01
interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects......interpretation, interval odds ratio, logistic regression, median odds ratio, normally distributed random effects...
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes
Directory of Open Access Journals (Sweden)
Silvia Liverani
2015-03-01
Full Text Available PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non- parametrically linking a response vector to covariate data through cluster membership (Molitor, Papathomas, Jerrett, and Richardson 2010. The package allows binary, categorical, count and continuous response, as well as continuous and discrete covariates. Additionally, predictions may be made for the response, and missing values for the covariates are handled. Several samplers and label switching moves are implemented along with diagnostic tools to assess convergence. A number of R functions for post-processing of the output are also provided. In addition to fitting mixtures, it may additionally be of interest to determine which covariates actively drive the mixture components. This is implemented in the package as variable selection.
Bias-correction of regression models: a case study on hERG inhibition.
Hansen, Katja; Rathke, Fabian; Schroeter, Timon; Rast, Georg; Fox, Thomas; Kriegl, Jan M; Mika, Sebastian
2009-06-01
In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 +/- 0.01 to 0.57 +/- 0.01 using a local bias correction strategy.
A hybrid neural network model for noisy data regression.
Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M
2004-04-01
A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.
A new inverse regression model applied to radiation biodosimetry
Higueras, Manuel; Puig, Pedro; Ainsbury, Elizabeth A.; Rothkamm, Kai
2015-01-01
Biological dosimetry based on chromosome aberration scoring in peripheral blood lymphocytes enables timely assessment of the ionizing radiation dose absorbed by an individual. Here, new Bayesian-type count data inverse regression methods are introduced for situations where responses are Poisson or two-parameter compound Poisson distributed. Our Poisson models are calculated in a closed form, by means of Hermite and negative binomial (NB) distributions. For compound Poisson responses, complete and simplified models are provided. The simplified models are also expressible in a closed form and involve the use of compound Hermite and compound NB distributions. Three examples of applications are given that demonstrate the usefulness of these methodologies in cytogenetic radiation biodosimetry and in radiotherapy. We provide R and SAS codes which reproduce these examples. PMID:25663804
On Regression Modeling of Human Immunodeficiency Virus Patients
Hadeel S. Al-Kutubi
2009-01-01
Problem statement: The main propose of this study was to evaluate the HIV patients for the period 1990-2008 depend on three variables age, gender and ethnicity. Approach: The data was analyzed using regression and correlation methods to get the mathematical model that explain the relationship and the effect between the age, gender and ethnicity. SSPS program V. 17.0 was used throughout this study to analyze the data and to generate the various Tables. Results: Using SPSS program to obtain reg...
A surface hydrology model for regional vector borne disease models
Tompkins, Adrian; Asare, Ernest; Bomblies, Arne; Amekudzi, Leonard
2016-04-01
Small, sun-lit temporary pools that form during the rainy season are important breeding sites for many key mosquito vectors responsible for the transmission of malaria and other diseases. The representation of this surface hydrology in mathematical disease models is challenging, due to their small-scale, dependence on the terrain and the difficulty of setting soil parameters. Here we introduce a model that represents the temporal evolution of the aggregate statistics of breeding sites in a single pond fractional coverage parameter. The model is based on a simple, geometrical assumption concerning the terrain, and accounts for the processes of surface runoff, pond overflow, infiltration and evaporation. Soil moisture, soil properties and large-scale terrain slope are accounted for using a calibration parameter that sets the equivalent catchment fraction. The model is calibrated and then evaluated using in situ pond measurements in Ghana and ultra-high (10m) resolution explicit simulations for a village in Niger. Despite the model's simplicity, it is shown to reproduce the variability and mean of the pond aggregate water coverage well for both locations and validation techniques. Example malaria simulations for Uganda will be shown using this new scheme with a generic calibration setting, evaluated using district malaria case data. Possible methods for implementing regional calibration will be briefly discussed.
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
Directory of Open Access Journals (Sweden)
Giuliano de Oliveira Freitas
2013-10-01
Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV, assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assorted among two phacoemulsification groups: one assigned to receive AcrySof®Toric intraocular lens (IOL in both eyes and another assigned to have AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: Significant negative correlation between the ratio of post- and preoperative Thibos APVratio and Alpins percentage of success (%Success was found (Spearman's ρ=-0.93; linear regression is given by the following equation: %Success = (-APVratio + 1.00x100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
Natural interpretations in Tobit regression models using marginal estimation methods.
Wang, Wei; Griswold, Michael E
2015-09-01
The Tobit model, also known as a censored regression model to account for left- and/or right-censoring in the dependent variable, has been used in many areas of applications, including dental health, medical research and economics. The reported Tobit model coefficient allows estimation and inference of an exposure effect on the latent dependent variable. However, this model does not directly provide overall exposure effects estimation on the original outcome scale. We propose a direct-marginalization approach using a reparameterized link function to model exposure and covariate effects directly on the truncated dependent variable mean. We also discuss an alternative average-predicted-value, post-estimation approach which uses model-predicted values for each person in a designated reference group under different exposure statuses to estimate covariate-adjusted overall exposure effects. Simulation studies were conducted to show the unbiasedness and robustness properties for both approaches under various scenarios. Robustness appears to diminish when covariates with substantial effects are imbalanced between exposure groups; we outline an approach for model choice based on information criterion fit statistics. The methods are applied to the Genetic Epidemiology Network of Arteriopathy (GENOA) cohort study to assess associations between obesity and cognitive function in the non-Hispanic white participants. © The Author(s) 2015.
Forecasting volatility with neural regression: a contribution to model adequacy.
Refenes, A N; Holt, W T
2001-01-01
Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
Varying-coefficient functional linear regression
Wu, Yichao; Fan, Jianqing; Müller, Hans-Georg
2010-01-01
Functional linear regression analysis aims to model regression relations which include a functional predictor. The analog of the regression parameter vector or matrix in conventional multivariate or multiple-response linear regression models is a regression parameter function in one or two arguments. If, in addition, one has scalar predictors, as is often the case in applications to longitudinal studies, the question arises how to incorporate these into a functional regression model. We study...
Gao, Wei; Li, Xiang-ru
2017-07-01
The multi-task learning takes the multiple tasks together to make analysis and calculation, so as to dig out the correlations among them, and therefore to improve the accuracy of the analyzed results. This kind of methods have been widely applied to the machine learning, pattern recognition, computer vision, and other related fields. This paper investigates the application of multi-task learning in estimating the stellar atmospheric parameters, including the surface temperature (Teff), surface gravitational acceleration (lg g), and chemical abundance ([Fe/H]). Firstly, the spectral features of the three stellar atmospheric parameters are extracted by using the multi-task sparse group Lasso algorithm, then the support vector machine is used to estimate the atmospheric physical parameters. The proposed scheme is evaluated on both the Sloan stellar spectra and the theoretical spectra computed from the Kurucz's New Opacity Distribution Function (NEWODF) model. The mean absolute errors (MAEs) on the Sloan spectra are: 0.0064 for lg (Teff /K), 0.1622 for lg (g/(cm · s-2)), and 0.1221 dex for [Fe/H]; the MAEs on the synthetic spectra are 0.0006 for lg (Teff /K), 0.0098 for lg (g/(cm · s-2)), and 0.0082 dex for [Fe/H]. Experimental results show that the proposed scheme has a rather high accuracy for the estimation of stellar atmospheric parameters.
Regression Models for Predicting Force Coefficients of Aerofoils
Directory of Open Access Journals (Sweden)
Mohammed ABDUL AKBAR
2015-09-01
Full Text Available Renewable sources of energy are attractive and advantageous in a lot of different ways. Among the renewable energy sources, wind energy is the fastest growing type. Among wind energy converters, Vertical axis wind turbines (VAWTs have received renewed interest in the past decade due to some of the advantages they possess over their horizontal axis counterparts. VAWTs have evolved into complex 3-D shapes. A key component in predicting the output of VAWTs through analytical studies is obtaining the values of lift and drag coefficients which is a function of shape of the aerofoil, ‘angle of attack’ of wind and Reynolds’s number of flow. Sandia National Laboratories have carried out extensive experiments on aerofoils for the Reynolds number in the range of those experienced by VAWTs. The volume of experimental data thus obtained is huge. The current paper discusses three Regression analysis models developed wherein lift and drag coefficients can be found out using simple formula without having to deal with the bulk of the data. Drag coefficients and Lift coefficients were being successfully estimated by regression models with R2 values as high as 0.98.
Geographical Weighted Regression Model for Poverty Analysis in Jambi Province
Directory of Open Access Journals (Sweden)
Inti Pertiwi Nashwari
2017-07-01
Full Text Available Agriculture sector has an important contribution to food security in Indonesia, but it also huge contribution to the number of poverty, especially in rural area. Studies using a global model might not be sufficient to pinpoint the factors having most impact on poverty due to spatial differences. Therefore, a Geographically Weighted Regression (GWR was used to analyze the factors influencing the poverty among food crops famers. Jambi Province is selected because have high number of poverty in rural area and the lowest farmer exchange term in Indonesia. The GWR was better than the global model, based on high value of R2, lowers AIC and MSE and Leung test. Location in upland area and road system had more influence to the poverty in the western-southern. Rainfall was significantly influence in eastern. The effect of each factor, however, was not generic, since the parameter estimate might have a positive or negative value.
Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad
2017-07-01
The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance as well as hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models including the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression in Ardestan, Esfahan, and Kashan. The results were compared with Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) to select the best model. The results show that at a monthly scale, all models provided a closer agreement with the calculated values for FAO-PM ( R 2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can be also applied for features selection tasks when working with high dimensional data [1,3]. In the present research Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three dimensional monthly precipitation data or monthly wind speeds embedded into 13 dimensional space constructed by geographical coordinates and geo-features calculated from digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in case of wind fields N=(2^13 -1)=8191] and rank them according to the cross-validation error. In both cases training were carried out applying leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN with their ability to select features and efficient modelling of complex high dimensional data can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
Notes on power of normality tests of error terms in regression models
Energy Technology Data Exchange (ETDEWEB)
Střelec, Luboš [Department of Statistics and Operation Analysis, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, Brno, 61300 (Czech Republic)
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of
A Gompertz regression model for fern spores germination
Directory of Open Access Journals (Sweden)
Gabriel y Galán, Jose María
2015-06-01
Full Text Available Germination is one of the most important biological processes for both seed and spore plants, also for fungi. At present, mathematical models of germination have been developed in fungi, bryophytes and several plant species. However, ferns are the only group whose germination has never been modelled. In this work we develop a regression model of the germination of fern spores. We have found that for Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei and Polypodium feuillei species the Gompertz growth model describe satisfactorily cumulative germination. An important result is that regression parameters are independent of fern species and the model is not affected by intraspecific variation. Our results show that the Gompertz curve represents a general germination model for all the non-green spore leptosporangiate ferns, including in the paper a discussion about the physiological and ecological meaning of the model.La germinación es uno de los procesos biológicos más relevantes tanto para las plantas con esporas, como para las plantas con semillas y los hongos. Hasta el momento, se han desarrollado modelos de germinación para hongos, briofitos y diversas especies de espermatófitos. Los helechos son el único grupo de plantas cuya germinación nunca ha sido modelizada. En este trabajo se desarrolla un modelo de regresión para explicar la germinación de las esporas de helechos. Observamos que para las especies Blechnum serrulatum, Blechnum yungense, Cheilanthes pilosa, Niphidium macbridei y Polypodium feuillei el modelo de crecimiento de Gompertz describe satisfactoriamente la germinación acumulativa. Un importante resultado es que los parámetros de la regresión son independientes de la especie y que el modelo no está afectado por variación intraespecífica. Por lo tanto, los resultados del trabajo muestran que la curva de Gompertz puede representar un modelo general para todos los helechos leptosporangiados
Collision prediction models using multivariate Poisson-lognormal regression.
El-Basyouny, Karim; Sayed, Tarek
2009-07-01
This paper advocates the use of multivariate Poisson-lognormal (MVPLN) regression to develop models for collision count data. The MVPLN approach presents an opportunity to incorporate the correlations across collision severity levels and their influence on safety analyses. The paper introduces a new multivariate hazardous location identification technique, which generalizes the univariate posterior probability of excess that has been commonly proposed and applied in the literature. In addition, the paper presents an alternative approach for quantifying the effect of the multivariate structure on the precision of expected collision frequency. The MVPLN approach is compared with the independent (separate) univariate Poisson-lognormal (PLN) models with respect to model inference, goodness-of-fit, identification of hot spots and precision of expected collision frequency. The MVPLN is modeled using the WinBUGS platform which facilitates computation of posterior distributions as well as providing a goodness-of-fit measure for model comparisons. The results indicate that the estimates of the extra Poisson variation parameters were considerably smaller under MVPLN leading to higher precision. The improvement in precision is due mainly to the fact that MVPLN accounts for the correlation between the latent variables representing property damage only (PDO) and injuries plus fatalities (I+F). This correlation was estimated at 0.758, which is highly significant, suggesting that higher PDO rates are associated with higher I+F rates, as the collision likelihood for both types is likely to rise due to similar deficiencies in roadway design and/or other unobserved factors. In terms of goodness-of-fit, the MVPLN model provided a superior fit than the independent univariate models. The multivariate hazardous location identification results demonstrated that some hazardous locations could be overlooked if the analysis was restricted to the univariate models.
THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE.
Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan
2015-10-01
The purpose of this study was to drawing a regression model of organizational climate of central libraries of Iran's universities. This study is an applied research. The statistical population of this study consisted of 96 employees of the central libraries of Iran's public universities selected among the 117 universities affiliated to the Ministry of Health by Stratified Sampling method (510 people). Climate Qual localized questionnaire was used as research tools. For predicting the organizational climate pattern of the libraries is used from the multivariate linear regression and track diagram. of the 9 variables affecting organizational climate, 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in prediction of the organizational climate of Iran's libraries. The results also indicate that each of these variables with different coefficient have the power to predict organizational climate but the climate score of psychological safety (0.94) plays a very crucial role in predicting the organizational climate. Track diagram showed that five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly effects on the organizational climate variable that contribution of the team work from this influence is more than any other variables. Of the indicator of the organizational climate of climateQual, the contribution of the team work from this influence is more than any other variables that reinforcement of teamwork in academic libraries can be more effective in improving the organizational climate of this type libraries.
RRegrs: an R package for computer-aided model selection with multiple regression models
Tsiliki, Georgia; Munteanu, Cristian R.; Seoane, Jose A.; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L.
2015-01-01
Background Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore, raising model reproducibility and comparison issues. Cheminformatics and bioinformatics are extensively using predictive modelling and exhibit a need for standardization of ...
Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.
Ferrari, Alberto
2017-01-01
Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.
Spatial modelling of population concentration using geographically weighted regression method
Directory of Open Access Journals (Sweden)
Bajat Branislav
2011-01-01
Full Text Available This paper presents possibilities of applying the geographically weighted regression method in mapping population change index. During the last decade, this contemporary spatial modeling method has been increasingly used in geographical analyses. On the example of the researched region of Timočka Krajina (defined for the needs of elaborating the Regional Spatial Plan, the possibilities for applying this method in disaggregation of traditional models of population density, which are created using the choropleth maps at the level of statistical spatial units, are shown. The applied method is based on the use of ancillary spatial predictors which are in correlation with a targeted variable, the population change index. For this purpose, spatial databases have been used such as digital terrain model, distances from the network of I and II category state roads, as well as soil sealing databases. Spatial model has been developed in the GIS software environment using commercial GIS applications, as well as open source GIS software. Population change indexes for the period 1961-2002 have been mapped based on population census data, while the data on planned population forecast have been used for the period 2002-2027.
Characteristics and Properties of a Simple Linear Regression Model
Directory of Open Access Journals (Sweden)
Kowal Robert
2016-12-01
Full Text Available A simple linear regression model is one of the pillars of classic econometrics. Despite the passage of time, it continues to raise interest both from the theoretical side as well as from the application side. One of the many fundamental questions in the model concerns determining derivative characteristics and studying the properties existing in their scope, referring to the first of these aspects. The literature of the subject provides several classic solutions in that regard. In the paper, a completely new design is proposed, based on the direct application of variance and its properties, resulting from the non-correlation of certain estimators with the mean, within the scope of which some fundamental dependencies of the model characteristics are obtained in a much more compact manner. The apparatus allows for a simple and uniform demonstration of multiple dependencies and fundamental properties in the model, and it does it in an intuitive manner. The results were obtained in a classic, traditional area, where everything, as it might seem, has already been thoroughly studied and discovered.
An Overview on Regression Models for Discrete Longitudinal Responses
Sutradhar, Brajendra C.
2003-01-01
In the longitudinal regression setup, interest may be focused primarily on the regression parameters for the marginal expectations of the longitudinal responses, the longitudinal correlation parameters being of secondary interest. Second, interest may be focused on both the regression and the longitudinal correlation parameters. Under the first setup, there exists a "working'' correlation matrix based generalized estimating equation (GEE) approach for the estimation of the regression paramete...
Bayesian Regression of Thermodynamic Models of Redox Active Materials
Energy Technology Data Exchange (ETDEWEB)
Johnston, Katherine [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2017-09-01
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating e ectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter tting the model to the data, comparing the model using di erent numbers of param- eters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by wa- ter splitting and the other of a material for thermal storage. Using Bayesian Inference and Markov Chain Monte Carlo (MCMC), parameter estimation was preformed on the three data sets. Good results were achieved, except some there was some deviations on the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with di erent number of parameters. It was believed that at least one of the parameters was unnecessary and comparing evidence values demonstrated that the parameter was need on one data set and not signi cantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another. and its uncertainty was also calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quanti cation Toolkit (UQTk).
Directory of Open Access Journals (Sweden)
Roberto Romaniello
2015-12-01
Full Text Available The aim of this work is to evaluate the potential of least squares support vector machine (LS-SVM regression to develop an efficient method to measure the colour of food materials in L*a*b* units by means of a computer vision systems (CVS. A laboratory CVS, based on colour digital camera (CDC, was implemented and three LS-SVM models were trained and validated, one for each output variables (L*, a*, and b* required by this problem, using the RGB signals generated by the CDC as input variables to these models. The colour target-based approach was used to camera characterization and a standard reference target of 242 colour samples was acquired using the CVS and a colorimeter. This data set was split in two sets of equal sizes, for training and validating the LS-SVM models. An effective two-stage grid search process on the parameters space was performed in MATLAB to tune the regularization parameters γ and the kernel parameters σ2 of the three LS-SVM models. A 3-8-3 multilayer feed-forward neural network (MFNN, according to the research conducted by León et al. (2006, was also trained in order to compare its performance with those of LS-SVM models. The LS-SVM models developed in this research have been shown better generalization capability then the MFNN, allowed to obtain high correlations between L*a*b* data acquired using the colorimeter and the corresponding data obtained by transformation of the RGB data acquired by the CVS. In particular, for the validation set, R2 values equal to 0.9989, 0.9987, and 0.9994 for L*, a* and b* parameters were obtained. The root mean square error values were 0.6443, 0.3226, and 0.2702 for L*, a*, and b* respectively, and the average of colour differences ΔEab was 0.8232±0.5033 units. Thus, LS-SVM regression seems to be a useful tool to measurement of food colour using a low cost CVS.
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression.Other publications on Symbolic Regression use Genetic Programming.This paper describes an alternative method based on Pareto Simulated Annealing.Our method is based on linear regression for the estimation of constants.Interval
The Mixed Effects Trend Vector Model
de Rooij, Mark; Schouteden, Martijn
2012-01-01
Maximum likelihood estimation of mixed effect baseline category logit models for multinomial longitudinal data can be prohibitive due to the integral dimension of the random effects distribution. We propose to use multidimensional unfolding methodology to reduce the dimensionality of the problem. As a by-product, readily interpretable graphical…
Linear regression model for investment analysis of an oil company
Directory of Open Access Journals (Sweden)
Edson Vinicius Pontes Bastos
2015-04-01
Full Text Available Changes in global economic environment meant that companies, particularly publicly traded, seek adaptations to global market model, which gives preference to the analysis of stock indicators profitability. In this sense, we carried out a quantitative study, based on data published by Petrobras SA, concerning the balance sheet comprising the period 2009 to 2013. Data analysis was carried out through statistical methods of covariance, correlation and linear regression. Among the findings of the paper, we emphasize that more than prove the good relations between the good historical results, the joint techniques of statistical methods serve as warnings to indicate to managers that something is not going as expected, thus helping the decision to promote a change in internal company policies, specifically in the way of investment allocation.
The R Package threg to Implement Threshold Regression Models
Directory of Open Access Journals (Sweden)
Tao Xiao
2015-08-01
This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates. The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.
Probability output modeling for support vector machines
Zhang, Xiang; Xiao, Xiaoling; Tian, Jinwen; Liu, Jian
2007-11-01
In this paper we propose an approach to model the posterior probability output of multi-class SVMs. The sigmoid function is used to estimate the posterior probability output in binary classification. This approach modeling the posterior probability output of multi-class SVMs is achieved by directly solving the equations that are based on the combination of the probability outputs of binary classifiers using the Bayes's rule. The differences and different weights among these two-class SVM classifiers, based on the posterior probability, are considered and given for the combination of the probability outputs among these two-class SVM classifiers in this method. The comparative experiment results show that our method achieves the better classification precision and the better probability distribution of the posterior probability than the pairwise couping method and the Hastie's optimization method.
Ijima, Yusuke; Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao
In this paper, we propose a rapid model adaptation technique for emotional speech recognition which enables us to extract paralinguistic information as well as linguistic information contained in speech signals. This technique is based on style estimation and style adaptation using a multiple-regression HMM (MRHMM). In the MRHMM, the mean parameters of the output probability density function are controlled by a low-dimensional parameter vector, called a style vector, which corresponds to a set of the explanatory variables of the multiple regression. The recognition process consists of two stages. In the first stage, the style vector that represents the emotional expression category and the intensity of its expressiveness for the input speech is estimated on a sentence-by-sentence basis. Next, the acoustic models are adapted using the estimated style vector, and then standard HMM-based speech recognition is performed in the second stage. We assess the performance of the proposed technique in the recognition of simulated emotional speech uttered by both professional narrators and non-professional speakers.
Testing exact rational expectations in cointegrated vector autoregressive models
DEFF Research Database (Denmark)
Johansen, Søren; Swensen, Anders Rygh
1999-01-01
This paper considers the testing of restrictions implied by rational expectations hypotheses in a cointegrated vector autoregressive model for I(1) variables. If the rational expectations involve one-step-ahead observations only and the coefficients are known, an explicit parameterization...
Likelihood inference for a fractionally cointegrated vector autoregressive model
DEFF Research Database (Denmark)
Johansen, Søren; Ørregård Nielsen, Morten
2012-01-01
We consider model based inference in a fractionally cointegrated (or cofractional) vector autoregressive model with a restricted constant term, ¿, based on the Gaussian likelihood conditional on initial values. The model nests the I(d) VAR model. We give conditions on the parameters such that the......We consider model based inference in a fractionally cointegrated (or cofractional) vector autoregressive model with a restricted constant term, ¿, based on the Gaussian likelihood conditional on initial values. The model nests the I(d) VAR model. We give conditions on the parameters...... such that the process X_{t} is fractional of order d and cofractional of order d-b; that is, there exist vectors ß for which ß'X_{t} is fractional of order d-b, and no other fractionality order is possible. We define the statistical model by 0... process in the parameters when errors are i.i.d. with suitable moment conditions and initial values are bounded. When the limit is deterministic this implies uniform convergence in probability of the conditional likelihood function. If the true value b0>1/2, we prove that the limit distribution of (ß...
Robust Medical Test Evaluation Using Flexible Bayesian Semiparametric Regression Models
Directory of Open Access Journals (Sweden)
Adam J. Branscum
2013-01-01
Full Text Available The application of Bayesian methods is increasing in modern epidemiology. Although parametric Bayesian analysis has penetrated the population health sciences, flexible nonparametric Bayesian methods have received less attention. A goal in nonparametric Bayesian analysis is to estimate unknown functions (e.g., density or distribution functions rather than scalar parameters (e.g., means or proportions. For instance, ROC curves are obtained from the distribution functions corresponding to continuous biomarker data taken from healthy and diseased populations. Standard parametric approaches to Bayesian analysis involve distributions with a small number of parameters, where the prior specification is relatively straight forward. In the nonparametric Bayesian case, the prior is placed on an infinite dimensional space of all distributions, which requires special methods. A popular approach to nonparametric Bayesian analysis that involves Polya tree prior distributions is described. We provide example code to illustrate how models that contain Polya tree priors can be fit using SAS software. The methods are used to evaluate the covariate-specific accuracy of the biomarker, soluble epidermal growth factor receptor, for discerning lung cancer cases from controls using a flexible ROC regression modeling framework. The application highlights the usefulness of flexible models over a standard parametric method for estimating ROC curves.
Single-Index Additive Vector Autoregressive Time Series Models
LI, YEHUA
2009-09-01
We study a new class of nonlinear autoregressive models for vector time series, where the current vector depends on single-indexes defined on the past lags and the effects of different lags have an additive form. A sufficient condition is provided for stationarity of such models. We also study estimation of the proposed model using P-splines, hypothesis testing, asymptotics, selection of the order of the autoregression and of the smoothing parameters and nonlinear forecasting. We perform simulation experiments to evaluate our model in various settings. We illustrate our methodology on a climate data set and show that our model provides more accurate yearly forecasts of the El Niño phenomenon, the unusual warming of water in the Pacific Ocean. © 2009 Board of the Foundation of the Scandinavian Journal of Statistics.
Vector machine techniques for modeling of seismic liquefaction data
Directory of Open Access Journals (Sweden)
Pijush Samui
2014-06-01
Full Text Available This article employs three soft computing techniques, Support Vector Machine (SVM; Least Square Support Vector Machine (LSSVM and Relevance Vector Machine (RVM, for prediction of liquefaction susceptibility of soil. SVM and LSSVM are based on the structural risk minimization (SRM principle which seeks to minimize an upper bound of the generalization error consisting of the sum of the training error and a confidence interval. RVM is a sparse Bayesian kernel machine. SVM, LSSVM and RVM have been used as classification tools. The developed SVM, LSSVM and RVM give equations for prediction of liquefaction susceptibility of soil. A comparative study has been carried out between the developed SVM, LSSVM and RVM models. The results from this article indicate that the developed SVM gives the best performance for prediction of liquefaction susceptibility of soil.
Some models for epidemics of vector-transmitted diseases
Directory of Open Access Journals (Sweden)
Fred Brauer
2016-10-01
Full Text Available Vector-transmitted diseases such as dengue fever and chikungunya have been spreading rapidly in many parts of the world. The Zika virus has been known since 1947 and invaded South America in 2013. It can be transmitted not only by (mosquito vectors but also directly through sexual contact. Zika has developed into a serious global health problem because, while most cases are asymptomatic or very light, babies born to Zika - infected mothers may develop microcephaly and other very serious birth defects.We formulate and analyze two epidemic models for vector-transmitted diseases, one appropriate for dengue and chikungunya fever outbreaks and one that includes direct transmission appropriate for Zika virus outbreaks. This is especially important because the Zika virus is the first example of a disease that can be spread both indirectly through a vector and directly (through sexual contact. In both cases, we obtain expressions for the basic reproduction number and show how to use the initial exponential growth rate to estimate the basic reproduction number. However, for the model that includes direct transmission some additional data would be needed to identify the fraction of cases transmitted directly. Data for the 2015 Zika virus outbreak in Barranquilla, Colombia has been used to fit parameters to the model developed here and to estimate the basic reproduction number.
Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models
Directory of Open Access Journals (Sweden)
Camerin Hahn
2012-01-01
Full Text Available As new algorithms for microwave imaging emerge, it is important to have standard accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach in designing phantoms that not only have dielectric properties close to breast tissues but also can be easily shaped to realistic physical models. The approach is based on regression model to match phantom's dielectric properties with the breast tissue dielectric properties found in Lazebnik et al. (2007. However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed.
Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models
Hahn, Camerin; Noghanian, Sima
2012-01-01
As new algorithms for microwave imaging emerge, it is important to have standard accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach in designing phantoms that not only have dielectric properties close to breast tissues but also can be easily shaped to realistic physical models. The approach is based on regression model to match phantom's dielectric properties with the breast tissue dielectric properties found in Lazebnik et al. (2007). However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed. PMID:22550473
An Ordered Regression Model to Predict Transit Passengers’ Behavioural Intentions
Energy Technology Data Exchange (ETDEWEB)
Oña, J. de; Oña, R. de; Eboli, L.; Forciniti, C.; Mazzulla, G.
2016-07-01
Passengers’ behavioural intentions after experiencing transit services can be viewed as signals that show if a customer continues to utilise a company’s service. Users’ behavioural intentions can depend on a series of aspects that are difficult to measure directly. More recently, transit passengers’ behavioural intentions have been just considered together with the concepts of service quality and customer satisfaction. Due to the characteristics of the ways for evaluating passengers’ behavioural intentions, service quality and customer satisfaction, we retain that this kind of issue could be analysed also by applying ordered regression models. This work aims to propose just an ordered probit model for analysing service quality factors that can influence passengers’ behavioural intentions towards the use of transit services. The case study is the LRT of Seville (Spain), where a survey was conducted in order to collect the opinions of the passengers about the existing transit service, and to have a measure of the aspects that can influence the intentions of the users to continue using the transit service in the future. (Author)
Wei-Bo Chen; Wen-Cheng Liu
2015-01-01
In this study, two artificial neural network models (i.e., a radial basis function neural network, RBFN, and an adaptive neurofuzzy inference system approach, ANFIS) and a multilinear regression (MLR) model were developed to simulate the DO, TP, Chl a, and SD in the Mingder Reservoir of central Taiwan. The input variables of the neural network and the MLR models were determined using linear regression. The performances were evaluated using the RBFN, ANFIS, and MLR models based on statistical ...
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
Directory of Open Access Journals (Sweden)
Ying-Hsin Chang
2013-01-01
Full Text Available Human estrogen receptor (ER isoforms, ERα and ERβ, have long been an important focus in the field of biology. To better understand the structural features associated with the binding of ERα ligands to ERα and modulate their function, several QSAR models, including CoMFA, CoMSIA, SVR, and LR methods, have been employed to predict the inhibitory activity of 68 raloxifene derivatives. In the SVR and LR modeling, 11 descriptors were selected through feature ranking and sequential feature addition/deletion to generate equations to predict the inhibitory activity toward ERα. Among four descriptors that constantly appear in various generated equations, two agree with CoMFA and CoMSIA steric fields and another two can be correlated to a calculated electrostatic potential of ERα.
Vector-model-supported approach in prostate plan optimization.
Liu, Eva Sau Fan; Wu, Vincent Wing Cheung; Harris, Benjamin; Lehman, Margot; Pryor, David; Chan, Lawrence Wing Chi
2017-01-01
Lengthy time consumed in traditional manual plan optimization can limit the use of step-and-shoot intensity-modulated radiotherapy/volumetric-modulated radiotherapy (S&S IMRT/VMAT). A vector model base, retrieving similar radiotherapy cases, was developed with respect to the structural and physiologic features extracted from the Digital Imaging and Communications in Medicine (DICOM) files. Planning parameters were retrieved from the selected similar reference case and applied to the test case to bypass the gradual adjustment of planning parameters. Therefore, the planning time spent on the traditional trial-and-error manual optimization approach in the beginning of optimization could be reduced. Each S&S IMRT/VMAT prostate reference database comprised 100 previously treated cases. Prostate cases were replanned with both traditional optimization and vector-model-supported optimization based on the oncologists' clinical dose prescriptions. A total of 360 plans, which consisted of 30 cases of S&S IMRT, 30 cases of 1-arc VMAT, and 30 cases of 2-arc VMAT plans including first optimization and final optimization with/without vector-model-supported optimization, were compared using the 2-sided t-test and paired Wilcoxon signed rank test, with a significance level of 0.05 and a false discovery rate of less than 0.05. For S&S IMRT, 1-arc VMAT, and 2-arc VMAT prostate plans, there was a significant reduction in the planning time and iteration with vector-model-supported optimization by almost 50%. When the first optimization plans were compared, 2-arc VMAT prostate plans had better plan quality than 1-arc VMAT plans. The volume receiving 35 Gy in the femoral head for 2-arc VMAT plans was reduced with the vector-model-supported optimization compared with the traditional manual optimization approach. Otherwise, the quality of plans from both approaches was comparable. Vector-model-supported optimization was shown to offer much shortened planning time and iteration number
Color Image Segmentation Using Fuzzy C-Regression Model
Directory of Open Access Journals (Sweden)
Min Chen
2017-01-01
Full Text Available Image segmentation is one important process in image analysis and computer vision and is a valuable tool that can be applied in fields of image processing, health care, remote sensing, and traffic image detection. Given the lack of prior knowledge of the ground truth, unsupervised learning techniques like clustering have been largely adopted. Fuzzy clustering has been widely studied and successfully applied in image segmentation. In situations such as limited spatial resolution, poor contrast, overlapping intensities, and noise and intensity inhomogeneities, fuzzy clustering can retain much more information than the hard clustering technique. Most fuzzy clustering algorithms have originated from fuzzy c-means (FCM and have been successfully applied in image segmentation. However, the cluster prototype of the FCM method is hyperspherical or hyperellipsoidal. FCM may not provide the accurate partition in situations where data consists of arbitrary shapes. Therefore, a Fuzzy C-Regression Model (FCRM using spatial information has been proposed whose prototype is hyperplaned and can be either linear or nonlinear allowing for better cluster partitioning. Thus, this paper implements FCRM and applies the algorithm to color segmentation using Berkeley’s segmentation database. The results show that FCRM obtains more accurate results compared to other fuzzy clustering algorithms.
Villain, Jonathan; Lozano, Sylvain; Halm-Lemeille, Marie-Pierre; Durrieu, Gilles; Bureau, Ronan
2014-12-01
The potential of quantile regression (QR) and quantile support vector machine regression (QSVMR) was analyzed for the definitions of quantitative structure-activity relationship (QSAR) models associated with a diverse set of chemicals toward a particular endpoint. This study focused on a specific sensitive endpoint (acute toxicity to algae) for which even a narcosis QSAR model is not actually clear. An initial dataset including more than 401 ecotoxicological data for one species of algae (Selenastrum capricornutum) was defined. This set corresponds to a large sample of chemicals ranging from classical organic chemicals to pesticides. From this original data set, the selection of the different subsets was made in terms of the notion of toxic ratio (TR), a parameter based on the ratio between predicted and experimental values. The robustness of QR and QSVMR to outliers was clearly observed, thus demonstrating that this approach represents a major interest for QSAR associated with a diverse set of chemicals. We focused particularly on descriptors related to molecular surface properties.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Directory of Open Access Journals (Sweden)
Ryan P Franckowiak
Full Text Available In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC, its small-sample correction (AICc, and the Bayesian information criterion (BIC to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
Optimal hedging with the cointegrated vector autoregressive model
DEFF Research Database (Denmark)
Gatarek, Lukasz; Johansen, Søren
We derive the optimal hedging ratios for a portfolio of assets driven by a Coin- tegrated Vector Autoregressive model (CVAR) with general cointegration rank. Our hedge is optimal in the sense of minimum variance portfolio. We consider a model that allows for the hedges to be cointegrated with the......We derive the optimal hedging ratios for a portfolio of assets driven by a Coin- tegrated Vector Autoregressive model (CVAR) with general cointegration rank. Our hedge is optimal in the sense of minimum variance portfolio. We consider a model that allows for the hedges to be cointegrated...... with the hedged asset and among themselves. We nd that the minimum variance hedge for assets driven by the CVAR, depends strongly on the portfolio holding period. The hedge is dened as a function of correlation and cointegration parameters. For short holding periods the correlation impact is predominant. For long...... horizons, the hedge ratio should overweight the cointegration parameters rather then short-run correlation information. In the innite horizon, the hedge ratios shall be equal to the cointegrating vector. The hedge ratios for any intermediate portfolio holding period should be based on the weighted average...
Evaluation of weighted regression and sample size in developing a taper model for loblolly pine
Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold
1992-01-01
A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...
A Unified Approach to Power Calculation and Sample Size Determination for Random Regression Models
Shieh, Gwowen
2007-01-01
The underlying statistical models for multiple regression analysis are typically attributed to two types of modeling: fixed and random. The procedures for calculating power and sample size under the fixed regression models are well known. However, the literature on random regression models is limited and has been confined to the case of all…
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
A generalized additive regression model for survival times
DEFF Research Database (Denmark)
Scheike, Thomas H.
2001-01-01
Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models......Additive Aalen model; counting process; disability model; illness-death model; generalized additive models; multiple time-scales; non-parametric estimation; survival data; varying-coefficient models...
Ichii, Kazuhito; Ueyama, Masahito; Kondo, Masayuki; Saigusa, Nobuko; Kim, Joon; Alberto, Ma. Carmelita; Ardö, Jonas; Euskirchen, Eugénie S.; Kang, Minseok; Hirano, Takashi; Joiner, Joanna; Kobayashi, Hideki; Marchesini, Luca Belelli; Merbold, Lutz; Miyata, Akira; Saitoh, Taku M.; Takagi, Kentaro; Varlagin, Andrej; Bret-Harte, M. Syndonia; Kitamura, Kenzo; Kosugi, Yoshiko; Kotani, Ayumi; Kumar, Kireet; Li, Sheng-Gong; Machimura, Takashi; Matsuura, Yojiro; Mizoguchi, Yasuko; Ohta, Takeshi; Mukherjee, Sandipan; Yanagi, Yuji; Yasuda, Yukio; Zhang, Yiping; Zhao, Fenghua
2017-04-01
The lack of a standardized database of eddy covariance observations has been an obstacle for data-driven estimation of terrestrial CO2 fluxes in Asia. In this study, we developed such a standardized database using 54 sites from various databases by applying consistent postprocessing for data-driven estimation of gross primary productivity (GPP) and net ecosystem CO2 exchange (NEE). Data-driven estimation was conducted by using a machine learning algorithm: support vector regression (SVR), with remote sensing data for 2000 to 2015 period. Site-level evaluation of the estimated CO2 fluxes shows that although performance varies in different vegetation and climate classifications, GPP and NEE at 8 days are reproduced (e.g., r2 = 0.73 and 0.42 for 8 day GPP and NEE). Evaluation of spatially estimated GPP with Global Ozone Monitoring Experiment 2 sensor-based Sun-induced chlorophyll fluorescence shows that monthly GPP variations at subcontinental scale were reproduced by SVR (r2 = 1.00, 0.94, 0.91, and 0.89 for Siberia, East Asia, South Asia, and Southeast Asia, respectively). Evaluation of spatially estimated NEE with net atmosphere-land CO2 fluxes of Greenhouse Gases Observing Satellite (GOSAT) Level 4A product shows that monthly variations of these data were consistent in Siberia and East Asia; meanwhile, inconsistency was found in South Asia and Southeast Asia. Furthermore, differences in the land CO2 fluxes from SVR-NEE and GOSAT Level 4A were partially explained by accounting for the differences in the definition of land CO2 fluxes. These data-driven estimates can provide a new opportunity to assess CO2 fluxes in Asia and evaluate and constrain terrestrial ecosystem models.
Faraway, Julian J
2005-01-01
Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway''s critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author''s treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...
Bonfatti, V; Tiezzi, F; Miglior, F; Carnier, P
2017-09-01
The objective of this study was to compare the prediction accuracy of 92 infrared prediction equations obtained by different statistical approaches. The predicted traits included fatty acid composition (n = 1,040); detailed protein composition (n = 1,137); lactoferrin (n = 558); pH and coagulation properties (n = 1,296); curd yield and composition obtained by a micro-cheese making procedure (n = 1,177); and Ca, P, Mg, and K contents (n = 689). The statistical methods used to develop the prediction equations were partial least squares regression (PLSR), Bayesian ridge regression, Bayes A, Bayes B, Bayes C, and Bayesian least absolute shrinkage and selection operator. Model performances were assessed, for each trait and model, in training and validation sets over 10 replicates. In validation sets, Bayesian regression models performed significantly better than PLSR for the prediction of 33 out of 92 traits, especially fatty acids, whereas they yielded a significantly lower prediction accuracy than PLSR in the prediction of 8 traits: the percentage of C18:1n-7 trans-9 in fat; the content of unglycosylated κ-casein and its percentage in protein; the content of α-lactalbumin; the percentage of αS2-casein in protein; and the contents of Ca, P, and Mg. Even though Bayesian methods produced a significant enhancement of model accuracy in many traits compared with PLSR, most variations in the coefficient of determination in validation sets were smaller than 1 percentage point. Over traits, the highest predictive ability was obtained by Bayes C even though most of the significant differences in accuracy between Bayesian regression models were negligible. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Preclinical models to assess the immunogenicity of AAV vectors.
Ertl, Hildegund C J
2017-11-23
Although gene transfer using adeno-associated virus (AAV) vectors has made tremendous progress in recent years, challenges remain due to vector-specific adaptive immune responses. Specifically, AAV-neutralizing antibodies reduce AAV-transduction rates, while CD8+ T cells directed to AAV capsid antigens cause rejection of AAV-transduced cells. This has been addressed clinically by excluding humans with pre-existing AAV-neutralizing antibodies from gene transfer trials or by using immunosuppression or reduced doses of vectors expressing improved transgene products to blunt or circumvent destructive T cell responses. Although these approaches have met with success for treatment of some diseases, most notably hemophilia B, they may not be suitable for others. Pre-clinical models are thus needed to test alternative options to sidestep pre-existing AAV-neutralizing antibodies, to prevent their induction following gene transfer and to block the detrimental effects of CD8+ T cells directed to AAV capsid antigens. This chapter describes some of the available, although not yet perfect, models that can assess immune responses to AAV gene transfer. Copyright © 2017 Elsevier Inc. All rights reserved.
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only
Implications of vector boson scattering unitarity in composite Higgs models
Buarque Franzosi, Diogo; Ferrarese, Piero
2017-09-01
The strong nature of composite Higgs models manifests at high energies through the growing behavior of the scattering amplitudes of longitudinally polarized weak bosons that leads to the formation of composite resonances as well as nonresonant strong effects. In this work the unitarity of these scattering amplitudes is used as a tool to assess the profile of the composite spectrum of the theory, including nonresonant enhancements, vector resonances and the C P -even scalar excitation. These three signatures are then studied in realistic scattering processes at hadron colliders, aiming to estimate the potential to exclude dynamically motivated scenarios of composite Higgs models.
Bilinear regression model with Kronecker and linear structures for ...
African Journals Online (AJOL)
On the basis of n independent observations from a matrix normal distribution, estimating equations in a flip-flop relation are established and the consistency of estimators is studied. Keywords: Bilinear regression; Estimating equations; Flip- flop algorithm; Kronecker product structure; Linear structured covariance matrix; ...
Developing synergy regression models with space-borne ALOS ...
Indian Academy of Sciences (India)
Optical remote sensing data have been widely used to derive forestbiophysical parameters inspite of their poor sensitivity towards the forest properties. Microwave remotesensing provides a better alternative owing to its inherent ability to penetrate the forest vegetation.This study aims at developing optimal regression ...
Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support
Czech Academy of Sciences Publication Activity Database
Kalina, Jan; Zvárová, Jana
2017-01-01
Roč. 5, č. 1 (2017), s. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability
231 Using Multiple Regression Analysis in Modelling the Role of ...
African Journals Online (AJOL)
User
concern especially its role in the economy of Cross River State. This paper seeks to evaluate the ... development, tourism development and local economy development using multiple regression analysis. The result shows ... potential tourist attraction which parade modern facilities such as, digital satellite, television system ...
Using support vector machine models for crash injury severity analysis.
Li, Zhibin; Liu, Pan; Wang, Wei; Xu, Chengcheng
2012-03-01
The study presented in this paper investigated the possibility of using support vector machine (SVM) models for crash injury severity analysis. Based on crash data collected at 326 freeway diverge areas, a SVM model was developed for predicting the injury severity associated with individual crashes. An ordered probit (OP) model was also developed using the same dataset. The research team compared the performance of the SVM model and the OP model. It was found that the SVM model produced better prediction performance for crash injury severity than did the OP model. The percent of correct prediction for the SVM model was found to be 48.8%, which was higher than that produced by the OP model (44.0%). Even though the SVM model may suffer from the multi-class classification problem, it still provides better prediction results for small proportion injury severities than the OP model does. The research also investigated the potential of using the SVM model for evaluating the impacts of external factors on crash injury severities. The sensitivity analysis results show that the SVM model produced comparable results regarding the impacts of variables on crash injury severity as compared to the OP model. For several variables such as the length of the exit ramp and the shoulder width of the freeway mainline, the results of the SVM model are more reasonable than those of the OP model. Copyright © 2011 Elsevier Ltd. All rights reserved.
McCann, Robert S; Messina, Joseph P; MacFarlane, David W; Bayoh, M Nabie; Vulule, John M; Gimnig, John E; Walker, Edward D
2014-06-06
Predictive models of malaria vector larval habitat locations may provide a basis for understanding the spatial determinants of malaria transmission. We used four landscape variables (topographic wetness index [TWI], soil type, land use-land cover, and distance to stream) and accumulated precipitation to model larval habitat locations in a region of western Kenya through two methods: logistic regression and random forest. Additionally, we used two separate data sets to account for variation in habitat locations across space and over time. Larval habitats were more likely to be present in locations with a lower slope to contributing area ratio (i.e. TWI), closer to streams, with agricultural land use relative to nonagricultural land use, and in friable clay/sandy clay loam soil and firm, silty clay/clay soil relative to friable clay soil. The probability of larval habitat presence increased with increasing accumulated precipitation. The random forest models were more accurate than the logistic regression models, especially when accumulated precipitation was included to account for seasonal differences in precipitation. The most accurate models for the two data sets had area under the curve (AUC) values of 0.864 and 0.871, respectively. TWI, distance to the nearest stream, and precipitation had the greatest mean decrease in Gini impurity criteria in these models. This study demonstrates the usefulness of random forest models for larval malaria vector habitat modeling. TWI and distance to the nearest stream were the two most important landscape variables in these models. Including accumulated precipitation in our models improved the accuracy of larval habitat location predictions by accounting for seasonal variation in the precipitation. Finally, the sampling strategy employed here for model parameterization could serve as a framework for creating predictive larval habitat models to assist in larval control efforts.
Directory of Open Access Journals (Sweden)
Soyoung Park
2017-07-01
Full Text Available This study mapped and analyzed groundwater potential using two different models, logistic regression (LR and multivariate adaptive regression splines (MARS, and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m3/d were extracted, and 859 locations (70% were used for model training, whereas the other 365 locations (30% were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
Model Checking Vector Addition Systems with one zero-test
Bonet, Rémi; Leroux, Jérôme; Zeitoun, Marc
2012-01-01
We design a variation of the Karp-Miller algorithm to compute, in a forward manner, a finite representation of the cover (i.e., the downward closure of the reachability set) of a vector addition system with one zero-test. This algorithm yields decision procedures for several problems for these systems, open until now, such as place-boundedness or LTL model-checking. The proof techniques to handle the zero-test are based on two new notions of cover: the refined and the filtered cover. The refined cover is a hybrid between the reachability set and the classical cover. It inherits properties of the reachability set: equality of two refined covers is undecidable, even for usual Vector Addition Systems (with no zero-test), but the refined cover of a Vector Addition System is a recursive set. The second notion of cover, called the filtered cover, is the central tool of our algorithms. It inherits properties of the classical cover, and in particular, one can effectively compute a finite representation of this set, e...
A review of land-use regression models to assess spatial variation of outdoor air pollution
National Research Council Canada - National Science Library
Hoek, Gerard; Beelen, Rob; de Hoogh, Kees; Vienneau, Danielle; Gulliver, John; Fischer, Paul; Briggs, David
2008-01-01
.... Current approaches for assessing intra-urban air pollution contrasts include the use of exposure indicator variables, interpolation methods, dispersion models and land-use regression (LUR) models...
Semiparametric nonlinear quantile regression model for financial returns
Czech Academy of Sciences Publication Activity Database
Avdulaj, Krenar; Baruník, Jozef
2017-01-01
Roč. 21, č. 1 (2017), s. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords : copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economics Impact factor: 0.649, year: 2016 http:// library .utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf
Energy Technology Data Exchange (ETDEWEB)
Cheng, Yanjie; Tang, Youmin; Jackson, Peter [University of Northern British Columbia, Environmental Science and Engineering, Prince George, BC (Canada); Zhou, Xiaobing [University of Northern British Columbia, Environmental Science and Engineering, Prince George, BC (Canada); Centre for Australian Weather and Climate Research (CAWCR), Bureau of Meteorology, Melbourne, VIC (Australia); Chen, Dake [Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY (United States); State Key Laboratory of Satellite Ocean Environment Dynamics, Hangzhou (China)
2010-10-15
In this study, singular vector analysis was performed for the period from 1856 to 2003 using the latest Zebiak-Cane model version LDEO5. The singular vector, representing the optimal growth pattern of initial perturbations/errors, was obtained by perturbing the constructed tangent linear model of the Zebiak-Cane model. Variations in the singular vector and singular value, as a function of initial time, season, ENSO states, and optimal period, were investigated. Emphasis was placed on exploring relative roles of linear and nonlinear processes in the optimal perturbation growth of ENSO, and deriving statistically robust conclusions using long-term singular vector analysis. It was found that the first singular vector is dominated by a west-east dipole spanning most of the equatorial Pacific, with one center located in the east and the other in the central Pacific. Singular vectors are less sensitive to initial conditions, i.e., independence of seasons and decades; while singular values exhibit a strong sensitivity to initial conditions. The dynamical diagnosis shows that the total linear and nonlinear heating terms play opposite roles in controlling the optimal perturbation growth, and that the linear optimal perturbation is more than twice as large as the nonlinear one. The total linear heating causes a warming effect and controls two positive perturbation growth regions: one in the central Pacific and the other in the eastern Pacific; whereas the total linearized nonlinear advection brings a cooling effect controlling the negative perturbation growth in the central Pacific. (orig.)
Story, Roger E.
1996-01-01
Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…
Bonellie, Sandra R
2012-10-01
To illustrate the use of regression and logistic regression models to investigate changes over time in size of babies particularly in relation to social deprivation, age of the mother and smoking. Mean birthweight has been found to be increasing in many countries in recent years, but there are still a group of babies who are born with low birthweights. Population-based retrospective cohort study. Multiple linear regression and logistic regression models are used to analyse data on term 'singleton births' from Scottish hospitals between 1994-2003. Mothers who smoke are shown to give birth to lighter babies on average, a difference of approximately 0.57 Standard deviations lower (95% confidence interval. 0.55-0.58) when adjusted for sex and parity. These mothers are also more likely to have babies that are low birthweight (odds ratio 3.46, 95% confidence interval 3.30-3.63) compared with non-smokers. Low birthweight is 30% more likely where the mother lives in the most deprived areas compared with the least deprived, (odds ratio 1.30, 95% confidence interval 1.21-1.40). Smoking during pregnancy is shown to have a detrimental effect on the size of infants at birth. This effect explains some, though not all, of the observed socioeconomic birthweight. It also explains much of the observed birthweight differences by the age of the mother. Identifying mothers at greater risk of having a low birthweight baby as important implications for the care and advice this group receives. © 2012 Blackwell Publishing Ltd.
SO(10) models for flavor with vector-like fermions
Saad, Shaikh
2017-11-01
In this work, unified models based on S O(10) symmetry is presented which provides insights into the flavor observables of charged fermions and the neutrinos. Unlike the conventional S O(10) models, the Higgs boson 10H belonging to the fundamental representation is not present in this new class of models. Instead vector-like fermions in the 16 + 16 ¯ representation is introduced to induce the flavor mixing. A variety of scenarios, both non-supersymmetric and supersymmetric, are studied involving a 126 ¯H Higgs boson. For symmetry breaking purpose, 126 ¯H Higgs is accompanied by either a 45H or a 210H of Higgs boson. Our analysis shows that this framework, by utilizing either type-I or type-II seesaw mechanism, an excellent fit to the fermion masses and mixings can be obtained with a limited number of parameters. To test and distinguish these flavor models, proton decay branching ratios are also computed.
Higgs boson phenomenology in a simple model with vector resonances
Energy Technology Data Exchange (ETDEWEB)
Castillo-Felisola, Oscar; Corral, Cristobal; Gonzalez, Marcela; Moreno, Gaston; Neill, Nicolas A.; Rojas, Felipe; Zamora, Jilberto; Zerwekh, Alfonso R. [Universidad Tecnica Federico Santa Maria, Departamento de Fisica, Valparaiso (Chile); Universidad Tecnica Federico Santa Maria, Centro Cientifico-Tecnologico de Valparaiso, Valparaiso (Chile)
2013-12-15
In this paper we consider a simple scenario where the Higgs boson and two vector resonances are supposed to arise from a new strong interacting sector. We use the ATLAS measurements of the dijet spectrum to set limits on the masses of the resonances. Additionally we compute the Higgs boson decay to two photons and found, when compare to the Standard Model prediction, a small excess which is compatible with ATLAS measurements. Finally we make prediction for Higgs-strahlung processes for the LHC running at 14 TeV. (orig.)
Temporal aggregation in first order cointegrated vector autoregressive models
DEFF Research Database (Denmark)
La Cour, Lisbeth Funding; Milhøj, Anders
We study aggregation - or sample frequencies - of time series, e.g. aggregation from weekly to monthly or quarterly time series. Aggregation usually gives shorter time series but spurious phenomena, in e.g. daily observations, can on the other hand be avoided. An important issue is the effect of ...... of aggregation on the adjustment coefficient in cointegrated systems. We study only first order vector autoregressive processes for n dimensional time series Xt, and we illustrate the theory by a two dimensional and a four dimensional model for prices of various grades of gasoline...
Testing and inference in nonlinear cointegrating vector error correction models
DEFF Research Database (Denmark)
Kristensen, D.; Rahbek, A.
2013-01-01
We analyze estimators and tests for a general class of vector error correction models that allows for asymmetric and nonlinear error correction. For a given number of cointegration relationships, general hypothesis testing is considered, where testing for linearity is of particular interest. Under...... the null of linearity, parameters of nonlinear components vanish, leading to a nonstandard testing problem. We apply so-called sup-tests to resolve this issue, which requires development of new(uniform) functional central limit theory and results for convergence of stochastic integrals. We provide a full...
Beta Regression Finite Mixture Models of Polarization and Priming
Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay
2011-01-01
This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…
Lamont, A.E.; Vermunt, J.K.; Van Horn, M.L.
2016-01-01
Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the
Structure of Vector Mesons in Holographic Model with Linear Confinement
Energy Technology Data Exchange (ETDEWEB)
Anatoly Radyushkin; Hovhannes Grigoryan
2007-11-01
We investigate wave functions and form factors of vector mesons in the holographic dual model of QCD with oscillator-like infrared cutoff. We introduce wave functions conjugate to solutions of the 5D equation of motion and develop a formalism based on these wave functions, which are very similar to those of a quantum-mechanical oscillator. For the lowest bound state (rho-meson), we show that all its elastic form factors can be built from the basic form factor which, in this model, exhibits a perfect vector meson dominance, i.e., is given by the rho-pole contribution alone. We calculate the electric radius of the rho-meson and find the value _C = 0.655 fm, which is larger than in the case of the hard-wall cutoff. We calculate the coupling constant f_rho and find that the experimental value is in the middle between the values given by the oscillator and hard-wall models.
ALMA, Özlem GÜRÜNLÜ; Kurt, Serdar; Aybars UĞUR
2010-01-01
Multiple linear regression models are widely used applied statistical techniques and they are most useful devices for extracting and understanding the essential features of datasets. However, in multiple linear regression models problems arise when a serious outlier observation or multicollinearity present in the data. In regression however, the situation is somewhat more complex in the sense that some outlying points will have more influence on the regression than others. An important proble...
The research of PM2.5 concentrations model based on regression calculation model
Li, Junmin; Wang, Luping
2017-01-01
In this paper, we mainly use the urban air quality monitoring data as the study data, the relationship between PM2.5 concentrations and several major air pollutant concentrations, meteorological elements in the same period are analyzed respectively. Through the analysis, we find that there is a significant correlation characteristic between the concentrations of PM2.5 and the concentrations of PM10, SO2, NO2, O3, CO, temperature and humidity. So, we take these factors as the variables; make a multiple linear regression analysis about PM2.5 concentrations, set up the urban PM2.5 concentrations regression calculation model. Through the estimation results of the model. In comparisons with the observed results, two results are basically identical, which shows that the regression model has a good fitting effect and use value.
Vector cylindrical harmonics for low-dimensional convection models
Kelley, Douglas H; Knox, Catherine A
2016-01-01
Approximate empirical models of thermal convection can allow us to identify the essential properties of the flow in simplified form, and to produce empirical estimates using only a few parameters. Such "low-dimensional" empirical models can be constructed systematically by writing numerical or experimental measurements as superpositions of a set of appropriate basis modes, a process known as Galerkin projection. For Boussinesq convection in a cylinder, those basis modes should be defined in cylindrical coordinates, vector-valued, divergence-free, and mutually orthogonal. Here we construct two such basis sets, one using Bessel functions in the radial direction, and one using Chebyshev polynomials. We demonstrate that each set has those desired characteristics and demonstrate the advantages and drawbacks of each set. We show their use for representing sample simulation data and point out their potential for low-dimensional convection models.
A generalized exponential time series regression model for electricity prices
DEFF Research Database (Denmark)
Haldrup, Niels; Knapik, Oskar; Proietti, Tomasso
We consider the issue of modeling and forecasting daily electricity spot prices on the Nord Pool Elspot power market. We propose a method that can handle seasonal and non-seasonal persistence by modelling the price series as a generalized exponential process. As the presence of spikes can distort...... on the estimated model, the best linear predictor is constructed. Our modeling approach provides good fit within sample and outperforms competing benchmark predictors in terms of forecasting accuracy. We also find that building separate models for each hour of the day and averaging the forecasts is a better...... strategy than forecasting the daily average directly....
Forecast Model of Urban Stagnant Water Based on Logistic Regression
Directory of Open Access Journals (Sweden)
Liu Pan
2017-01-01
Full Text Available With the development of information technology, the construction of water resource system has been gradually carried out. In the background of big data, the work of water information needs to carry out the process of quantitative to qualitative change. Analyzing the correlation of data and exploring the deep value of data which are the key of water information’s research. On the basis of the research on the water big data and the traditional data warehouse architecture, we try to find out the connection of different data source. According to the temporal and spatial correlation of stagnant water and rainfall, we use spatial interpolation to integrate data of stagnant water and rainfall which are from different data source and different sensors, then use logistic regression to find out the relationship between them.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Environmental statistical modelling of mosquito vectors at different geographical scales
Cianci, D.
2015-01-01
Vector-borne diseases are infections transmitted by the bite of infected arthropod vectors, such as mosquitoes, ticks, fleas, midges and flies. Vector-borne diseases pose an increasingly wider threat to global public health, both in terms of people affected and their geographical spread. Mosquitoes
Approximate Tests of Hypotheses in Regression Models with Grouped Data
1979-02-01
in terms of Kolmogoroff -Smirnov statistic in the next section. I 1 1 I t A 4. Simulations Two models have been considered for simulations. Model I. Yuk...Fort Meade, MD 20755 2 Commanding Officer Navy LibraryrnhOffice o Naval Research National Space Technology LaboratoryBranch Office *Attn: Navy
Misspecified poisson regression models for large-scale registry data
DEFF Research Database (Denmark)
Grøn, Randi; Gerds, Thomas A.; Andersen, Per K.
2016-01-01
working models that are then likely misspecified. To support and improve conclusions drawn from such models, we discuss methods for sensitivity analysis, for estimation of average exposure effects using aggregated data, and a semi-parametric bootstrap method to obtain robust standard errors. The methods...
Additive Intensity Regression Models in Corporate Default Analysis
DEFF Research Database (Denmark)
Lando, David; Medhat, Mamdouh; Nielsen, Mads Stenbo
2013-01-01
We consider additive intensity (Aalen) models as an alternative to the multiplicative intensity (Cox) models for analyzing the default risk of a sample of rated, nonfinancial U.S. firms. The setting allows for estimating and testing the significance of time-varying effects. We use a variety of mo...
Covariance Functions and Random Regression Models in the ...
African Journals Online (AJOL)
ARC-IRENE
many, highly correlated measures (Meyer, 1998a). Several approaches have been proposed to deal with such data, from simplest repeatability models (SRM) to complex multivariate models (MTM). The SRM considers different measurements at different stages (ages) as a realization of the same genetic trait with constant.
Directory of Open Access Journals (Sweden)
Simone Becker Lopes
2014-04-01
Full Text Available Considering the importance of spatial issues in transport planning, the main objective of this study was to analyze the results obtained from different approaches of spatial regression models. In the case of spatial autocorrelation, spatial dependence patterns should be incorporated in the models, since that dependence may affect the predictive power of these models. The results obtained with the spatial regression models were also compared with the results of a multiple linear regression model that is typically used in trips generation estimations. The findings support the hypothesis that the inclusion of spatial effects in regression models is important, since the best results were obtained with alternative models (spatial regression models or the ones with spatial variables included. This was observed in a case study carried out in the city of Porto Alegre, in the state of Rio Grande do Sul, Brazil, in the stages of specification and calibration of the models, with two distinct datasets.
Logistic Regression Modeling of Diminishing Manufacturing Sources for Integrated Circuits
National Research Council Canada - National Science Library
Gravier, Michael
1999-01-01
.... This thesis draws on available data from the electronics integrated circuit industry to attempt to assess whether statistical modeling offers a viable method for predicting the presence of DMSMS...
A note on the maximum likelihood estimator in the gamma regression model
Directory of Open Access Journals (Sweden)
Jerzy P. Rydlewski
2009-01-01
Full Text Available This paper considers a nonlinear regression model, in which the dependent variable has the gamma distribution. A model is considered in which the shape parameter of the random variable is the sum of continuous and algebraically independent functions. The paper proves that there is exactly one maximum likelihood estimator for the gamma regression model.
Genetic parameters for various random regression models to describe the weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2002-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Genetic parameters for different random regression models to describe weight data of pigs
Huisman, A.E.; Veerkamp, R.F.; Arendonk, van J.A.M.
2001-01-01
Various random regression models have been advocated for the fitting of covariance structures. It was suggested that a spline model would fit better to weight data than a random regression model that utilizes orthogonal polynomials. The objective of this study was to investigate which kind of random
Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente
2013-01-01
In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…
A computational approach to compare regression modelling strategies in prediction research
Pajouheshnia, R.; Pestman, W.R.; Teerenstra, S.; Groenwold, R.H.
2016-01-01
BACKGROUND: It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in
U.S. Environmental Protection Agency — Spreadsheets are included here to support the manuscript "Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition". This...
Flexible hazard regression modeling for medical cost data.
Jain, Arvind K; Strawderman, Robert L
2002-03-01
The modeling of lifetime (i.e. cumulative) medical cost data in the presence of censored follow-up is complicated by induced informative censoring, rendering standard survival analysis tools invalid. With few exceptions, recently proposed nonparametric estimators for such data do not extend easily to handle covariate information. We propose to model the hazard function for lifetime cost endpoints using an adaptation of the HARE methodology (Kooperberg, Stone, and Truong, Journal of the American Statistical Association, 1995, 90, 78-94). Linear splines and their tensor products are used to adaptively build a model that incorporates covariates and covariate-by-cost interactions without restrictive parametric assumptions. The informative censoring problem is handled using inverse probability of censoring weighted estimating equations. The proposed method is illustrated using simulation and also with data on the cost of dialysis for patients with end-stage renal disease.
DEFF Research Database (Denmark)
Græsbøll, Kaare
The main outcome of this PhD project is a generic model for non-contagious infectious vector-borne disease spread by one vector species between up to two species of hosts distributed on farms and pasture. The model features a within-herd model of disease, combined with a triple movement kernel...... that describes spread of disease using vectors or hosts as agents of the spread. The model is run with bluetongue as the primary case study, and it is demonstrated how an epidemic outbreak of bluetongue 8 in Denmark is sensitive to the use of pasture, climate, vaccination, vector abundance, and flying parameters....... In constructing a more process oriented agent-based approach to spread modeling new parameters describing vector behavior were introduced. When these vector flying parameters have been quantified by experiments, this model can be implemented on areas naïve to the modeled disease with a high predictive power...
Application of Random-Effects Probit Regression Models.
Gibbons, Robert D.; Hedeker, Donald
1994-01-01
Develops random-effects probit model for case in which outcome of interest is series of correlated binary responses, obtained as product of longitudinal response process where individual is repeatedly classified on binary outcome variable or in multilevel or clustered problems in which individuals within groups are considered to share…
Multiple Linear Regression Model for Estimating the Price of a ...
African Journals Online (AJOL)
In the modeling, the Ordinary Least Squares (OLS) normality assumption which could introduce errors in the statistical analyses was dealt with by log transformation of the data, ensuring the data is normally distributed and there is no correlation between them. Minimisation of Sum of Squares Error method was used to ...
Graphical diagnostics to check model misspecification for the proportional odds regression model.
Liu, Ivy; Mukherjee, Bhramar; Suesse, Thomas; Sparrow, David; Park, Sung Kyun
2009-02-01
The cumulative logit or the proportional odds regression model is commonly used to study covariate effects on ordinal responses. This paper provides some graphical and numerical methods for checking the adequacy of the proportional odds regression model. The methods focus on evaluating functional misspecification for specific covariate effects, but misspecification of the link function can also be dealt with under the same framework. For the logistic regression model with binary responses, Arbogast and Lin (Statist. Med. 2005; 24:229-247) developed similar graphical and numerical methods for assessing the adequacy of the model using the cumulative sums of residuals. The paper generalizes their methods to ordinal responses and illustrates them using an example from the VA Normative Aging Study. Simulation studies comparing the performance of the different diagnostic methods indicate that some of the graphical methods are more powerful in detecting model misspecification than the Hosmer-Lemeshow-type goodness-of-fit statistics for the class of models studied. Copyright (c) 2008 John Wiley & Sons, Ltd.
DEFF Research Database (Denmark)
Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo
2017-01-01
In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness–death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score...... equations that are able to remove the bias due to censoring are introduced. By solving these equations, one can estimate the possibly time-varying regression coefficients, which have an immediate interpretation as covariate effects on the transition probabilities. The performance of the proposed estimator...... is investigated through simulations. We apply the method to data from the Registry of Systematic Lupus Erythematosus RELESSER, a multicenter registry created by the Spanish Society of Rheumatology. Specifically, we investigate the effect of age at Lupus diagnosis, sex, and ethnicity on the probability of damage...
Testing and Inference in Nonlinear Cointegrating Vector Error Correction Models
DEFF Research Database (Denmark)
Kristensen, Dennis; Rahbek, Anders
In this paper, we consider a general class of vector error correction models which allow for asymmetric and non-linear error correction. We provide asymptotic results for (quasi-)maximum likelihood (QML) based estimators and tests. General hypothesis testing is considered, where testing...... for linearity is of particular interest as parameters of non-linear components vanish under the null. To solve the latter type of testing, we use the so-called sup tests, which here requires development of new (uniform) weak convergence results. These results are potentially useful in general for analysis...... symmetric non-linear error correction are considered. A simulation study shows that the finite sample properties of the bootstrapped tests are satisfactory with good size and power properties for reasonable sample sizes....
Testing and Inference in Nonlinear Cointegrating Vector Error Correction Models
DEFF Research Database (Denmark)
Kristensen, Dennis; Rahbæk, Anders
In this paper, we consider a general class of vector error correction models which allow for asymmetric and non-linear error correction. We provide asymptotic results for (quasi-)maximum likelihood (QML) based estimators and tests. General hypothesis testing is considered, where testing...... for linearity is of particular interest as parameters of non-linear components vanish under the null. To solve the latter type of testing, we use the so-called sup tests, which here requires development of new (uniform) weak convergence results. These results are potentially useful in general for analysis...... symmetric non-linear error correction considered. A simulation study shows that the fi…nite sample properties of the bootstrapped tests are satisfactory with good size and power properties for reasonable sample sizes....
Analysis of a vector-bias model on malaria transmission.
Chamchod, Farida; Britton, Nicholas F
2011-03-01
We incorporate a vector-bias term into a malaria-transmission model to account for the greater attractiveness of infectious humans to mosquitoes in terms of differing probabilities that a mosquito arriving at a human at random picks that human depending on whether he is infectious or susceptible. We prove that transcritical bifurcation occurs at the basic reproductive ratio equalling 1 by projecting the flow onto the extended centre manifold. We next study the dynamics of the system when incubation time of malaria parasites in mosquitoes is included, and find that the longer incubation time reduces the prevalence of malaria. Also, we incorporate a random movement of mosquitoes as a diffusion term and a chemically directed movement of mosquitoes to humans expressed in terms of sweat and body odour as a chemotaxis term to study the propagation of infected population to uninfected population. We find that a travelling wave occurs; its speed is calculated numerically and estimated for the lower bound analytically.
Some Identification Problems in the Cointegrated Vector Autoregressive Model
DEFF Research Database (Denmark)
Johansen, Søren
2010-01-01
The paper analyses some identification problems in the cointegrated vector autoregressive model. A criteria for identification by linear restrictions on individual relations is given. The asymptotic distribution of the estimators of a and ß is derived when they are identified by linear restrictions...... on ß , and when they are identified by linear restrictions on a . It it shown that, in the latter case, a component of is asymptotically Gaussian. Finally we discuss identification of shocks by introducing the contemporaneous and permanent effect of a shock and the distinction between permanent...... and transitory shocks, which allows one to identify permanent shocks from the long-run variance and transitory shocks from the short-run variance....
Regression model for tuning the PID controller with fractional order time delay system
Directory of Open Access Journals (Sweden)
S.P. Agnihotri
2014-12-01
Full Text Available In this paper a regression model based for tuning proportional integral derivative (PID controller with fractional order time delay system is proposed. The novelty of this paper is that tuning parameters of the fractional order time delay system are optimally predicted using the regression model. In the proposed method, the output parameters of the fractional order system are used to derive the regression function. Here, the regression model depends on the weights of the exponential function. By using the iterative algorithm, the best weight of the regression model is evaluated. Using the regression technique, fractional order time delay systems are tuned and the stability parameters of the system are maintained. The effectiveness and feasibility of the proposed technique is demonstrated through the MATLAB/Simulink platform, as well as testing and comparison using the classical PID controller, Ziegler–Nichols tuning method, Wang tuning method and curve fitting technique base tuning method.
Shaofu Zhuyu Decoction Regresses Endometriotic Lesions in a Rat Model
Directory of Open Access Journals (Sweden)
Guanghui Zhu
2018-01-01
Full Text Available The current therapies for endometriosis are restricted by various side effects and treatment outcome has been less than satisfactory. Shaofu Zhuyu Decoction (SZD, a classic traditional Chinese medicinal (TCM prescription for dysmenorrhea, has been widely used in clinical practice by TCM doctors to relieve symptoms of endometriosis. The present study aimed to investigate the effects of SZD on a rat model of endometriosis. Forty-eight female Sprague-Dawley rats with regular estrous cycles went through autotransplantation operation to establish endometriosis model. Then 38 rats with successful ectopic implants were randomized into two groups: vehicle- and SZD-treated groups. The latter were administered SZD through oral gavage for 4 weeks. By the end of the treatment period, the volume of the endometriotic lesions was measured, the histopathological properties of the ectopic endometrium were evaluated, and levels of proliferating cell nuclear antigen (PCNA, CD34, and hypoxia inducible factor- (HIF- 1α in the ectopic endometrium were detected with immunohistochemistry. Furthermore, apoptosis was assessed using the terminal deoxynucleotidyl transferase (TdT deoxyuridine 5′-triphosphate (dUTP nick-end labeling (TUNEL assay. In this study, SZD significantly reduced the size of ectopic lesions in rats with endometriosis, inhibited cell proliferation, increased cell apoptosis, and reduced microvessel density and HIF-1α expression. It suggested that SZD could be an effective therapy for the treatment and prevention of endometriosis recurrence.
Elbourne, A.; de Haan, J.
2009-01-01
Using the vector autoregressive methodology, we present estimates of monetary transmission for five new EU member countries in Central and Eastern Europe with more or less flexible exchange rates. We select sample periods to estimate over the longest possible period that can be considered as a
On weak exogeneity of the student's t and elliptical linear regression models
Jiro Hodoshima
2004-01-01
This paper studies weak exogeneity of conditioning variables for the inference of a subset of parameters of the conditional student's t and elliptical linear regression models considered by Spanos (1994). Weak exogeneity of the conditioning variables is shown to hold for the inference of regression parameters of the conditional student's t and elliptical linear regression models. A new definition of weak exogeneity is given which utilizes block-diagonality of the conditional information matri...
Top-down induction of model trees with regression and splitting nodes.
Malerba, Donato; Esposito, Floriana; Ceci, Michelangelo; Appice, Annalisa
2004-05-01
Model trees are an extension of regression trees that associate leaves with multiple regression models. In this paper, a method for the data-driven construction of model trees is presented, namely, the Stepwise Model Tree Induction (SMOTI) method. Its main characteristic is the induction of trees with two types of nodes: regression nodes, which perform only straight-line regression, and splitting nodes, which partition the feature space. The multiple linear model associated with each leaf is then built stepwise by combining straight-line regressions reported along the path from the root to the leaf. In this way, internal regression nodes contribute to the definition of multiple models and have a "global" effect, while straight-line regressions at leaves have only "local" effects. Experimental results on artificially generated data sets show that SMOTI outperforms two model tree induction systems, M5' and RETIS, in accuracy. Results on benchmark data sets used for studies on both regression and model trees show that SMOTI performs better than RETIS in accuracy, while it is not possible to draw statistically significant conclusions on the comparison with M5'. Model trees induced by SMOTI are generally simple and easily interpretable and their analysis often reveals interesting patterns.
Bias and Uncertainty in Regression-Calibrated Models of Groundwater Flow in Heterogeneous Media
DEFF Research Database (Denmark)
Cooley, R.L.; Christensen, Steen
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector β that reflects both small and large scales of heterogeneity in the inputs...
Using the classical linear regression model in analysis of the dependences of conveyor belt life
Directory of Open Access Journals (Sweden)
Miriam Andrejiová
2013-12-01
Full Text Available The paper deals with the classical linear regression model of the dependence of conveyor belt life on some selected parameters: thickness of paint layer, width and length of the belt, conveyor speed and quantity of transported material. The first part of the article is about regression model design, point and interval estimation of parameters, verification of statistical significance of the model, and about the parameters of the proposed regression model. The second part of the article deals with identification of influential and extreme values that can have an impact on estimation of regression model parameters. The third part focuses on assumptions of the classical regression model, i.e. on verification of independence assumptions, normality and homoscedasticity of residuals.
Directory of Open Access Journals (Sweden)
Wei-Bo Chen
2015-01-01
Full Text Available In this study, two artificial neural network models (i.e., a radial basis function neural network, RBFN, and an adaptive neurofuzzy inference system approach, ANFIS and a multilinear regression (MLR model were developed to simulate the DO, TP, Chl a, and SD in the Mingder Reservoir of central Taiwan. The input variables of the neural network and the MLR models were determined using linear regression. The performances were evaluated using the RBFN, ANFIS, and MLR models based on statistical errors, including the mean absolute error, the root mean square error, and the correlation coefficient, computed from the measured and the model-simulated DO, TP, Chl a, and SD values. The results indicate that the performance of the ANFIS model is superior to those of the MLR and RBFN models. The study results show that the neural network using the ANFIS model is suitable for simulating the water quality variables with reasonable accuracy, suggesting that the ANFIS model can be used as a valuable tool for reservoir management in Taiwan.
Artificial neural network and regression models for flow velocity at sediment incipient deposition
Safari, Mir-Jafar-Sadegh; Aksoy, Hafzullah; Mohammadi, Mirali
2016-10-01
A set of experiments for the determination of flow characteristics at sediment incipient deposition has been carried out in a trapezoidal cross-section channel. Using experimental data, a regression model is developed for computing velocity of flow in a trapezoidal cross-section channel at the incipient deposition condition and is presented together with already available regression models of rectangular, circular, and U-shape channels. A generalized regression model is also provided by combining the available data of any cross-section. For comparison of the models, a powerful tool, the artificial neural network (ANN) is used for modelling incipient deposition of sediment in rigid boundary channels. Three different ANN techniques, namely, the feed-forward back propagation (FFBP), generalized regression (GR), and radial basis function (RBF), are applied using six input variables; flow discharge, flow depth, channel bed slope, hydraulic radius, relative specific mass of sediment and median size of sediment particles; all taken from laboratory experiments. Hydrodynamic forces acting on sediment particles in the flow are considered in the regression models indirectly for deriving particle Froude number and relative particle size, both being dimensionless. The accuracy of the models is studied by the root mean square error (RMSE), the mean absolute percentage error (MAPE), the discrepancy ratio (Dr) and the concordance coefficient (CC). Evaluation of the models finds ANN models superior and some regression models with an acceptable performance. Therefore, it is concluded that appropriately constructed ANN and regression models can be developed and used for the rigid boundary channel design.
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Using regression models in design-based estimation of spatial means of soil properties
Brus, D.J.
2000-01-01
The precision of design-based sampling strategies can be increased by using regression models at the estimation stage. A general regression estimator is given that can be used for a wide variety of models and any well-defined sampling design. It equals the estimator plus an adjustment term that
Validation of regression models for nitrate concentrations in the upper groundwater in sandy soils
Sonneveld, M.P.W.; Brus, D.J.; Roelsma, J.
2010-01-01
For Dutch sandy regions, linear regression models have been developed that predict nitrate concentrations in the upper groundwater on the basis of residual nitrate contents in the soil in autumn. The objective of our study was to validate these regression models for one particular sandy region
Zhang, Ying; Bi, Peng; Hiller, Janet
2008-01-01
This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
Technology diffusion in hospitals : A log odds random effects regression model
Blank, J.L.T.; Valdmanis, V.G.
2013-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the
Technology diffusion in hospitals: A log odds random effects regression model
J.L.T. Blank (Jos); V.G. Valdmanis (Vivian G.)
2015-01-01
textabstractThis study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NO x measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables developed from various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was initially incorporated into the land-use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70), than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density, commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate of the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross validation (R²=0.73) contained the Caline3 prediction, a static covariate for length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx
Ramilo, David W; Nunes, Telmo; Madeira, Sara; Boinas, Fernando; da Fonseca, Isabel Pereira
2017-01-01
Vector-borne diseases are not only accounted responsible for their burden on human health-care systems, but also known to cause economic constraints to livestock and animal production. Animals are affected directly by the transmitted pathogens and indirectly when animal movement is restricted. Distribution of such diseases depends on climatic and social factors, namely, environmental changes, globalization, trade and unplanned urbanization. Culicoides biting midges are responsible for the transmission of several pathogenic agents with relevant economic impact. Due to a fragmentary knowledge of their ecology, occurrence is difficult to predict consequently, limiting the control of these arthropod vectors. In order to understand the distribution of Culicoides species, in mainland Portugal, data collected during the National Entomologic Surveillance Program for Bluetongue disease (2005-2013), were used for statistical evaluation. Logistic regression analysis was preformed and prediction maps (per season) were obtained for vector and potentially vector species. The variables used at the present study were selected from WorldClim (two climatic variables) and CORINE databases (twenty-two land cover variables). This work points to an opposite distribution of C. imicola and species from the Obsoletus group within mainland Portugal. Such findings are evidenced in autumn, with the former appearing in Central and Southern regions. Although appearing northwards, on summer and autumn, C. newsteadi reveals a similar distribution to C. imicola. The species C. punctatus appears in all Portuguese territory throughout the year. Contrary, C. pulicaris is poorly caught in all areas of mainland Portugal, being paradoxical present near coastal areas and higher altitude regions.
Analysis of dental caries using generalized linear and count regression models
Directory of Open Access Journals (Sweden)
Javali M. Phil
2013-11-01
Full Text Available Generalized linear models (GLM are generalization of linear regression models, which allow fitting regression models to response data in all the sciences especially medical and dental sciences that follow a general exponential family. These are flexible and widely used class of such models that can accommodate response variables. Count data are frequently characterized by overdispersion and excess zeros. Zero-inflated count models provide a parsimonious yet powerful way to model this type of situation. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. Zero inflated count regression models such as the zero-inflated Poisson (ZIP, zero-inflated negative binomial (ZINB regression models have been used to handle dental caries count data with many zeros. We present an evaluation framework to the suitability of applying the GLM, Poisson, NB, ZIP and ZINB to dental caries data set where the count data may exhibit evidence of many zeros and over-dispersion. Estimation of the model parameters using the method of maximum likelihood is provided. Based on the Vuong test statistic and the goodness of fit measure for dental caries data, the NB and ZINB regression models perform better than other count regression models.
Directory of Open Access Journals (Sweden)
Soldić-Aleksić Jasna
2009-01-01
Full Text Available Market segmentation presents one of the key concepts of the modern marketing. The main goal of market segmentation is focused on creating groups (segments of customers that have similar characteristics, needs, wishes and/or similar behavior regarding the purchase of concrete product/service. Companies can create specific marketing plan for each of these segments and therefore gain short or long term competitive advantage on the market. Depending on the concrete marketing goal, different segmentation schemes and techniques may be applied. This paper presents a predictive market segmentation model based on the application of logistic regression model and CHAID analysis. The logistic regression model was used for the purpose of variables selection (from the initial pool of eleven variables which are statistically significant for explaining the dependent variable. Selected variables were afterwards included in the CHAID procedure that generated the predictive market segmentation model. The model results are presented on the concrete empirical example in the following form: summary model results, CHAID tree, Gain chart, Index chart, risk and classification tables.
DEFF Research Database (Denmark)
Tan, Qihua; Bathum, L; Christiansen, L
2003-01-01
In this paper, we apply logistic regression models to measure genetic association with human survival for highly polymorphic and pleiotropic genes. By modelling genotype frequency as a function of age, we introduce a logistic regression model with polytomous responses to handle the polymorphic...... situation. Genotype and allele-based parameterization can be used to investigate the modes of gene action and to reduce the number of parameters, so that the power is increased while the amount of multiple testing minimized. A binomial logistic regression model with fractional polynomials is used to capture...
Ecological footprint model using the support vector machine technique.
Ma, Haibo; Chang, Wenjuan; Cui, Guangbai
2012-01-01
The per capita ecological footprint (EF) is one of the most widely recognized measures of environmental sustainability. It aims to quantify the Earth's biological resources required to support human activity. In this paper, we summarize relevant previous literature, and present five factors that influence per capita EF. These factors are: National gross domestic product (GDP), urbanization (independent of economic development), distribution of income (measured by the Gini coefficient), export dependence (measured by the percentage of exports to total GDP), and service intensity (measured by the percentage of service to total GDP). A new ecological footprint model based on a support vector machine (SVM), which is a machine-learning method based on the structural risk minimization principle from statistical learning theory was conducted to calculate the per capita EF of 24 nations using data from 123 nations. The calculation accuracy was measured by average absolute error and average relative error. They were 0.004883 and 0.351078% respectively. Our results demonstrate that the EF model based on SVM has good calculation performance.
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
Huang, Mengmeng; Wang, Qiao; Chen, Xinyu; Zhang, Yu
2017-04-15
This study investigated the effect of flavanols and their derivatives on acrylamide formation under low-moisture conditions via prediction using the support vector regression (SVR) approach. Acrylamide was generated in a potato-based equimolar asparagine-reducing sugar model system through oven heating. Both positive and negative effects were observed when the flavonoid treatment ranged 1-10,000μmol/L. Flavanols and derivatives (100μmol/L) suppress the acrylamide formation within a range of 59.9-78.2%, while their maximal promotion effects ranged from 2.15-fold to 2.84-fold for the control at a concentration of 10,000μmol/L. The correlations between inhibition rates and changes in Trolox-equivalent antioxidant capacity (ΔTEAC) (RTEAC-DPPH=0.878, RTEAC-ABTS=0.882, RTEAC-FRAP=0.871) were better than promotion rates (RTEAC-DPPH=0.815, RTEAC-ABTS=0.749, RTEAC-FRAP=0.841). Using ΔTEAC as variables, an optimized SVR model could robustly serve as a new predictive tool for estimating the effect (R: 0.783-0.880), the fitting performance of which was slightly better than that of multiple linear regression model (R: 0.754-0.880). Copyright © 2016 Elsevier Ltd. All rights reserved.
Amalia, Junita; Purhadi, Otok, Bambang Widjanarko
2017-11-01
Poisson distribution is a discrete distribution with count data as the random variables and it has one parameter defines both mean and variance. Poisson regression assumes mean and variance should be same (equidispersion). Nonetheless, some case of the count data unsatisfied this assumption because variance exceeds mean (over-dispersion). The ignorance of over-dispersion causes underestimates in standard error. Furthermore, it causes incorrect decision in the statistical test. Previously, paired count data has a correlation and it has bivariate Poisson distribution. If there is over-dispersion, modeling paired count data is not sufficient with simple bivariate Poisson regression. Bivariate Poisson Inverse Gaussian Regression (BPIGR) model is mix Poisson regression for modeling paired count data within over-dispersion. BPIGR model produces a global model for all locations. In another hand, each location has different geographic conditions, social, cultural and economic so that Geographically Weighted Regression (GWR) is needed. The weighting function of each location in GWR generates a different local model. Geographically Weighted Bivariate Poisson Inverse Gaussian Regression (GWBPIGR) model is used to solve over-dispersion and to generate local models. Parameter estimation of GWBPIGR model obtained by Maximum Likelihood Estimation (MLE) method. Meanwhile, hypothesis testing of GWBPIGR model acquired by Maximum Likelihood Ratio Test (MLRT) method.
Bayesian Information Sharing Between Noise And Regression Models Improves Prediction of Weak Effects
Gillberg, Jussi; Marttinen, Pekka; Pirinen, Matti; Kangas, Antti J.; Soininen, Pasi; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Kaski, Samuel
2013-01-01
We consider the prediction of weak effects in a multiple-output regression setup, when covariates are expected to explain a small amount, less than $\\approx 1%$, of the variance of the target variables. To facilitate the prediction of the weak effects, we constrain our model structure by introducing a novel Bayesian approach of sharing information between the regression model and the noise model. Further reduction of the effective number of parameters is achieved by introducing an infinite sh...
Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?
Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.
2016-12-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 106 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.
ANALYSIS OF THE FINANCIAL PERFORMANCES OF THE FIRM, BY USING THE MULTIPLE REGRESSION MODEL
Directory of Open Access Journals (Sweden)
Constantin Anghelache
2011-11-01
Full Text Available The information achieved through the use of simple linear regression are not always enough to characterize the evolution of an economic phenomenon and, furthermore, to identify its possible future evolution. To remedy these drawbacks, the special literature includes multiple regression models, in which the evolution of the dependant variable is defined depending on two or more factorial variables.
Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette
2011-01-01
Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…
DEFF Research Database (Denmark)
Strathe, Anders B; Mark, Thomas; Nielsen, Bjarne
Random regression models were used to estimate covariance functions between cumulated feed intake (CFI) and body weight (BW) in 8424 Danish Duroc pigs. Random regressions on second order Legendre polynomials of age were used to describe genetic and permanent environmental curves in BW and CFI. Ba...
On the effects of non-robustness in the spurious regression model ...
African Journals Online (AJOL)
... exchange of Nigeria, United state of America and Great Britain. It was found that violation of these assumptions play an important role in determining if a spurious regression emanates from the statistically related model for reliable predictive purposes. Keywords: Spurious regression, non robustness and foreign exchange.
Artificial regression based LM tests of mis-specification for ordered probit models
Murphy, Anthony
1994-01-01
Lagrange Multiplier (LM) tests for omitted variables, heteroscedasticity, incorrect functional form, and non-normality in the ordered probit model may be readily calculated using an artificial regression. The proposed artificial regression is both convenient and likely to have better small sample properties than the more common outer product gradient (OPG) form.
Directory of Open Access Journals (Sweden)
Mach Łukasz
2017-06-01
Full Text Available The research process aimed at building regression models, which helps to valuate residential real estate, is presented in the following article. Two widely used computational tools i.e. the classical multiple regression and regression models of artificial neural networks were used in order to build models. An attempt to define the utilitarian usefulness of the above-mentioned tools and comparative analysis of them is the aim of the conducted research. Data used for conducting analyses refers to the secondary transactional residential real estate market.
Zhang, Jun; Gao, Yaozong; Wang, Li; Tang, Zhen; Xia, James J; Shen, Dinggang
2016-09-01
The goal of this paper is to automatically digitize craniomaxillofacial (CMF) landmarks efficiently and accurately from cone-beam computed tomography (CBCT) images, by addressing the challenge caused by large morphological variations across patients and image artifacts of CBCT images. We propose a segmentation-guided partially-joint regression forest (S-PRF) model to automatically digitize CMF landmarks. In this model, a regression voting strategy is first adopted to localize each landmark by aggregating evidences from context locations, thus potentially relieving the problem caused by image artifacts near the landmark. Second, CBCT image segmentation is utilized to remove uninformative voxels caused by morphological variations across patients. Third, a partially-joint model is further proposed to separately localize landmarks based on the coherence of landmark positions to improve the digitization reliability. In addition, we propose a fast vector quantization method to extract high-level multiscale statistical features to describe a voxel's appearance, which has low dimensionality, high efficiency, and is also invariant to the local inhomogeneity caused by artifacts. Mean digitization errors for 15 landmarks, in comparison to the ground truth, are all less than 2 mm. Our model has addressed challenges of both interpatient morphological variations and imaging artifacts. Experiments on a CBCT dataset show that our approach achieves clinically acceptable accuracy for landmark digitalization. Our automatic landmark digitization method can be used clinically to reduce the labor cost and also improve digitalization consistency.
Zhang, Jun; Gao, Yaozong; Wang, Li; Tang, Zhen; Xia, James J.; Shen, Dinggang
2016-01-01
Objective The goal of this paper is to automatically digitize craniomaxillofacial (CMF) landmarks efficiently and accurately from cone-beam computed tomography (CBCT) images, by addressing the challenge caused by large morphological variations across patients and image artifacts of CBCT images. Methods We propose a Segmentation-guided Partially-joint Regression Forest (S-PRF) model to automatically digitize CMF landmarks. In this model, a regression voting strategy is first adopted to localize each landmark by aggregating evidences from context locations, thus potentially relieving the problem caused by image artifacts near the landmark. Second, CBCT image segmentation is utilized to remove uninformative voxels caused by morphological variations across patients. Third, a partially-joint model is further proposed to separately localize landmarks based on the coherence of landmark positions to improve the digitization reliability. In addition, we propose a fast vector quantization (VQ) method to extract high-level multi-scale statistical features to describe a voxel's appearance, which has low dimensionality, high efficiency, and is also invariant to the local inhomogeneity caused by artifacts. Results Mean digitization errors for 15 landmarks, in comparison to the ground truth, are all less than 2mm. Conclusion Our model has addressed challenges of both inter-patient morphological variations and imaging artifacts. Experiments on a CBCT dataset show that our approach achieves clinically acceptable accuracy for landmark digitalization. Significance Our automatic landmark digitization method can be used clinically to reduce the labor cost and also improve digitalization consistency. PMID:26625402
Mathematical modelling of vector-borne diseases and insecticide resistance evolution.
Gabriel Kuniyoshi, Maria Laura; Pio Dos Santos, Fernando Luiz
2017-01-01
Vector-borne diseases are important public health issues and, consequently, in silico models that simulate them can be useful. The susceptible-infected-recovered (SIR) model simulates the population dynamics of an epidemic and can be easily adapted to vector-borne diseases, whereas the Hardy-Weinberg model simulates allele frequencies and can be used to study insecticide resistance evolution. The aim of the present study is to develop a coupled system that unifies both models, therefore enabling the analysis of the effects of vector population genetics on the population dynamics of an epidemic. Our model consists of an ordinary differential equation system. We considered the populations of susceptible, infected and recovered humans, as well as susceptible and infected vectors. Concerning these vectors, we considered a pair of alleles, with complete dominance interaction that determined the rate of mortality induced by insecticides. Thus, we were able to separate the vectors according to the genotype. We performed three numerical simulations of the model. In simulation one, both alleles conferred the same mortality rate values, therefore there was no resistant strain. In simulations two and three, the recessive and dominant alleles, respectively, conferred a lower mortality. Our numerical results show that the genetic composition of the vector population affects the dynamics of human diseases. We found that the absolute number of vectors and the proportion of infected vectors are smaller when there is no resistant strain, whilst the ratio of infected people is larger in the presence of insecticide-resistant vectors. The dynamics observed for infected humans in all simulations has a very similar shape to real epidemiological data. The population genetics of vectors can affect epidemiological dynamics, and the presence of insecticide-resistant strains can increase the number of infected people. Based on the present results, the model is a basis for development of
Application of wavelet-based multiple linear regression model to rainfall forecasting in Australia
He, X.; Guan, H.; Zhang, X.; Simmons, C.
2013-12-01
In this study, a wavelet-based multiple linear regression model is applied to forecast monthly rainfall in Australia by using monthly historical rainfall data and climate indices as inputs. The wavelet-based model is constructed by incorporating the multi-resolution analysis (MRA) with the discrete wavelet transform and multiple linear regression (MLR) model. The standardized monthly rainfall anomaly and large-scale climate index time series are decomposed using MRA into a certain number of component subseries at different temporal scales. The hierarchical lag relationship between the rainfall anomaly and each potential predictor is identified by cross correlation analysis with a lag time of at least one month at different temporal scales. The components of predictor variables with known lag times are then screened with a stepwise linear regression algorithm to be selectively included into the final forecast model. The MRA-based rainfall forecasting method is examined with 255 stations over Australia, and compared to the traditional multiple linear regression model based on the original time series. The models are trained with data from the 1959-1995 period and then tested in the 1996-2008 period for each station. The performance is compared with observed rainfall values, and evaluated by common statistics of relative absolute error and correlation coefficient. The results show that the wavelet-based regression model provides considerably more accurate monthly rainfall forecasts for all of the selected stations over Australia than the traditional regression model.
Stratton, Margaret D.; Ehrlich, Hanna Y.; Mor, Siobhan M.; Naumova, Elena N.
2017-01-01
Ross River virus (RRV), Barmah Forest virus (BFV), and dengue are three common mosquito-borne diseases in Australia that display notable seasonal patterns. Although all three diseases have been modeled on localized scales, no previous study has used harmonic models to compare seasonality of mosquito-borne diseases on a continent-wide scale. We fit Poisson harmonic regression models to surveillance data on RRV, BFV, and dengue (from 1993, 1995 and 1991, respectively, through 2015) incorporating seasonal, trend, and climate (temperature and rainfall) parameters. The models captured an average of 50-65% variability of the data. Disease incidence for all three diseases generally peaked in January or February, but peak timing was most variable for dengue. The most significant predictor parameters were trend and inter-annual periodicity for BFV, intra-annual periodicity for RRV, and trend for dengue. We found that a Temperature Suitability Index (TSI), designed to reclassify climate data relative to optimal conditions for vector establishment, could be applied to this context. Finally, we extrapolated our models to estimate the impact of a false-positive BFV epidemic in 2013. Creating these models and comparing variations in periodicities may provide insight into historical outbreaks as well as future patterns of mosquito-borne diseases.
Poisson regression for modeling count and frequency outcomes in trauma research.
Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T
2008-10-01
The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
Random regression test-day model for the analysis of dairy cattle ...
African Journals Online (AJOL)
Random regression test-day model for the analysis of dairy cattle production data in South Africa: Creating the framework. EF Dzomba, KA Nephawe, AN Maiwashe, SWP Cloete, M Chimonyo, CB Banga, CJC Muller, K Dzama ...
U.S. Geological Survey, Department of the Interior — This dataset was created using the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate mapping system, developed by Dr. Christopher Daly,...
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Reflexion on linear regression trip production modelling method for ensuring good model quality
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.
The log-Burr XII regression model for grouped survival data.
Hashimoto, Elizabeth M; Ortega, Edwin M M; Cordeiro, Gauss M; Barreto, Mauricio L
2012-01-01
The log-Burr XII regression model for grouped survival data is evaluated in the presence of many ties. The methodology for grouped survival data is based on life tables, where the times are grouped in k intervals, and we fit discrete lifetime regression models to the data. The model parameters are estimated by maximum likelihood and jackknife methods. To detect influential observations in the proposed model, diagnostic measures based on case deletion, so-called global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to these measures, the total local influence and influential estimates are also used. We conduct Monte Carlo simulation studies to assess the finite sample behavior of the maximum likelihood estimators of the proposed model for grouped survival. A real data set is analyzed using a regression model for grouped data.
May, D; Sivakumar, M
2008-01-01
Urban stormwater quality is influenced by many interrelated processes. However, the site-specific nature of these complex processes makes stormwater quality difficult to predict using physically based process models. This has resulted in the need for more empirical techniques. In this study, artificial neural networks (ANN) were used to model urban stormwater quality. A total of 5 different constituents were analyzed-chemical oxygen demand, lead, suspended solids, total Kjeldahl nitrogen, and total phosphorus. Input variables were selected using stepwise linear regression models, calibrated on logarithmically transformed data. Artificial neural networks models were then developed and compared with the regression models. The results from the analyses indicate that multiple linear regression models were more applicable for predicting urban stormwater quality than ANN models.
Developing and testing a global-scale regression model to quantify mean annual streamflow
Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.
2017-01-01
Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.
Machine Learning, Linear and Bayesian Models for Logistic Regression in Failure Detection Problems
Pavlyshenko, B.
2016-01-01
In this work, we study the use of logistic regression in manufacturing failures detection. As a data set for the analysis, we used the data from Kaggle competition Bosch Production Line Performance. We considered the use of machine learning, linear and Bayesian models. For machine learning approach, we analyzed XGBoost tree based classifier to obtain high scored classification. Using the generalized linear model for logistic regression makes it possible to analyze the influence of the factors...
Use of Pollutant Load Regression Models with Various Sampling Frequencies for Annual Load Estimation
Youn Shik Park; Bernie A. Engel
2014-01-01
Water quality data are collected by various sampling frequencies, and the data may not be collected at a high frequency nor over the range of streamflow conditions. Therefore, regression models are used to estimate pollutant data for days on which water quality data were not measured. Pollutant load regression models were evaluated with six sampling frequencies for daily nitrogen, phosphorus, and sediment data. Annual pollutant load estimates exhibited various behaviors by sampling frequency...
Analysis for Regression Model Behavior by Sampling Strategy for Annual Pollutant Load Estimation.
Park, Youn Shik; Engel, Bernie A
2015-11-01
Water quality data are typically collected less frequently than streamflow data due to the cost of collection and analysis, and therefore water quality data may need to be estimated for additional days. Regression models are applicable to interpolate water quality data associated with streamflow data and have come to be extensively used, requiring relatively small amounts of data. There is a need to evaluate how well the regression models represent pollutant loads from intermittent water quality data sets. Both the specific regression model and water quality data frequency are important factors in pollutant load estimation. In this study, nine regression models from the Load Estimator (LOADEST) and one regression model from the Web-based Load Interpolation Tool (LOADIN) were evaluated with subsampled water quality data sets from daily measured water quality data sets for N, P, and sediment. Each water quality parameter had different correlations with streamflow, and the subsampled water quality data sets had various proportions of storm samples. The behaviors of the regression models differed not only by water quality parameter but also by proportion of storm samples. The regression models from LOADEST provided accurate and precise annual sediment and P load estimates using the water quality data of 20 to 40% storm samples. LOADIN provided more accurate and precise annual N load estimates than LOADEST. In addition, the results indicate that avoidance of water quality data extrapolation and availability of water quality data from storm events were crucial in annual pollutant load estimation using pollutant regression models. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.
Modelling bluetongue virus transmission between farms using animal and vector movements
Turner, Joanne; Bowers, Roger G.; Baylis, Matthew
2012-01-01
Bluetongue is a notifiable disease of ruminants which, in 2007, occurred for the first time in England. We present the first model for bluetongue that explicitly incorporates farm to farm movements of the two main hosts, as well as vector dispersal. The model also includes a seasonal vector to host ratio and dynamic restriction zones that evolve as infection is detected. Batch movements of sheep were included by modelling degree of mixing at markets. We investigate the transmission of bluetongue virus between farms in eastern England (the focus of the outbreak). Results indicate that most parameters affecting outbreak size relate to vectors and that the infection generally cannot be maintained without between-herd vector transmission. Movement restrictions are effective at reducing outbreak size, and a targeted approach would be as effective as a total movement ban. The model framework is flexible and can be adapted to other vector-borne diseases of livestock. PMID:22432051
Using the Logistic Regression model in supporting decisions of establishing marketing strategies
Directory of Open Access Journals (Sweden)
Cristinel CONSTANTIN
2015-12-01
Full Text Available This paper is about an instrumental research regarding the using of Logistic Regression model for data analysis in marketing research. The decision makers inside different organisation need relevant information to support their decisions regarding the marketing strategies. The data provided by marketing research could be computed in various ways but the multivariate data analysis models can enhance the utility of the information. Among these models we can find the Logistic Regression model, which is used for dichotomous variables. Our research is based on explanation the utility of this model and interpretation of the resulted information in order to help practitioners and researchers to use it in their future investigations
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
Gene therapy model of X-linked severe combined immunodeficiency using a modified foamy virus vector.
Directory of Open Access Journals (Sweden)
Satoshi Horino
Full Text Available X-linked severe combined immunodeficiency (SCID-X1 is an inherited genetic immunodeficiency associated with mutations in the common cytokine receptor γ chain (γc gene, and characterized by a complete defect of T and natural killer (NK cells. Gene therapy for SCID-X1 using conventional retroviral (RV vectors carrying the γc gene results in the successful reconstitution of T cell immunity. However, the high incidence of vector-mediated T cell leukemia, caused by vector insertion near or within cancer-related genes has been a serious problem. In this study, we established a gene therapy model of mouse SCID-X1 using a modified foamy virus (FV vector expressing human γc. Analysis of vector integration in a human T cell line demonstrated that the FV vector integration sites were significantly less likely to be located within or near transcriptional start sites than RV vector integration sites. To evaluate the therapeutic efficacy, bone marrow cells from γc-knockout (γc-KO mice were infected with the FV vector and transplanted into γc-KO mice. Transplantation of the FV-treated cells resulted in the successful reconstitution of functionally active T and B cells. These data suggest that FV vectors can be effective and may be safer than conventional RV vectors for gene therapy for SCID-X1.
Directory of Open Access Journals (Sweden)
Antonio Blanco-Oliver
2014-10-01
Full Text Available Despite the leading role that micro-entrepreneurship plays in economic development, and the high failure rate of microenterprise start-ups in their early years, very few studies have designed financial distress models to detect the financial problems of micro-entrepreneurs. Moreover, due to a lack of research, nothing is known about whether non-financial information and nonparametric statistical techniques improve the predictive capacity of these models. Therefore, this paper provides an innovative financial distress model specifically designed for microenterprise startups via support vector machines (SVMs that employs financial, non-financial, and macroeconomic variables. Based on a sample of almost 5,500 micro- entrepreneurs from a Peruvian Microfinance Institution (MFI, our findings show that the introduction of non-financial information related to the zone in which the entrepreneurs live and situate their business, the duration of the MFI-entrepreneur relationship, the number of loans granted by the MFI in the last year, the loan destination, and the opinion of experts on the probability that microenterprise start-ups may experience financial problems, significantly increases the accuracy performance of our financial distress model. Furthermore, the results reveal that the models that use SVMs outperform those which employ traditional logistic regression (LR analysis.
Directory of Open Access Journals (Sweden)
David M Poché
2016-08-01
Full Text Available Visceral leishmaniasis (VL is a disease caused by two known vector-borne parasite species (Leishmania donovani, L. infantum, transmitted to man by phlebotomine sand flies (species: Phlebotomus and Lutzomyia, resulting in ≈50,000 human fatalities annually, ≈67% occurring on the Indian subcontinent. Indoor residual spraying is the current method of sand fly control in India, but alternative means of vector control, such as the treatment of livestock with systemic insecticide-based drugs, are being evaluated. We describe an individual-based, stochastic, life-stage-structured model that represents a sand fly vector population within a village in India and simulates the effects of vector control via fipronil-based drugs orally administered to cattle, which target both blood-feeding adults and larvae that feed on host feces.Simulation results indicated efficacy of fipronil-based control schemes in reducing sand fly abundance depended on timing of drug applications relative to seasonality of the sand fly life cycle. Taking into account cost-effectiveness and logistical feasibility, two of the most efficacious treatment schemes reduced population peaks occurring from April through August by ≈90% (applications 3 times per year at 2-month intervals initiated in March and >95% (applications 6 times per year at 2-month intervals initiated in January relative to no control, with the cumulative number of sand fly days occurring April-August reduced by ≈83% and ≈97%, respectively, and more specifically during the summer months of peak human exposure (June-August by ≈85% and ≈97%, respectively.Our model should prove useful in a priori evaluation of the efficacy of fipronil-based drugs in controlling leishmaniasis on the Indian subcontinent and beyond.
ANALISIS PERILAKU SEKTOR PERTANIAN INDONESIA: APLIKASI VECTOR ERROR CORRECTION MODEL
Directory of Open Access Journals (Sweden)
Andi Irawan
2011-08-01
Full Text Available Normal 0 false false false MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman"; mso-ansi-language:#0400; mso-fareast-language:#0400; mso-bidi-language:#0400;} The specific goals of this study were:Firstly, in the long run perspective, the goal is to analyze impact of policythat inflate agriculture price to growth, employment, and investment inagriculture sector. Secondly, in the short run perspective, the goals are: (1to analyze which economic blocks that have most producing instability toagriculture sector, (2 to analyze behaviour af inflation in agriculture sectorand causality relationship both among output price and input prices and amonginput prices. Quantitative methods used ini this studywere Vector Error Correction Model, Johansen Cointegration Test, and GrangerCausality Test. Data used in this study come from several sources such as BankIndonesia, BPS Statistic, International Financial Statistic and CEIC dataCompany Limited, series data from first monthly of 1993 (1993:01 up to thelast monthly of 2002 (2002:12. In the agriculture sector, production(output and capital are responsive to change in the output price. This meanthat inflating the output price effectively help generate output and newinvestment in this sector. Nevertheless because shock in price can be source ofinstability to agriculture sector, so government should be carefully applypolicies that can inflating the price in agriculture. To solve unemploymentproblem in agriculture sector, government should apply cost strategy such assubsidy policy of input price.
Lentiviral vectors in neurodegenrative disorders - Aspects in gene therapy and disease models
DEFF Research Database (Denmark)
Nielsen, Troels Tolstrup
2009-01-01
, which is most often only satisfactory in the initial phase of the disease. Gene therapy is a novel treatment strategy intended to treat or alleviate disease by genetically modifying cells by introducing nucleic acids into the cells. Lentiviral vectors hold great promise as gene transfer vectors...... expression and escape transgene silencing during differentiation of neural stem cell lines. However, insulator vectors appeared to be impaired in functionality, which has importance for the future use of insulators in viral vectors. Finally, cell based models of HD was constructed to elucidate...
Skin injury model classification based on shape vector analysis.
Röhrich, Emil; Thali, Michael; Schweitzer, Wolf
2012-11-06
Skin injuries can be crucial in judicial decision making. Forensic experts base their classification on subjective opinions. This study investigates whether known classes of simulated skin injuries are correctly classified statistically based on 3D surface models and derived numerical shape descriptors. Skin injury surface characteristics are simulated with plasticine. Six injury classes - abrasions, incised wounds, gunshot entry wounds, smooth and textured strangulation marks as well as patterned injuries - with 18 instances each are used for a k-fold cross validation with six partitions. Deformed plasticine models are captured with a 3D surface scanner. Mean curvature is estimated for each polygon surface vertex. Subsequently, distance distributions and derived aspect ratios, convex hulls, concentric spheres, hyperbolic points and Fourier transforms are used to generate 1284-dimensional shape vectors. Subsequent descriptor reduction maximizing SNR (signal-to-noise ratio) result in an average of 41 descriptors (varying across k-folds). With non-normal multivariate distribution of heteroskedastic data, requirements for LDA (linear discriminant analysis) are not met. Thus, shrinkage parameters of RDA (regularized discriminant analysis) are optimized yielding a best performance with λ = 0.99 and γ = 0.001. Receiver Operating Characteristic of a descriptive RDA yields an ideal Area Under the Curve of 1.0 for all six categories. Predictive RDA results in an average CRR (correct recognition rate) of 97,22% under a 6 partition k-fold. Adding uniform noise within the range of one standard deviation degrades the average CRR to 71,3%. Digitized 3D surface shape data can be used to automatically classify idealized shape models of simulated skin injuries. Deriving some well established descriptors such as histograms, saddle shape of hyperbolic points or convex hulls with subsequent reduction of dimensionality while maximizing SNR seem to work well for the data at hand, as
Hartmann, Armin; Van Der Kooij, Anita J; Zeeck, Almut
2009-07-01
In explorative regression studies, linear models are often applied without questioning the linearity of the relations between the predictor variables and the dependent variable, or linear relations are taken as an approximation. In this study, the method of regression with optimal scaling transformations is demonstrated. This method does not require predefined nonlinear functions and results in easy-to-interpret transformations that will show the form of the relations. The method is illustrated using data from a German multicenter project on the indication criteria for inpatient or day clinic psychotherapy treatment. The indication criteria to include in the regression model were selected with the Lasso, which is a tool for predictor selection that overcomes the disadvantages of stepwise regression methods. The resulting prediction model indicates that treatment status is (approximately) linearly related to some criteria and nonlinearly related to others.
Computation of the Exact Information Matrix of Gaussian Dynamic Regression Time Series Models
Klein, A.A.B.; Melard, G.; Zahaf, T.
1998-01-01
In this paper, the computation of the exact Fisher information matrix of a large class of Gaussian time series models is considered. This class, which is often called the single-input-single-output (SISO) model, includes dynamic regression with autocorrelated errors and the transfer function model,
Review Random regression test-day model for the analysis of dairy ...
African Journals Online (AJOL)
jannes
Abstract. Genetic evaluation of dairy cattle using test-day models is now common internationally. In South. Africa a fixed regression test-day model is used to generate breeding values for dairy animals on a routine basis. The model is, however, often criticized for erroneously assuming a standard lactation curve for cows.
Random regression test-day model for the analysis of dairy cattle ...
African Journals Online (AJOL)
Genetic evaluation of dairy cattle using test-day models is now common internationally. In South Africa a fixed regression test-day model is used to generate breeding values for dairy animals on a routine basis. The model is, however, often criticized for erroneously assuming a standard lactation curve for cows in similar ...
Modeling Dynamics of Wikipedia: An Empirical Analysis Using a Vector Error Correction Model
Directory of Open Access Journals (Sweden)
Liu Feng-Jun
2017-01-01
Full Text Available In this paper, we constructed a system dynamic model of Wikipedia based on the co-evolution theory, and investigated the interrelationships among topic popularity, group size, collaborative conflict, coordination mechanism, and information quality by using the vector error correction model (VECM. This study provides a useful framework for analyzing the dynamics of Wikipedia and presents a formal exposition of the VECM methodology in the information system research.
Global properties of vector-host disease models with time delays.
Cai, Li-Ming; Li, Xue-Zhi; Fang, Bin; Ruan, Shigui
2017-05-01
Since there exist extrinsic and intrinsic incubation periods of pathogens in the feedback interactions between the vectors and hosts, it is necessary to consider the incubation delays in vector-host disease transmission dynamics. In this paper, we propose vector-host disease models with two time delays, one describing the incubation period in the vector population and another representing the incubation period in the host population. Both distributed and discrete delays are used. By constructing suitable Liapunov functions, we obtain sufficient conditions for the global stability of the endemic equilibria of these models. The analytic results reveal that the global dynamics of such vector-host disease models with time delays are completely determined by the basic reproduction number. Some specific cases with discrete delay are studied and the corresponding results are improved.
Structured Additive Regression Models: An R Interface to BayesX
Directory of Open Access Journals (Sweden)
Nikolaus Umlauf
2015-02-01
Full Text Available Structured additive regression (STAR models provide a flexible framework for model- ing possible nonlinear effects of covariates: They contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is standalone software package providing software for fitting general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using Rs formula language (with some extended terms, fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
Eigenvalues of Bethe vectors in the Gaudin model
Molev, A. I.; Mukhin, E. E.
2017-09-01
According to the Feigin-Frenkel-Reshetikhin theorem, the eigenvalues of higher Gaudin Hamiltonians on Bethe vectors can be found using the center of an affine vertex algebra at the critical level. We recently calculated explicit Harish-Chandra images of the generators of the center in all classical types. Combining these results leads to explicit formulas for the eigenvalues of higher Gaudin Hamiltonians on Bethe vectors. The Harish-Chandra images can be interpreted as elements of classical W-algebras. By calculating classical limits of the corresponding screening operators, we elucidate a direct connection between the rings of q-characters and classical W-algebras.
Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S
2014-07-01
In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield. Copyright © 2014