Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra
2017-01-01
COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) analysis using the stepwise multiple linear regression (SW-MLR) method. The model was robust and predictive, with correlation coefficients (R²) of 0.972 and 0.531 for the training and test sets, respectively. Model quality was evaluated by leave-one-out (LOO) cross-validation (LOO correlation coefficient Q² = 0.943) and Y-randomization. A leverage approach was also used to define the applicability domain of the model. According to the QSAR results, the COX-2 inhibitory activity of the selected data set correlated with three structure-derived descriptors: BEHm6 (highest eigenvalue n. 6 of the Burden matrix, weighted by atomic masses), Mor03u (signal 03, unweighted) and IVDE (mean information content on the vertex degree equality).
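The LOO statistic quoted above can be illustrated with a minimal pure-Python sketch, assuming ordinary least squares fitted through the normal equations and the usual definition Q² = 1 − PRESS/SS_tot. The helper names (`solve`, `fit_ols`, `q2_loo`) and the toy data are hypothetical; this is not the authors' software.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(X, y):
    """Return [intercept, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, x):
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))

def r2(X, y):
    """Training-set coefficient of determination."""
    coef = fit_ols(X, y)
    mean_y = sum(y) / len(y)
    rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(X, y))
    return 1.0 - rss / sum((yi - mean_y) ** 2 for yi in y)

def q2_loo(X, y):
    """Leave-one-out Q^2: refit without sample i, predict it, accumulate PRESS."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / sum((yi - mean_y) ** 2 for yi in y)

# Hypothetical two-descriptor example with small added noise.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5], [7, 8], [8, 7]]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, 0.0]
y = [1 + 2 * a - b + e for (a, b), e in zip(X, noise)]
print(r2(X, y), q2_loo(X, y))
```

For OLS the PRESS sum can never be smaller than the residual sum of squares, so Q² is at most R²; a large gap between the two is a warning sign of overfitting.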
Fisz, Jacek J
2006-12-07
The optimization approach based on the genetic algorithm (GA) combined with the multiple linear regression (MLR) method is discussed. The GA-MLR optimizer is designed for nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. The GA optimizes the nonlinear parameters, and the linear parameters are calculated by MLR. GA-MLR is an intuitive optimization approach that exploits all the advantages of the genetic algorithm technique. It results from an appropriate combination of two well-known optimization methods: the MLR method is embedded in the GA optimizer, and the linear and nonlinear model parameters are optimized in parallel. MLR is the only strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach considerably simplifies and accelerates the optimization process because the linear parameters are not among the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, the algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of the recently introduced GA-NR optimizer to the simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to χ², obtained from the Taylor series expansion of χ², is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi
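The separation described above (search only over the nonlinear parameters, compute the linear ones by least squares) can be sketched for a biexponential decay y(t) = a1·exp(−k1·t) + a2·exp(−k2·t). To keep the example short and deterministic, an exhaustive grid search over (k1, k2) stands in for the genetic algorithm; the function names, the rate grid and the synthetic data are hypothetical.

```python
import math

def linear_amplitudes(t, y, k1, k2):
    """For fixed rates k1, k2, solve the 2x2 normal equations for a1, a2 and return chi^2."""
    f1 = [math.exp(-k1 * ti) for ti in t]
    f2 = [math.exp(-k2 * ti) for ti in t]
    s11 = sum(u * u for u in f1)
    s22 = sum(v * v for v in f2)
    s12 = sum(u * v for u, v in zip(f1, f2))
    b1 = sum(u * yi for u, yi in zip(f1, y))
    b2 = sum(v * yi for v, yi in zip(f2, y))
    det = s11 * s22 - s12 * s12
    a1 = (b1 * s22 - b2 * s12) / det   # Cramer's rule
    a2 = (s11 * b2 - s12 * b1) / det
    chi2 = sum((yi - a1 * u - a2 * v) ** 2 for yi, u, v in zip(y, f1, f2))
    return a1, a2, chi2

def fit_biexponential(t, y, rates):
    """Search only the nonlinear rates; amplitudes come from linear least squares."""
    best = None
    for i, k1 in enumerate(rates):
        for k2 in rates[i + 1:]:          # k1 < k2 avoids duplicate, swapped pairs
            a1, a2, chi2 = linear_amplitudes(t, y, k1, k2)
            if best is None or chi2 < best[-1]:
                best = (k1, k2, a1, a2, chi2)
    return best

t = [0.1 * i for i in range(50)]
y = [3.0 * math.exp(-0.5 * ti) + 1.0 * math.exp(-4.0 * ti) for ti in t]
print(fit_biexponential(t, y, [i / 10 for i in range(1, 61)]))
```

The search space has only two dimensions instead of four, which is the source of the speed-up the abstract attributes to not fitting the linear parameters.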
Directory of Open Access Journals (Sweden)
Y. Plancherel
2013-07-01
Quantifying oceanic anthropogenic carbon uptake by monitoring interior dissolved inorganic carbon (DIC) concentrations is complicated by the influence of natural variability. The "eMLR method" aims to address this issue by using empirical regression fits of the data instead of the data themselves, inferring the change in anthropogenic carbon over time from the difference between the predictions generated by the regressions at each time. The advantages of the method are that it provides, in principle, a means to filter out natural variability, which theoretically becomes the regression residuals, and a way to deal with sparsely and unevenly distributed data. The degree to which these advantages are realized in practice is unclear, however. The ability of the eMLR method to recover the anthropogenic carbon signal is tested here using a global circulation and biogeochemistry model in which the true signal is known. Results show that regression model selection is particularly important when the observational network changes in time. When the observational network is fixed, co-located systematic misfits between the empirical model and the underlying, yet unknown, true model are more likely to cancel, improving eMLR results. Changing the observational network modifies how the spatio-temporal variance pattern is captured by the respective datasets, resulting in empirical models that are dynamically or regionally inconsistent, leading to systematic errors. Consequently, using regression formulae that change in time to represent systematically best-fit models at all times does not guarantee the best estimates of anthropogenic carbon change if the spatial distributions of the stations emphasize hydrographic features differently in time. Other factors, such as a balanced and representative station coverage, vertical continuity of the regression formulae consistent with the hydrographic context and resiliency of the spatial distribution of the residual
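The difference-of-regressions idea can be sketched as follows. Real eMLR applications regress DIC on several hydrographic predictors; here a single hypothetical predictor and synthetic survey values are used for brevity, and the function names are illustrative only.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def emlr_change(x1, y1, x2, y2, x_eval):
    """Fit each survey separately, then difference the two regressions at common
    predictor values; the residual (natural) noise is not carried into the estimate."""
    a1, b1 = fit_line(x1, y1)
    a2, b2 = fit_line(x2, y2)
    return [(a2 + b2 * x) - (a1 + b1 * x) for x in x_eval]

# Hypothetical surveys at different stations; the true change is +15 everywhere.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
y1 = [2000 + 10 * x + e for x, e in zip(x1, [1.0, -1.0, 0.5, -0.5, 0.0])]
x2 = [0.5, 1.5, 2.5, 3.5]
y2 = [2015 + 10 * x + e for x, e in zip(x2, [-0.8, 0.8, 0.3, -0.3])]
print(emlr_change(x1, y1, x2, y2, [0, 1, 2, 3, 4]))
```

Because the two surveys sample different predictor values, small slope differences between the fitted regressions appear as a spatially varying bias, which is the kind of network-dependent error the abstract discusses.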
Directory of Open Access Journals (Sweden)
Samsuri Abdullah
2016-07-01
Air pollution in Peninsular Malaysia is dominated by particulate matter, which has the highest Air Pollution Index (API) value of all pollutants in most parts of the country. Developing particulate matter (PM10) forecasting models is crucial because it allows authorities and citizens to take the necessary actions to limit their exposure to harmful levels of particulate pollution and to implement protection measures that significantly improve air quality at designated locations. This study aims to improve the ability of MLR to forecast PM10 concentrations by using principal components (PCs) as inputs. Daily PM10 observations in Kuala Terengganu, Malaysia, from January 2003 to December 2011 were used to forecast PM10 concentration levels. MLR and PCR (using PC inputs) models were developed, and their performance was evaluated using RMSE, NAE and IA. The results revealed that PCR performed better than MLR because PCA reduces complexity and eliminates multicollinearity in the data.
Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam
2016-01-01
A quantitative structure-activity relationship (QSAR) study was employed to predict the inhibitory activities of Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisting of 72 compounds was selected, and different types of molecular descriptors were calculated. The whole data set was split into a training set (80% of the data set) and a test set (20%) using principal component analysis. Stepwise (SW) and genetic algorithm (GA) techniques were used as variable selection tools. Multiple linear regression was then used to correlate the selected descriptors linearly with inhibitory activities. Several validation techniques, including leave-one-out and leave-group-out cross-validation and Y-randomization, were used to evaluate the internal capability of the derived models. The external predictive ability of the derived models was further analyzed using modified r², concordance correlation coefficient values and the Golbraikh and Tropsha criteria for acceptable models. Based on the derived results (GA-MLR), some new insights into the molecular structural requirements for better inhibitory activity were obtained.
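The concordance correlation coefficient mentioned above (Lin's CCC) penalizes both scatter and systematic offset between observed and predicted values. A minimal sketch using population (1/n) moments, with hypothetical test values:

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient:
    2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), with 1/n moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

observed = [1, 2, 3, 4, 5]
predicted = [2, 3, 4, 5, 6]        # perfectly correlated but shifted by +1
print(ccc(observed, predicted))
```

Unlike Pearson's r, which is 1 for any perfect linear relationship, the CCC drops below 1 for the shifted predictions above (here to 0.8), which is why it is used as an external-validation criterion.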
Qin, Zijian; Wang, Maolin; Yan, Aixia
2017-07-01
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors using multiple linear regression (MLR) and support vector machine (SVM) methods. A dataset of 512 HCV NS3/4A protease inhibitors, with IC50 values determined by the same FRET assay, was collected from the literature. All inhibitors were represented by nine selected global descriptors and 12 2D property-weighted autocorrelation descriptors calculated with the program CORINA Symphony. The dataset was divided into a training set and a test set by a random method and by a Kohonen self-organizing map (SOM) method. The correlation coefficients (r²) of the training and test sets were 0.75 and 0.72 for the best MLR model and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models was developed; all of the best sub-dataset models performed better than the whole-dataset models. We believe that the combination of the best sub- and whole-dataset SVM models can be used as reliable lead-design tools for new NS3/4A protease inhibitor scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.
International Nuclear Information System (INIS)
Ryazantsev, E.P.; Egorenkov, P.M.; Nasonov, V.A.; Smimov, A.M.; Taliev, A.V.; Gromov, B.F.; Kousin, V.V.; Lantsov, M.N.; Radchenko, V.P.; Sharapov, V.N.
1998-01-01
Development of the Material Testing Loop Reactor (MLR) began in 1991 with the aim of updating and widening Russia's experimental base, in order to validate the selected directions of further progress of the nuclear power industry in Russia and to enhance its reliability and safety. The MLR is a pool-type reactor that uses light water as coolant and beryllium as the side reflector. Water circulates upward through the core. The core comprises 30 fuel assemblies (FA) arranged in a hexagonal lattice with a 90-95 mm pitch. The central materials channel and six loop channels are located in the core, and the reflector accommodates up to 11 loop channels. The reactor power is 100 MW. The average power density of the core is 0.4 MW/l (maximum 1.0 MW/l). The maximum neutron flux density is 7·10¹⁴ n/(cm²·s) in the core (E > 0.1 MeV) and 5·10¹⁴ n/(cm²·s) in the reflector (E < 0.625 eV). In 1995 the MLR design work was suspended due to a lack of funding. (author)
Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin
2015-02-09
The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, a single application of SPA may not capture all the useful information in the full spectra, because the number of variables SPA selects cannot exceed the number of calibration samples. The SPA-MLR method therefore risks losing useful information. To make full use of the information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein, combining a consensus strategy with the SPA-MLR method. In C-SPA-MLR, SPA-MLR is used to construct member models with different subsets of variables, which are selected iteratively from the remaining variables. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near-infrared (NIR) spectra of corn and diesel. C-SPA-MLR showed better prediction performance than the SPA-MLR and full-spectrum PLS methods. Moreover, these results could serve as a reference for combining the consensus strategy with other variable selection methods when analyzing NIR spectra and data from other spectroscopic techniques. Copyright © 2014 Elsevier B.V. All rights reserved.
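The projection step at the heart of SPA can be sketched in pure Python. Starting from a chosen column, each iteration projects every unselected column onto the orthogonal complement of the most recently selected (already projected) column and picks the one with the largest remaining norm. The function name, the toy columns and the starting index are hypothetical; the guard reflects the limit noted in the abstract, since beyond the number of calibration samples all projections vanish.

```python
def spa(columns, k0, m):
    """Successive projections algorithm (sketch).
    columns: one list per candidate variable, holding its values over the calibration samples.
    k0: index of the starting variable; m: number of variables to select."""
    n_samples = len(columns[0])
    if m > n_samples:
        raise ValueError("SPA cannot select more variables than calibration samples")
    work = [list(c) for c in columns]
    selected = [k0]
    for _ in range(m - 1):
        last = work[selected[-1]]
        denom = sum(v * v for v in last)
        best_j, best_norm = None, -1.0
        for j in range(len(work)):
            if j in selected:
                continue
            # Remove the component of column j along the last selected direction.
            coef = sum(a * b for a, b in zip(work[j], last)) / denom
            work[j] = [a - coef * b for a, b in zip(work[j], last)]
            norm = sum(a * a for a in work[j])
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected

# Column 1 is nearly collinear with column 0, so SPA should skip it.
cols = [[1, 0, 0], [0.9, 0.1, 0], [0, 0, 2], [0, 1, 0]]
print(spa(cols, 0, 3))
```

The selected indices would then be used as the regressors of an ordinary MLR model.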
Directory of Open Access Journals (Sweden)
Majid Mohammadhosseini
2014-05-01
A reliable quantitative structure-retention relationship (QSRR) study was carried out to predict the retention indices (RIs) of a broad spectrum of compounds, namely 118 non-linear, cyclic and heterocyclic terpenoids (both saturated and unsaturated), on an HP-5MS fused silica column. A principal component analysis showed that seven compounds lay outside the main cluster. After elimination of these outliers, the data set was divided into training and test sets of 80 and 28 compounds. Particle swarm optimization (PSO) was applied to find the most effective molecular descriptors, followed by multiple linear regression (MLR). The PSO-MLR model was further confirmed through leave-one-out cross-validation (LOO-CV) and leave-group-out cross-validation (LGO-CV), as well as external validation. The promising statistical figures of merit of the proposed model (R²train = 0.936, Q²LOO = 0.928, Q²LGO = 0.921, F = 376.4) confirm its high ability to predict RIs with low relative errors of prediction (REPtrain = 4.8%, REPtest = 6.0%).
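The relative error of prediction reported above is commonly defined in chemometrics as REP(%) = (100/ȳ)·sqrt(Σ(ŷᵢ − yᵢ)²/n), i.e. the RMSE expressed as a percentage of the mean observed value. Assuming the study uses that definition, a minimal sketch with hypothetical retention-index values:

```python
import math

def rep_percent(y_obs, y_pred):
    """Relative error of prediction (%), assuming REP = 100 / mean(y_obs) * RMSE."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    mse = sum((p - o) ** 2 for o, p in zip(y_obs, y_pred)) / n
    return 100.0 / mean_obs * math.sqrt(mse)

print(rep_percent([10, 20, 30], [11, 19, 30]))
```

Normalizing by the mean observed value makes REP comparable between training and test sets whose retention indices span different ranges.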
Budiastra, I. W.; Sutrisno; Widyotomo, S.; Ayu, P. C.
2018-05-01
Caffeine is one of the important components of coffee and contributes to the flavor of coffee beverages. Caffeine concentration in coffee beans is usually determined by chemical methods, which are time-consuming and destructive. A nondestructive method using NIR spectroscopy has been successfully applied to determine the caffeine concentration of Arabica Gayo coffee beans. In this study, NIR spectroscopy was assessed for determining the caffeine concentration of Java Preanger coffee beans. One hundred samples, each consisting of 96 g of coffee beans, were prepared for reflectance and chemical measurements. The reflectance of each sample was measured with an FT-NIR spectrometer over 1000-2500 nm (10000-4000 cm⁻¹), after which caffeine content was determined by LCMS. Calibration between the NIR spectra and caffeine content was carried out using PLS and MLR methods, and several spectral pre-processing steps were applied to increase prediction accuracy. The results showed that caffeine content could be determined by a PLS model using 7 factors with spectral pre-processing combining the first derivative and MSC of the absorbance spectra (r = 0.946; CV = 1.54%; RPD = 2.28). Lower accuracy was obtained with an MLR model based on three caffeine absorption wavelengths and four other absorption wavelengths (r = 0.683; CV = 3.31%; RPD = 1.18).
Alternative Methods of Regression
Birkes, David
2011-01-01
Of related interest: Nonlinear Regression Analysis and its Applications, by Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models... highly recommend[ed]... for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." (Technometrics) This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis, with all data s
DEFF Research Database (Denmark)
Fitzenberger, Bernd; Wilke, Ralf Andreas
2015-01-01
Quantile regression is emerging as a popular statistical approach that complements the estimation of conditional mean models. While the latter focuses on only one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even if the mean regression model does not. We provide a short informal introduction to the principle of quantile regression, which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based ...
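The principle can be illustrated with the check (pinball) loss ρ_τ(u) = u·(τ − 1[u < 0]), which quantile regression minimizes in place of squared error. In the simplest, intercept-only case the minimizer over constants is the sample τ-quantile; a linear quantile regression minimizes the same loss over regression coefficients instead. The data below are hypothetical.

```python
def check_loss(u, tau):
    """Pinball loss: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def best_constant(y, tau):
    """Minimize the summed check loss over constants; the objective is piecewise
    linear, so a minimizer can always be found among the data points."""
    return min(sorted(y), key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

data = [1, 3, 7, 2, 9]
for tau in (0.1, 0.5, 0.9):
    print(tau, best_constant(data, tau))
```

With τ = 0.5 the loss is half the absolute error, so the minimizer is the median; other values of τ weight under- and over-predictions asymmetrically and move the fit toward lower or upper quantiles.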
An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt
Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir
2018-04-01
Credit cards are a convenient alternative to cash or cheques and an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt-to-income ratio and other debt, and to identify which variables are significant. The data cover information on 850 customers. Three methods were applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method, and MLR models with the mean absolute deviation method. A comparison of the three methods showed that the MLR model with the LQD method was the best, with the lowest mean square error (MSE). According to the final model, years with current employer, years at current address, household income (in thousands) and debt-to-income ratio are positively associated with the amount of credit card debt, whereas age, education level and other debt are negatively associated with it. This study may serve as a reference for banks in using robust methods, helping them better understand their options and choose the approach best aligned with their goals for inference about credit card debt.
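The LQD criterion, as commonly defined, minimizes the k-th smallest absolute pairwise difference of residuals |rᵢ − rⱼ|, with k = C(h, 2) and h = ⌊(n + p + 1)/2⌋. Because pairwise differences cancel the intercept, for a one-predictor model the objective depends only on the slope, so this hypothetical sketch simply grid-searches the slope; it is an illustration under those assumptions, not the study's implementation, and the intercept would be estimated afterwards (for example from the median residual).

```python
from itertools import combinations
from math import comb

def lqd_objective(x, y, slope, p=2):
    """k-th smallest |r_i - r_j| with k = C(h, 2), h = floor((n + p + 1) / 2)."""
    n = len(x)
    h = (n + p + 1) // 2
    k = comb(h, 2)
    r = [yi - slope * xi for xi, yi in zip(x, y)]
    diffs = sorted(abs(a - b) for a, b in combinations(r, 2))
    return diffs[k - 1]

def lqd_slope(x, y, grid):
    """Grid search over candidate slopes (a stand-in for a proper optimizer)."""
    return min(grid, key=lambda b: lqd_objective(x, y, b))

def ols_slope(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# y = 2x + 1 with two gross outliers at the end (hypothetical data).
x = list(range(10))
y = [2 * xi + 1 for xi in x]
y[8], y[9] = 100, 120
grid = [i / 100 for i in range(-1000, 1001)]
print(ols_slope(x, y), lqd_slope(x, y, grid))
```

Ordinary least squares is dragged far from the true slope by the two outliers, while the LQD slope is unaffected as long as enough points share a common residual.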
Regression methods for medical research
Tai, Bee Choo
2013-01-01
Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the
Ghavami, Raouf; Sadeghi, Faridoon; Rasouli, Zolikha; Djannati, Farhad
2012-12-01
Experimental 13C NMR chemical shifts (ppm, TMS = 0) at 300 K, ranging from 96.28 ppm (C4' of indole derivative 17) to 159.93 ppm (C4' of indole derivative 23), referenced to deuterated chloroform (CDCl3, 77.0 ppm) or dimethyl sulfoxide (DMSO, 39.50 ppm) as internal reference in CDCl3 or DMSO-d6 solutions, were collected from the literature for thirty 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indole derivatives containing different substituent groups. Effective quantitative structure-property relationship (QSPR) models were built using a hybrid method combining a genetic algorithm (GA) with stepwise selection multiple linear regression (SWS-MLR) as feature-selection tools, correlating each carbon atom of the indole derivatives with the calculated descriptors. Each compound was described by molecular structural descriptors encoding constitutional, topological, geometrical, electrostatic, and quantum chemical features. The accuracy of all developed models was confirmed using different types of internal and external procedures and various statistical tests. Furthermore, the domain of applicability of each model, which indicates the region of reliable predictions, was defined.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for the binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were developed using two machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The resulting models were compared with those obtained by multiple linear regression (MLR) analysis. Descriptor selection and model building were performed with 10-fold cross-validation on the training data set. The coefficients of determination of the SVM and MLR models for the training set are 0.944 and 0.879, respectively, higher than those of the ANN models. Although the SVM model fits the training set better, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test was employed to check the suitability of the models.
Dai, Huanping; Micheyl, Christophe
2012-11-01
Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.
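For reference, the multinomial logistic regression (MLR) model discussed above assigns each of the m response categories a linear score and converts the scores to probabilities with the softmax function. The sketch below shows only this probability model; estimating the weights by maximum likelihood is omitted, and the weights and stimulus value are hypothetical.

```python
import math

def softmax(scores):
    """Convert linear scores to probabilities (max subtracted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mlr_probs(weights, x):
    """weights: one vector per response category, intercept first; x: stimulus features."""
    scores = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for w in weights]
    return softmax(scores)

# Three response alternatives, one stimulus component; category 0 acts as reference.
w = [[0.0, 0.0], [0.5, 1.0], [-1.0, 2.0]]
print(mlr_probs(w, [0.7]))
```

In this model the log-odds of any category against the reference equal the difference of their linear scores, so decision weights are read off as coefficient differences; this coupling is one reason weight estimates for different components can be correlated.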
Directory of Open Access Journals (Sweden)
Adi Syahputra
2014-03-01
Quantitative structure-activity relationships (QSAR) for 21 phthalamide insecticides containing hydrazone (PCH) were studied using multiple linear regression (MLR), principal component regression (PCR) and artificial neural network (ANN) methods. Five descriptors were included in the MLR and ANN models, and five latent variables obtained from principal component analysis (PCA) were used in the PCR analysis. Descriptors were calculated with the semi-empirical PM6 method. ANN was found to be superior to the other statistical techniques and gave a good correlation between descriptors and activity (r² = 0.84). Based on the obtained model, we designed several new insecticides with higher predicted activity than the previously synthesized compounds, e.g. 2-(decalinecarbamoyl)-5-chloro-N'-((5-methylthiophen-2-yl)methylene)benzohydrazide, 2-(decalinecarbamoyl)-5-chloro-N'-((thiophen-2-yl)methylene)benzohydrazide and 2-(decalinecarbamoyl)-N'-(4-fluorobenzylidene)-5-chlorobenzohydrazide, with predicted log LC50 values of 1.640, 1.672, and 1.769, respectively.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Opensource Software for MLR-Modelling of Solar Collectors
DEFF Research Database (Denmark)
Bacher, Peder; Perers, Bengt
2011-01-01
A first research version of a software package for multiple linear regression (MLR) modeling and analysis of solar collectors is now in operation, based on ideas originating from Walletun et al. (1986) and Perers (1987 and 1993). The tool has been implemented in the free and open ... source program R http://www.r-project.org/. Applications of the software package include: visual validation, resampling and conversion of data, collector performance testing analysis according to the European Standard EN 12975 (Fischer et al., 2004), statistical validation of results...
Regression modeling methods, theory, and computation with SAS
Panik, Michael
2009-01-01
Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Stochastic development regression using method of moments
DEFF Research Database (Denmark)
Kühnel, Line; Sommer, Stefan Horst
2017-01-01
This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold-valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using ... the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using ... the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite-dimensional landmark manifolds...
Estimation of Anti-HIV Activity of HEPT Analogues Using MLR, ANN, and SVM Techniques
Directory of Open Access Journals (Sweden)
Basheerulla Shaik
2013-01-01
value than those of MLR and SVM techniques. r²m metrics and ridge regression analysis indicated that the proposed four-variable model, with MATS5e, RDF080u, T(O⋯O) and MATS5m as correlating descriptors, is the best for estimating the anti-HIV activity (log 1/C) of the present set of compounds.
Multiple Linear Regression: A Realistic Reflector.
Nutt, A. T.; Batsell, R. R.
Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…
Dziga, Dariusz; Zielinska, Gabriela; Wladyka, Benedykt; Bochenska, Oliwia; Maksylewicz, Anna; Strzalka, Wojciech; Meriluoto, Jussi
2016-03-16
Bacterial degradation of toxic microcystins produced by cyanobacteria is a common phenomenon. However, our understanding of the mechanisms of these processes is rudimentary. In this paper several novel discoveries regarding the action of the enzymes of the mlr cluster responsible for microcystin biodegradation are presented using recombinant proteins. In particular, the predicted active sites of the recombinant MlrB and MlrC were analyzed using functional enzymes and their inactive muteins. A new degradation intermediate, a hexapeptide derived from linearized microcystins by MlrC, was discovered. Furthermore, the involvement of MlrA and MlrB in further degradation of the hexapeptides was confirmed and a corrected biochemical pathway of microcystin biodegradation has been proposed.
Method for nonlinear exponential regression analysis
Junkin, B. G.
1972-01-01
Two computer programs, based on two general types of exponential models, are described for conducting nonlinear exponential regression analysis. A least-squares procedure is used in which the nonlinear problem is linearized by expansion in a Taylor series. The programs are written in FORTRAN 5 for the Univac 1108 computer.
A method for nonlinear exponential regression analysis
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
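The procedure described above (initial estimates from a linear fit, Taylor-series linearization, repeated corrections until a stopping criterion is met) corresponds to a Gauss-Newton iteration. A minimal Python sketch for the single decay model y = a·exp(−b·t), assuming all observations are positive so that ln(y) can be fitted linearly for the starting values; this is an illustration, not the original FORTRAN program.

```python
import math

def initial_estimates(t, y):
    """Linear least-squares fit of ln(y) = ln(a) - b*t gives nominal starting values."""
    n = len(t)
    ly = [math.log(v) for v in y]
    mt, ml = sum(t) / n, sum(ly) / n
    slope = (sum((ti - mt) * (li - ml) for ti, li in zip(t, ly))
             / sum((ti - mt) ** 2 for ti in t))
    return math.exp(ml - slope * mt), -slope

def gauss_newton(t, y, a, b, tol=1e-10, max_iter=50):
    """Linearize a*exp(-b*t) by a first-order Taylor expansion and solve for corrections."""
    for _ in range(max_iter):
        e = [math.exp(-b * ti) for ti in t]
        r = [yi - a * ei for yi, ei in zip(y, e)]          # current residuals
        ja = e                                              # d f / d a
        jb = [-a * ti * ei for ti, ei in zip(t, e)]         # d f / d b
        saa = sum(u * u for u in ja)
        sbb = sum(v * v for v in jb)
        sab = sum(u * v for u, v in zip(ja, jb))
        ga = sum(u * ri for u, ri in zip(ja, r))
        gb = sum(v * ri for v, ri in zip(jb, r))
        det = saa * sbb - sab * sab
        da = (ga * sbb - gb * sab) / det                    # 2x2 normal equations
        db = (saa * gb - sab * ga) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:                 # predetermined criterion
            break
    return a, b

def sse(t, y, a, b):
    return sum((yi - a * math.exp(-b * ti)) ** 2 for ti, yi in zip(t, y))

# Hypothetical decay data with alternating +/-2 % perturbations.
t = [0.5 * i for i in range(12)]
y = [5.0 * math.exp(-0.3 * ti) * (1 + 0.02 * (-1) ** i) for i, ti in enumerate(t)]
a0, b0 = initial_estimates(t, y)
print((a0, b0), gauss_newton(t, y, a0, b0))
```

The log-linear fit minimizes errors in ln(y), so its estimates are only nominal; the Gauss-Newton corrections then minimize the squared errors on the original scale.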
EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.
Lian, Yao; Ge, Meng; Pan, Xian-Ming
2014-12-19
B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, a novel linear B-cell epitope prediction model based on the antigen's primary sequence information was developed using multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by noise in the negative dataset, 300 experiments using 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, a precision of 64.1% and an area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for identifying linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server, EPMLR, has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .
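The three evaluation measures quoted above can be computed from predicted scores for known epitopes (positives) and non-epitopes (negatives). The sketch uses the rank-based (Mann-Whitney) form of the AUC, with ties counted as one half; the scores and the 0.5 threshold are hypothetical.

```python
def auc(pos, neg):
    """Probability that a random positive scores higher than a random negative."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_precision(pos, neg, threshold):
    """Sensitivity = TP/(TP+FN); precision = TP/(TP+FP) (undefined if nothing is predicted positive)."""
    tp = sum(1 for s in pos if s >= threshold)
    fn = len(pos) - tp
    fp = sum(1 for s in neg if s >= threshold)
    return tp / (tp + fn), tp / (tp + fp)

epitope_scores = [0.9, 0.8, 0.4]
non_epitope_scores = [0.7, 0.3]
print(auc(epitope_scores, non_epitope_scores))
print(sensitivity_precision(epitope_scores, non_epitope_scores, 0.5))
```

Sensitivity and precision depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why studies such as this one report both kinds of measures.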
Haji Gholizadeh, Mohammad; Melesse, Assefa M; Reddi, Lakshmi
2016-10-01
In this study, principal component analysis (PCA), factor analysis (FA), and the absolute principal component score-multiple linear regression (APCS-MLR) receptor modeling technique were used to assess water quality and to identify and quantify the potential pollution sources affecting the water quality of three major rivers of South Florida. For this purpose, a 15-year (2000-2014) dataset of 12 water quality variables from 16 monitoring stations, comprising approximately 35,000 observations, was used. The PCA/FA method identified five and four potential pollution sources in the wet and dry seasons, respectively, and the effective mechanisms, rules and causes were explained. APCS-MLR apportioned their contributions to each water quality variable. The results showed that point-source pollution discharges from anthropogenic factors, namely the discharge of agricultural waste and domestic and industrial wastewater, were the major sources of river water contamination. The studied variables were also categorized into three groups: nutrients (total Kjeldahl nitrogen, total phosphorus, total phosphate, and ammonia-N), water-murkiness-conducive parameters (total suspended solids, turbidity, and chlorophyll-a), and salt ions (magnesium, chloride, and sodium), and the average contributions of different potential pollution sources to these categories were considered separately. The data matrix was also subjected to the PMF receptor model using the EPA PMF-5.0 program, with the two-way model used for the PMF analyses. A comparison of the PMF and APCS-MLR results showed some significant differences in the estimated contribution of each potential pollution source, especially in the wet season. It was concluded that the APCS-MLR receptor modeling approach appears to be more physically plausible for the current study. It is believed that the results of apportionment could be very useful to the local authorities for the control and
Rahmati, Mehdi
2017-08-01
Developing accurate and reliable pedo-transfer functions (PTFs) to predict soil characteristics that are not readily available is one of the most studied topics in soil science, and selecting appropriate predictors is a crucial factor in PTF development. The group method of data handling (GMDH), which finds an approximate relationship between a set of input and output variables, not only provides an explicit procedure for selecting the most essential PTF input variables but also yields more accurate and reliable estimates than other commonly applied methodologies. Therefore, the current research aimed to apply GMDH, in comparison with multivariate linear regression (MLR) and artificial neural networks (ANN), to develop several PTFs that predict soil cumulative infiltration point by point at specific times (0.5-45 min) from soil readily available characteristics (RACs). To this end, soil infiltration curves and several soil RACs, including soil primary particles (clay (CC), silt (Si), and sand (Sa)), saturated hydraulic conductivity (Ks), bulk (Db) and particle (Dp) densities, organic carbon (OC), wet-aggregate stability (WAS), electrical conductivity (EC), and soil antecedent (θi) and field-saturated (θfs) water contents, were measured at 134 different points in the Lighvan watershed, northwest Iran. Then, applying the GMDH, MLR, and ANN methodologies, several PTFs were developed to predict cumulative infiltration using two sets of selected soil RACs, one including and one excluding Ks. On the test data, the PTFs developed by the GMDH and MLR procedures using all soil RACs including Ks gave more accurate (E values of 0.673-0.963) and more reliable (CV values below 11 percent) predictions of cumulative infiltration at the different time steps. In contrast, the ANN procedure had lower accuracy (E values of 0.356-0.890) and reliability (CV values up to 50 percent) than GMDH and MLR. The results also revealed
Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation
Directory of Open Access Journals (Sweden)
Sharad Damodar Gore
2009-10-01
Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived, and results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
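Written out, the analogy in this abstract can be sketched as below, with k > 0 the ridge constant and J the prior-information vector used by URR. The OLS, ORR and URR lines are the standard forms; the MUR line is our reading of "obtained from URR in the same way", not a formula quoted from the paper.

```latex
\begin{aligned}
\hat\beta_{\mathrm{OLS}} &= (X^\top X)^{-1} X^\top y,\\
\hat\beta_{\mathrm{ORR}}(k) &= (X^\top X + kI)^{-1} X^\top X\,\hat\beta_{\mathrm{OLS}} = (X^\top X + kI)^{-1} X^\top y,\\
\hat\beta_{\mathrm{URR}}(k, J) &= (X^\top X + kI)^{-1} (X^\top y + kJ),\\
\hat\beta_{\mathrm{MUR}}(k, J) &= (X^\top X + kI)^{-1} X^\top X\,\hat\beta_{\mathrm{URR}}(k, J) = \big[I - k(X^\top X + kI)^{-1}\big]\,\hat\beta_{\mathrm{URR}}(k, J).
\end{aligned}
```

The last equality uses (X^T X + kI)^{-1} X^T X = I - k(X^T X + kI)^{-1}, which shows MUR as a shrinkage of URR toward zero, just as ORR shrinks OLS.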
A multiple regression method for genomewide association studies ...
Indian Academy of Sciences (India)
Bujun Mei
2018-06-07
... Similar to the typical genomewide association tests using LD ... new approach performed validly when the multiple regression based on linkage method was employed. ... the model, two groups of scenarios were simulated.
BOX-COX REGRESSION METHOD IN TIME SCALING
Directory of Open Access Journals (Sweden)
ATİLLA GÖKTAŞ
2013-06-01
Box-Cox regression with power transformations λj, j = 1, 2, ..., k, can be used when the dependent variable and the error term of the linear regression model do not satisfy the continuity and normality assumptions. The choice of the optimum power transformation λj (j = 1, 2, ..., k) of Y that yields the smallest mean square error is discussed. The Box-Cox regression method is especially appropriate for correcting skewness or heteroscedasticity of the error terms when the functional relationship between the dependent and explanatory variables is nonlinear. In this study, the advantages and disadvantages of using Box-Cox regression are discussed for differentiation and differential analysis of the time scale concept.
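The abstract does not say how the optimum λ is searched. As an assumption, this sketch uses a grid search over a single λ for Y and the geometric-mean-normalized Box-Cox transform, which makes residual sums of squares comparable across λ values (minimizing it is equivalent to maximizing the normal profile likelihood). The function names are hypothetical.

```python
import math

def normalized_boxcox(ys, lam):
    """Box-Cox transform of positive ys, scaled by the geometric mean so that
    residual sums of squares are comparable across different lambda values."""
    gm = math.exp(sum(math.log(y) for y in ys) / len(ys))
    if abs(lam) < 1e-12:
        return [gm * math.log(y) for y in ys]          # lambda = 0 is the log limit
    return [(y ** lam - 1.0) / (lam * gm ** (lam - 1.0)) for y in ys]

def simple_ols_sse(xs, zs):
    """Residual sum of squares of the least-squares line z = a + b*x."""
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    b = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sum((x - mx) ** 2 for x in xs)
    a = mz - b * mx
    return sum((z - a - b * x) ** 2 for x, z in zip(xs, zs))

def best_lambda(xs, ys, grid):
    """Grid-search the power lambda giving the smallest residual error."""
    return min(grid, key=lambda lam: simple_ols_sse(xs, normalized_boxcox(ys, lam)))
```

For data generated as y = exp(0.5 + 0.3x), the log transform linearizes the relationship exactly, so `best_lambda(list(range(1, 11)), ys, [-1.0, -0.5, 0.0, 0.5, 1.0])` returns 0.0.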
On two flexible methods of 2-dimensional regression analysis
Czech Academy of Sciences Publication Activity Database
Volf, Petr
2012-01-01
Vol. 18, No. 4 (2012), pp. 154-164. ISSN 1803-9782. Grant - others: GA ČR (CZ) GAP209/10/2045. Institutional support: RVO:67985556. Keywords: regression analysis; Gordon surface; prediction error; projection pursuit. Subject RIV: BB - Applied Statistics, Operational Research. http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf
Prediction of size-fractionated airborne particle-bound metals using MLR, BP-ANN and SVM analyses.
Leng, Xiang'zi; Wang, Jinhua; Ji, Haibo; Wang, Qin'geng; Li, Huiming; Qian, Xin; Li, Fengying; Yang, Meng
2017-08-01
Size-fractionated heavy metal concentrations were determined in airborne particulate matter (PM) samples collected from 2014 to 2015 (spanning all four seasons) in suburban (Xianlin) and industrial (Pukou) areas of Nanjing, a megacity in southeast China. Rapid prediction models of size-fractionated metals were established based on multiple linear regression (MLR), back propagation artificial neural network (BP-ANN) and support vector machine (SVM), using meteorological factors and PM concentrations as input parameters. About 38% and 77% of PM2.5 concentrations in Xianlin and Pukou, respectively, exceeded the Chinese National Ambient Air Quality Standard limit of 75 μg/m³. Nearly all elements had higher concentrations in the industrial area and, among the four seasons, in winter. Anthropogenic elements such as Pb, Zn, Cd and Cu showed larger percentages in the fine fraction (ø ≤ 2.5 μm), whereas crustal elements including Al, Ba, Fe, Ni, Sr and Ti showed larger percentages in the coarse fraction (ø > 2.5 μm). SVM showed a higher training correlation coefficient (R) and lower mean absolute error (MAE) and root mean square error (RMSE) than MLR and BP-ANN for most metals. All three methods gave better prediction results for Ni, Al, V, Cd and As, and relatively poor results for Cr and Fe. Daily airborne metal concentrations in 2015 were then predicted by the fully trained SVM models; the results showed that the heaviest pollution by airborne heavy metals occurred in December and January, and the lightest in June and July. Copyright © 2017 Elsevier Ltd. All rights reserved.
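The three scores used to rank SVM, MLR and BP-ANN have standard definitions; the plain-Python sketch below computes them for paired observed and predicted values. It is not code from the study, and the name `evaluate` is hypothetical.

```python
import math

def evaluate(observed, predicted):
    """Pearson correlation R, mean absolute error and root mean square error."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"R": cov / (so * sp), "MAE": mae, "RMSE": rmse}
```

For example, `evaluate([1, 2, 3, 4], [1.5, 2, 2.5, 4])` gives MAE = 0.25 and RMSE = √0.125 ≈ 0.354. RMSE penalizes large errors more than MAE, which is why studies usually report both.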
Thermal Efficiency Degradation Diagnosis Method Using Regression Model
International Nuclear Information System (INIS)
Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol
2011-01-01
This paper proposes an approach to diagnosing thermal efficiency degradation in turbine cycles, based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression models can be used inversely to infer the intrinsic state associated with a superficial state observed in a power plant. The proposed diagnosis method consists of three processes: 1) simulations of degradation conditions to obtain the measured states (referred to as the what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of the intrinsic state from the superficial states of the current plant using the linear regression model (referred to as the inverse what-if method). The what-if method generates the outputs for inputs that include various root causes and/or boundary conditions, whereas the inverse what-if method uses the inverse matrix to calculate, from the given superficial states, the intrinsic states, that is, the component degradation modes. The method suggested in this paper was validated using the turbine cycle model of an operating power plant.
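The what-if / inverse what-if loop can be sketched as follows, assuming the simulation runs are available as arrays of intrinsic inputs and superficial outputs, and that there are at least as many measured (superficial) states as intrinsic ones. The function names are hypothetical, and least squares stands in for the paper's matrix inversion.

```python
import numpy as np

def fit_linear_map(intrinsic, superficial):
    """What-if step: fit superficial ~= intrinsic @ A + b from simulation runs."""
    X = np.hstack([intrinsic, np.ones((intrinsic.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, superficial, rcond=None)
    return coef[:-1], coef[-1]          # A: (n_intrinsic, n_superficial), b

def inverse_what_if(A, b, observed):
    """Inverse step: solve observed ~= x @ A + b for the intrinsic state x."""
    x, *_ = np.linalg.lstsq(A.T, observed - b, rcond=None)
    return x
```

Using least squares rather than a literal inverse keeps the inverse step well defined when more plant measurements than degradation parameters are available.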
Linear regression methods according to objective functions
Yasemin Sisman; Sebahattin Bektas
2012-01-01
The aim of the study is to explain parameter estimation methods and regression analysis. Simple linear regression methods, grouped according to their objective functions, are introduced. Numerical solutions are obtained for the simple linear regression methods with the Least Squares and Least Absolute Value objective functions. The success of the applied methods is analyzed using their objective function values.
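The two objective functions can be compared on a small example. This sketch uses the closed-form least squares line and, as an assumption about how to solve the LAD problem, an exhaustive search over lines through pairs of data points (for simple regression an optimal LAD line passes through at least two observations). The function names are hypothetical.

```python
from itertools import combinations

def ls_fit(xs, ys):
    """Least squares line (minimizes the sum of squared residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def lad_fit(xs, ys):
    """Least absolute value line: search all lines through two data points."""
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        cost = sae(a, b, xs, ys)
        if best is None or cost < best[0]:
            best = (cost, a, b)
    return best[1], best[2]

def sse(a, b, xs, ys):
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

def sae(a, b, xs, ys):
    return sum(abs(y - a - b * x) for x, y in zip(xs, ys))
```

Each fit wins on its own objective; with one outlier (for example y = 20 at x = 6 in otherwise near-linear data) the LAD slope stays close to the bulk of the points while the least squares slope is pulled upward.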
Comparing parametric and nonparametric regression methods for panel data
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard; Henningsen, Arne
We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs. ... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared using an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
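The linearity claim can be checked in a few lines. Assuming the usual Translog specification with symmetric second-order coefficients α_ij, scale all inputs of a reference bundle x̄ by a factor t, so that ln t measures (logarithmic) firm size along the ray:

```latex
\begin{aligned}
\ln y &= \alpha_0 + \sum_i \alpha_i \ln x_i + \tfrac{1}{2} \sum_i \sum_j \alpha_{ij} \ln x_i \ln x_j, \qquad \alpha_{ij} = \alpha_{ji},\\
\varepsilon &= \sum_i \frac{\partial \ln y}{\partial \ln x_i} = \sum_i \alpha_i + \sum_i \sum_j \alpha_{ij} \ln x_j,\\
\ln x_j &= \ln \bar{x}_j + \ln t \;\;\Rightarrow\;\; \varepsilon(t) = \varepsilon(1) + \Big(\sum_i \sum_j \alpha_{ij}\Big) \ln t .
\end{aligned}
```

The slope is built from the same second-order coefficients that govern input substitution, which is one way to read the abstract's remark that it is "artificially related" to substitutability.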
Directory of Open Access Journals (Sweden)
Avval Zhila Mohajeri
2015-01-01
This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset of 62 pyrazino[1,2-α]indole, diazepino[1,2-α]indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regression (MLR) technique combined with stepwise (SW) and genetic algorithm (GA) variable selection was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.
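A simplified forward-selection variant of stepwise MLR is sketched below: at each step it adds the descriptor that gives the largest adjusted R², and it stops when no candidate improves it. This is an assumption for illustration. SPSS-style stepwise regression uses F-to-enter/F-to-remove tests and can also drop variables, which this sketch does not reproduce, and the function names are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_stepwise(X, y, max_vars=5):
    """Greedy forward selection of descriptor columns by adjusted R^2."""
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    selected, best_adj = [], -np.inf
    while len(selected) < max_vars:
        scores = {}
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            k = len(cols)
            if n - k - 1 <= 0:
                continue
            r2 = 1.0 - rss(X[:, cols], y) / tss
            scores[j] = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
        if not scores:
            break
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_adj:
            break  # no candidate improves adjusted R^2
        selected.append(j_best)
        best_adj = scores[j_best]
    return selected
```

A GA-based search replaces the greedy loop with a population of descriptor subsets, which helps when descriptors only perform well jointly.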
FATAL, General Experiment Fitting Program by Nonlinear Regression Method
International Nuclear Information System (INIS)
Salmon, L.; Budd, T.; Marshall, M.
1982-01-01
1 - Description of problem or function: A generalized fitting program with a free-format keyword interface to the user. It permits experimental data to be fitted by nonlinear regression methods to any function the user can describe. Minimal computer experience is required, but the user must provide a subroutine defining the function. Some statistical output is included, as well as 'best' estimates of the function's parameters.
2 - Method of solution: The regression method is based on a minimization technique devised by Powell (Harwell Subroutine Library VA05A, 1972), which does not require analytical derivatives. The method employs a quasi-Newton procedure balanced with a steepest descent correction. Experience shows this to be efficient for a very wide range of applications.
3 - Restrictions on the complexity of the problem: The current version of the program permits functions to be defined with up to 20 parameters. The function may be fitted to a maximum of 400 points, preferably with estimated weights given.
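The sketch below shows the same kind of derivative-free weighted nonlinear least squares. It is not Powell's VA05A. It uses a swapped-in Levenberg-Marquardt scheme with a forward-difference Jacobian, which likewise avoids analytical derivatives and blends a Gauss-Newton step with a steepest-descent-like correction through the damping factor `lam`. The user supplies the model as a Python function `f(x, params)`, and the name `fit_lm` is hypothetical.

```python
import numpy as np

def fit_lm(f, x, y, p0, w=None, lam=1e-3, max_iter=200, xtol=1e-12):
    """Weighted nonlinear least squares by Levenberg-Marquardt (sketch)."""
    p = np.asarray(p0, dtype=float)
    sw = np.ones_like(y) if w is None else np.sqrt(np.asarray(w, dtype=float))

    def resid(q):
        return sw * (y - f(x, q))

    r = resid(p)
    cost = float(r @ r)
    for _ in range(max_iter):
        # Forward-difference Jacobian of the residuals: no analytic derivatives.
        J = np.empty((len(y), len(p)))
        for k in range(len(p)):
            h = 1e-7 * max(1.0, abs(p[k]))
            dp = p.copy()
            dp[k] += h
            J[:, k] = (resid(dp) - r) / h
        A, g = J.T @ J, J.T @ r
        # Large lam -> short steepest-descent-like step; small lam -> Gauss-Newton.
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        if np.linalg.norm(step) <= xtol * (np.linalg.norm(p) + xtol):
            break
        r_new = resid(p + step)
        cost_new = float(r_new @ r_new)
        if cost_new < cost:
            p, r, cost = p + step, r_new, cost_new
            lam = max(lam / 3.0, 1e-12)
        else:
            lam *= 10.0
    return p, cost
```

Usage: fitting y = a·exp(-b·x) with `fit_lm(lambda x, q: q[0] * np.exp(-q[1] * x), x, y, [1.0, 1.0])` on noise-free data generated with a = 2, b = 0.5 recovers those parameters.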
and Multinomial Logistic Regression
African Journals Online (AJOL)
This work presents the results of an experimental comparison of two models, multinomial logistic regression (MLR) and an artificial neural network (ANN), for classifying students based on their academic performance. The predictive accuracy of each model was measured by its average classification correct rate (CCR).
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with an aerodynamic diameter of less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built: a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and PM10 concentrations (MLR2), and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining PM10 variation in Peninsular Malaysia. Comparison of the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.
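The difference between MLR1 and MLR2 comes down to how the lagged design matrix is built. In this sketch (hypothetical names; daily predictors are assumed to be a NumPy array with one row per day), row t uses day-t values to predict PM10 on day t+1, and the `include_pm10_lag` flag switches between the MLR1-style and MLR2-style designs.

```python
import numpy as np

def lagged_design(predictors, pm10, include_pm10_lag=True):
    """Design matrix for 1-day-ahead forecasts: day-t inputs -> PM10 on day t+1."""
    X = predictors[:-1]
    if include_pm10_lag:                      # MLR2-style: add yesterday's PM10
        X = np.column_stack([X, pm10[:-1]])
    X = np.column_stack([np.ones(len(X)), X])
    return X, pm10[1:]

def fit_mlr(X, y):
    """Ordinary least squares coefficients (intercept first)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

A forecast for tomorrow is then `np.r_[1.0, today_predictors, today_pm10] @ beta`, where the last term is included only for the MLR2-style design.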
Mapping urban environmental noise: a land use regression method.
Xie, Dan; Liu, Yi; Chen, Jining
2011-09-01
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and the corresponding change in noise. Therefore, these conventional methods can hardly forecast urban noise for a given planned development layout. This paper, for the first time, introduces the land use regression method, which has been applied to simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the areas of the various surrounding land-use types via a regression function. The results suggest that a linear model performs better in fitting the monitoring data, and that there is no significant difference in LUNOS outputs when it is applied at different spatial scales. As LUNOS facilitates a better understanding of the association between land use and urban environmental noise compared with conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and to aid smart decision-making.
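A land use regression model of this kind has two parts: extracting land-use areas in a buffer around each monitoring site, and regressing the measured noise on those areas. The sketch below assumes an integer-coded land-use raster, a circular buffer measured in cells, and hypothetical function names. It is not the LUNOS implementation.

```python
import numpy as np

def buffer_areas(landuse, row, col, radius_cells, classes, cell_area=1.0):
    """Area of each land-use class inside a circular buffer around one site."""
    rr, cc = np.indices(landuse.shape)
    inside = (rr - row) ** 2 + (cc - col) ** 2 <= radius_cells ** 2
    return np.array([np.sum(inside & (landuse == k)) * cell_area for k in classes])

def fit_lur(landuse, sites, noise_db, radius_cells, classes):
    """Regress measured noise levels on land-use areas around each site.

    Pass all but one land-use class: if every buffer is fully covered, the
    class areas sum to a constant and would be collinear with the intercept.
    """
    X = np.array([buffer_areas(landuse, r, c, radius_cells, classes) for r, c in sites])
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, np.asarray(noise_db, dtype=float), rcond=None)
    return beta
```

Repeating the fit with different `radius_cells` values is one simple way to probe the spatial-scale question the abstract raises.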
Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils
Directory of Open Access Journals (Sweden)
Fatimah Khaleel Ibrahim
2017-08-01
Soft computing techniques such as artificial neural networks (ANN) have improved predictive capability and have found application in geotechnical engineering. The aim of this research is to use soft computing and multiple regression models (MLR) to forecast the California bearing ratio (CBR) of soil from its index properties. The CBR of a soil can be predicted from various soil characterizing parameters with the aid of MLR and ANN methods. The database was collected in the laboratory by conducting tests on 86 soil samples gathered from different projects in the Basrah districts. Data obtained from the experiments were used in the regression models and in the soft computing technique (artificial neural network). The liquid limit, plasticity index, modified compaction test and CBR test were determined. In this work, different ANN and MLR models were formulated with different combinations of inputs in order to recognize their significance in the prediction of CBR. The strengths of the developed models were examined in terms of the regression coefficient (R²), relative error (RE%) and mean square error (MSE). The results showed that all the proposed ANN models perform better than the MLR model; in particular, the ANN model with all input parameters gives better results than the other ANN models.
Dimension Reduction and Discretization in Stochastic Problems by Regression Method
DEFF Research Database (Denmark)
Ditlevsen, Ove Dalager
1996-01-01
The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...
Analyzing Big Data with the Hybrid Interval Regression Methods
Directory of Open Access Journals (Sweden)
Chia-Hui Huang
2014-01-01
Big data is a current trend with significant impacts on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently is a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to adjust the excursion of the separation margin and to remain effective in the gray zone, where the data distribution and the separation margin between classes become hard to describe.
DEFF Research Database (Denmark)
Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann
2012-01-01
Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods on a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From ... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to inconsistent measurement conditions, Locally Weighted Scatterplot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying different methods show...
Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun
2018-06-01
The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, these models are built only in variable space rather than geographical space and thus cannot effectively capture the local spatial characteristics of each source's contributions. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. The new method was then applied to the source apportionment of soil metal elements in a region of Wuhan City, China, as a case study. Evaluations revealed that (i) the RAPCS-RGWR model performed better than the APCS-MLR model in identifying the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. The proposed RAPCS-RGWR model is shown to be a more effective source apportionment method than APCS-MLR (i.e., a non-robust and global model) for regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.
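The "geographical space" ingredient can be illustrated with plain, non-robust geographically weighted regression using a Gaussian distance kernel and a fixed bandwidth. Both the kernel and the bandwidth are assumptions of this sketch, which does not reproduce the robust RGWR variant or bandwidth selection. Each site gets its own coefficient vector from a weighted least squares fit in which nearby sites count more. The function name is hypothetical.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local OLS coefficients (intercept first) at every observation site."""
    A = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), A.shape[1]))
    for i, c in enumerate(coords):
        d2 = np.sum((coords - c) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2)      # Gaussian distance weights
        sw = np.sqrt(w)
        sol = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        betas[i] = sol
    return betas
```

In a GWR-based receptor model, X would hold the (absolute) principal component scores, so the estimated source contributions can vary from place to place. If the true relationship is global, every local fit recovers the same coefficients.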
Methods of Detecting Outliers in a Regression Analysis Model ...
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
2013-06-01
... especially true in observational studies ... Simple linear regression and multiple ... The simple linear ... Grubbs, F. E. (1950): Sample Criteria for Testing Outlying Observations. Annals of ... In experimental design, the Relative
Analysis of some methods for reduced rank Gaussian process regression
DEFF Research Database (Denmark)
Quinonero-Candela, J.; Rasmussen, Carl Edward
2005-01-01
While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
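The "finite linear model" view can be made concrete with the subset-of-regressors form of a reduced rank GP. Its predictive mean uses only kernel evaluations against a support set X_u: μ* = K*u (σ² K_uu + K_un K_nu)⁻¹ K_un y. The sketch assumes a squared-exponential kernel and hypothetical function names. When the support set is the whole training set, the mean coincides with the full GP mean, which the comparison function lets you check.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def sor_mean(X, y, Xu, Xs, noise=0.1, ell=1.0):
    """Reduced rank (subset-of-regressors) predictive mean with support set Xu."""
    Kun = rbf(Xu, X, ell)
    Kuu = rbf(Xu, Xu, ell)
    A = noise ** 2 * Kuu + Kun @ Kun.T
    return rbf(Xs, Xu, ell) @ np.linalg.solve(A, Kun @ y)

def full_gp_mean(X, y, Xs, noise=0.1, ell=1.0):
    """Exact GP predictive mean, for comparison."""
    K = rbf(X, X, ell) + noise ** 2 * np.eye(len(X))
    return rbf(Xs, X, ell) @ np.linalg.solve(K, y)
```

With m support points, the cost drops from O(n³) to O(nm²). The degeneracy discussed in the abstract appears in the predictive variance (not computed here), which shrinks toward zero away from the support set.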
2010-12-30
...-AA06 Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient... Register (FR Doc 2010-29596 (75 FR 74864)) entitled ``Health Insurance Issuers Implementing Medical Loss... request for comments entitled ``Health Insurance Issuers Implementing Medical Loss Ratio (MLR...
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advances in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, the availability of large data sets brings the inherent challenge of developing new methods of statistical analysis and modeling. Because a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
DEFF Research Database (Denmark)
Kirkeby, Carsten Thure; Hisham Beshara Halasa, Tariq; Gussmann, Maya Katrin
2017-01-01
the transmission rate. We use data from the two simulation models and vary the sampling intervals and the size of the population sampled. We devise two new methods to determine the transmission rate and compare these to the frequently used Poisson regression method in both epidemic and endemic situations. For most tested scenarios these new methods perform similarly to or better than Poisson regression, especially in the case of long sampling intervals. We conclude that transmission rate estimates are easily biased, which is important to take into account when using these rates in simulation models.
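As a point of reference for the Poisson regression baseline, here is a sketch under a common assumption that the abstract does not state: frequency-dependent transmission, where the number of new cases in each sampling interval is Poisson with mean β·S·I/N·Δt. An intercept-only Poisson GLM with offset log(S·I·Δt/N) then has a closed-form maximum likelihood estimate, β̂ = Σ cases / Σ (S·I·Δt/N). The function names are hypothetical.

```python
import math

def transmission_rate_mle(cases, S, I, N, dt):
    """MLE of beta for new cases ~ Poisson(beta * S * I / N * dt)."""
    exposure = sum(s * i / n * d for s, i, n, d in zip(S, I, N, dt))
    return sum(cases) / exposure

def poisson_loglik(beta, cases, S, I, N, dt):
    """Log-likelihood of beta under the same Poisson model."""
    ll = 0.0
    for c, s, i, n, d in zip(cases, S, I, N, dt):
        mean = beta * s * i / n * d
        ll += c * math.log(mean) - mean - math.lgamma(c + 1)
    return ll
```

The estimate treats S and I as constant within each interval, which is one reason long sampling intervals can bias it, consistent with the abstract's observation.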
Helmreich, James E.; Krog, K. Peter
2018-01-01
We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…
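The link between distance functions and location measures can be checked numerically. Squared distance is minimized by the mean, absolute distance by the median, and the weighted absolute (check) loss ρτ(u) = u(τ − 1[u < 0]) by the τ-quantile. This sketch uses hypothetical names and relies on the fact that piecewise-linear convex losses attain their minimum at a data point.

```python
def total_loss(c, data, loss):
    """Sum of a distance function between each observation and location c."""
    return sum(loss(x - c) for x in data)

def squared(u):
    return u * u

def absolute(u):
    return abs(u)

def check(tau):
    """Weighted absolute (check) loss used in quantile regression."""
    return lambda u: u * (tau - (1.0 if u < 0 else 0.0))

def minimize_over_data(data, loss):
    """Piecewise-linear convex losses attain their minimum at a data point."""
    return min(sorted(data), key=lambda c: total_loss(c, data, loss))
```

For data = [1, 2, 4, 7, 11], `minimize_over_data(data, absolute)` returns the median 4, `minimize_over_data(data, check(0.7))` returns 7, and the squared loss is smallest at the mean 5.0.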
International Nuclear Information System (INIS)
Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh
2007-01-01
A quantitative structure-retention relationship (QSRR) study has been carried out on the retention times (tR) of 38 diverse chlorinated pesticides, herbicides, and organohalides on a gas chromatograph/electron capture detector (GC/ECD) system, using molecular structural descriptors. The retention times of these compounds were modeled as a function of the theoretically derived descriptors by multiple linear regression (MLR) and partial least squares (PLS) regression. Stepwise regression in SPSS was used to select the variables that gave the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors, electronic, steric and thermodynamic, were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analyses were carried out to derive the best QSRR models. After variable selection, the MLR and PLS methods were used with leave-one-out cross validation to build the regression models. The predictive quality of the QSRR models was tested on an external prediction set of 12 compounds randomly chosen from the 38 compounds. The PLS regression method was used to model the structure-retention relationships more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to the squared regression coefficients R², which were 0.951 and 0.948 for MLR and PLS, respectively.
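For an MLR model, leave-one-out cross validation does not require n refits. Each LOO residual equals the ordinary residual divided by (1 − h_ii), where h_ii is the leverage from the hat matrix. The sketch below computes Q² = 1 − PRESS / TSS this way and, for comparison, by brute-force refitting. Taking TSS around the overall mean is an assumption, since conventions differ slightly between papers, and the function names are hypothetical.

```python
import numpy as np

def loo_q2(X, y):
    """LOO Q^2 for an OLS model with intercept, via the hat-matrix shortcut."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                 # hat matrix A (A^T A)^-1 A^T
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def loo_q2_bruteforce(X, y):
    """Same quantity by refitting the model n times."""
    A = np.column_stack([np.ones(len(y)), X])
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ beta
    return 1.0 - np.sum((y - preds) ** 2) / np.sum((y - y.mean()) ** 2)
```

The diagonal of H holds the same leverages used in leverage-based applicability-domain checks, so one matrix serves both purposes.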
Dexter, Jason; Dziga, Dariusz; Lv, Jing; Zhu, Junqi; Strzalka, Wojciech; Maksylewicz, Anna; Maroszek, Magdalena; Marek, Sylwia; Fu, Pengcheng
2018-06-01
In this report, we establish proof-of-principle demonstrating for the first time genetic engineering of a photoautotrophic microorganism for bioremediation of naturally occurring cyanotoxins. In model cyanobacterium Synechocystis sp. PCC 6803 we have heterologously expressed Sphingopyxis sp. USTB-05 microcystinase (MlrA) bearing a 23 amino acid N-terminus secretion peptide from native Synechocystis sp. PCC 6803 PilA (sll1694). The resultant whole cell biocatalyst displayed about 3 times higher activity against microcystin-LR compared to a native MlrA host (Sphingomonas sp. ACM 3962), normalized for optical density. In addition, MlrA activity was found to be almost entirely located in the cyanobacterial cytosolic fraction, despite the presence of the secretion tag, with crude cellular extracts showing MlrA activity comparable to extracts from MlrA expressing E. coli. Furthermore, despite approximately 9.4-fold higher initial MlrA activity of a whole cell E. coli biocatalyst, utilization of a photoautotrophic chassis resulted in prolonged stability of MlrA activity when cultured under semi-natural conditions (using lake water), with the heterologous MlrA biocatalytic activity of the E. coli culture disappearing after 4 days, while the cyanobacterial host displayed activity (3% of initial activity) after 9 days. In addition, the cyanobacterial cell density was maintained over the duration of this experiment while the cell density of the E. coli culture rapidly declined. Lastly, failure to establish a stable cyanobacterial isolate expressing native MlrA (without the N-terminus tag) via the strong cpcB560 promoter draws attention to the use of peptide tags to positively modulate expression of potentially toxic proteins. Copyright © 2018 Elsevier Ltd. All rights reserved.
Directory of Open Access Journals (Sweden)
Sarkhosh Maryam
2012-05-01
A quantitative structure-property relationship (QSPR) study is suggested for the prediction of retention times of volatile organic compounds. Various kinds of molecular descriptors were calculated to represent the molecular structure of the compounds. The retention times of these compounds were modeled as a function of the theoretically derived descriptors by multiple linear regression (MLR) and an artificial neural network (ANN). Stepwise regression was used to select the variables that gave the best-fitted models. After variable selection, the ANN and MLR methods were used with leave-one-out cross validation to build the regression models. The prediction results are in very good agreement with the experimental values. MLR, as the linear regression method, shows good ability to predict the retention times of the prediction set. This provides a new and effective method for predicting the chromatographic retention index of volatile organic compounds.
International Nuclear Information System (INIS)
Lisa, C.; Ungureanu, M.; Cosmaţchi, P.C.; Bolat, G.
2015-01-01
Highlights: • Thermodynamic properties of the ethylbenzene–octane–propylbenzene system. • Equations with much lower standard deviations in comparison with other models. • Prediction of V^E from the refractive index by means of the MLR method. Abstract: The density (ρ) and the refractive index (n) have been experimentally determined for the ethylbenzene (1)–octane (2)–propylbenzene (3) ternary system over the entire composition range at three temperatures, 298.15, 308.15 and 318.15 K, and a pressure of 0.1 MPa. The excess thermodynamic properties calculated from the experimental determinations have been used to build empirical models which, despite the disadvantage of having a greater number of coefficients, give much lower standard deviations than Redlich–Kister type models. Statistical processing of the experimental data by the multiple linear regression (MLR) method was used to model the excess thermodynamic properties; again, lower standard deviations than the Redlich–Kister type models were obtained. The excess molar volume (V^E) was fitted to the refractive index with the multiple linear regression tool of the SigmaPlot 11.2 program for the ethylbenzene (1)–octane (2)–propylbenzene (3) ternary system, yielding a simple mathematical model that correlates the excess molar volume with the refractive index, the normalized temperature and the composition of the ternary mixture: V^E = A0 + A1·X1 + A2·X2 + A3·(T/298.15) + A4·n, for which the standard deviation is 0.03.
Finding-equal regression method and its application in prediction of U resources
International Nuclear Information System (INIS)
Cao Huimo
1995-03-01
The commonly adopted deposit-model method of mineral resource prediction has two main parts: model data that reveal the geological mineralization laws of a deposit, and a statistical prediction method suited to the characteristics of the data, namely a regression method. This kind of regression method may be called finding-equal regression, which combines linear regression with a distribution finding-equal method. The distribution finding-equal method is a data pretreatment that satisfies the mathematical precondition of linear regression, namely equal-distribution theory, and this kind of data pretreatment can be realized in practice. Therefore, finding-equal regression can not only overcome the nonlinearity limitations that commonly occur in traditional linear or other regression and often have no solution, but can also identify outliers and eliminate their influence, a problem that usually arises when robust regression faces outliers in the independent variables. Thus, this new finding-equal regression is claimed to rank best among all kinds of regression methods. Finally, two examples of quantitative prediction of U resources are provided.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, the input data set consisted of 78 rats, with blood loss in percent as the dependent variable and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and a new index (lactate concentration/perfusion)). In that study, machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values, predicting blood loss in percent, were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% achieved by direct multicategory classification with the one-versus-one support vector machine model in our previous study on the same validation data set. Moreover, the simple MLR models with both absolute and relative values could form the basis of a future clinical decision support system for ATLS classification. The perfusion index and the new index were more appropriate as relative changes than as absolute values.
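The "predict a continuous quantity, then classify" idea reduces the last step to thresholding. This sketch maps a predicted percentage blood loss onto the four ATLS classes using the commonly cited ranges (class I up to 15%, II 15-30%, III 30-40%, IV above 40%). Assigning boundary values to the lower class is our assumption, and the regression model that produces the prediction is not shown. The function names are hypothetical.

```python
def atls_class(blood_loss_pct):
    """Map percent blood loss to an ATLS hypovolemic shock class."""
    if blood_loss_pct <= 15:
        return "I"
    if blood_loss_pct <= 30:
        return "II"
    if blood_loss_pct <= 40:
        return "III"
    return "IV"

def class_accuracy(predicted_loss, true_loss):
    """Fraction of cases whose predicted and true losses fall in the same class."""
    hits = sum(atls_class(p) == atls_class(t) for p, t in zip(predicted_loss, true_loss))
    return hits / len(true_loss)
```

For example, `class_accuracy([12, 25, 38, 50], [14, 32, 36, 55])` is 0.75: the second case is predicted as class II but is truly class III. Regression errors only matter near class boundaries, which is one reason this route can beat direct four-class classification.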
Directory of Open Access Journals (Sweden)
Hailun Wang
2017-01-01
Support vector regression is widely used in fault diagnosis of rolling bearings. This paper proposes a new model parameter selection method for support vector regression based on adaptive fusion of a mixed kernel function, which is used as the kernel of the support vector regression. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined into a single state vector, so the model selection problem is transformed into a nonlinear system state estimation problem. A 5th-degree cubature Kalman filter is used to estimate these parameters. In this way, the mixed kernel weighting coefficients, the kernel parameters, and the regression parameters are selected adaptively. Compared with a single kernel function, unscented Kalman filter (UKF) support vector regression algorithms, and genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.
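A mixed kernel is often formed as a convex combination of a local kernel (Gaussian RBF) and a global kernel (polynomial). The sketch below assumes that particular pair. The fusion weight `w` and the kernel parameters `gamma`, `degree`, and `coef0` are the kind of quantities the paper places in the state vector, but the specific kernel choice is our assumption, and the cubature Kalman filter estimation step is not shown.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree, coef0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mixed_kernel(x, y, w, gamma=1.0, degree=2, coef0=1.0):
    """Assumed mixed kernel w*RBF + (1 - w)*poly; a valid kernel for 0 <= w <= 1."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("fusion weight w must lie in [0, 1]")
    return w * rbf_kernel(x, y, gamma) + (1.0 - w) * poly_kernel(x, y, degree, coef0)
```

Restricting `w` to [0, 1] keeps the combination a non-negative sum of valid kernels, so the result is itself a valid (positive semidefinite) kernel whatever value the estimator settles on.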
Deviance-Related Responses along the Auditory Hierarchy: Combined FFR, MLR and MMN Evidence
Shiga, Tetsuya; Althen, Heike; Cornella, Miriam; Zarnowiec, Katarzyna; Yabe, Hirooki; Escera, Carles
2015-01-01
The mismatch negativity (MMN) provides a correlate of automatic auditory discrimination in human auditory cortex that is elicited in response to violation of any acoustic regularity. Recently, deviance-related responses were found at much earlier cortical processing stages as reflected by the middle latency response (MLR) of the auditory evoked potential, and even at the level of the auditory brainstem as reflected by the frequency following response (FFR). However, no study has reported deviance-related responses in the FFR, MLR and long latency response (LLR) concurrently in a single recording protocol. Amplitude-modulated (AM) sounds were presented to healthy human participants in a frequency oddball paradigm to investigate deviance-related responses along the auditory hierarchy in the ranges of FFR, MLR and LLR. AM frequency deviants modulated the FFR, the Na and Nb components of the MLR, and the LLR eliciting the MMN. These findings demonstrate that it is possible to elicit deviance-related responses at three different levels (FFR, MLR and LLR) in one single recording protocol, highlight the involvement of the whole auditory hierarchy in deviance detection and have implications for cognitive and clinical auditory neuroscience. Moreover, the present protocol provides a new research tool for clinical neuroscience, so that the functional integrity of the auditory novelty system can now be tested as a whole in a range of clinical populations where the MMN was previously shown to be defective. PMID:26348628
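In oddball studies the MMN is conventionally read from a difference wave: the averaged response to deviants minus the averaged response to standards. The minimal sketch below assumes baseline-corrected epochs stored as equal-length lists on a common time base. The search window passed to `peak_negativity` is a caller-chosen assumption, not a value from this study.

```python
def average_epochs(epochs):
    """Point-wise mean across trials (each epoch is a list of samples)."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant-minus-standard difference wave, as used to quantify the MMN."""
    dev = average_epochs(deviant_epochs)
    std = average_epochs(standard_epochs)
    return [d - s for d, s in zip(dev, std)]


def peak_negativity(wave, times, window):
    """Most negative amplitude and its latency inside a (start, end) window."""
    start, end = window
    candidates = [(v, t) for v, t in zip(wave, times) if start <= t <= end]
    return min(candidates)  # returns (amplitude, latency)
```

The same deviant-minus-standard logic applies to the FFR and MLR ranges. Only the time window and the filtering applied before averaging change.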
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis
DEFF Research Database (Denmark)
Czekaj, Tomasz Gerard
This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression... ...and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences... ...within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric...
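The nonparametric alternative to specifying a functional form can be illustrated with the Nadaraya-Watson estimator, which estimates the regression function at a point as a kernel-weighted average of observed outputs. This is a minimal one-regressor sketch with a Gaussian kernel and a fixed bandwidth `h`. The thesis's own estimators (for panel data or stochastic frontiers) are not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Estimate E[y | x = x0] as a kernel-weighted mean of ys with bandwidth h."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

No functional form is imposed. The bandwidth `h` takes over the role of the specification choice, trading bias (large `h`) against variance (small `h`).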
Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y
2008-02-18
The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It was concluded that combining a linear with a non-linear modeling technique yields some improved properties compared to the individual linear and non-linear models and that, when such a combination is appropriate, combinations using MARS as the non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
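MARS builds its fit from pairs of hinge functions, max(0, x - t) and max(0, t - x), placed around knots t. The sketch below only evaluates a fitted MARS-style model. The knots and coefficients are hypothetical, and knot selection (forward pass and pruning) is not shown. In a two-step combination, terms like these could then be passed to a linear method such as stepwise MLR or PLS. How the paper couples the two steps is not detailed here.

```python
def hinge(x, knot, direction):
    """MARS hinge basis: max(0, x - knot) if direction=+1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


# Hypothetical fitted terms: (coefficient, variable index, knot, direction).
HYPOTHETICAL_TERMS = [(0.8, 0, 2.0, +1), (-0.5, 0, 2.0, -1), (0.3, 1, 1.0, +1)]


def mars_predict(x, intercept=0.1, terms=HYPOTHETICAL_TERMS):
    """Intercept plus weighted hinge terms for one observation x (a sequence)."""
    return intercept + sum(c * hinge(x[i], t, d) for c, i, t, d in terms)
```

Because each hinge is zero on one side of its knot, the model is piecewise linear, which is why its basis functions combine naturally with linear methods.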
Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel
Directory of Open Access Journals (Sweden)
Roland Pfister
2013-10-01
Three different methods for extracting coefficients of linear regression analyses are presented. The focus is on automatic and easy-to-use approaches for common statistical packages: SPSS, R, and MS Excel / LibreOffice Calc. Hands-on examples are included for each analysis, followed by a brief description of how a subsequent regression coefficient analysis is performed.
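Extracting one slope per individual amounts to a grouped ordinary least-squares fit. The Python sketch below illustrates that general idea and is not one of the paper's SPSS, R, or Excel procedures. The long data layout of (participant, x, y) rows is an assumption.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Simple-regression slope: sum of cross-products / sum of squares of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    return sxy / sxx


def slopes_by_participant(rows):
    """rows: iterable of (participant_id, x, y); returns {participant_id: slope}."""
    groups = defaultdict(lambda: ([], []))
    for pid, x, y in rows:
        groups[pid][0].append(x)
        groups[pid][1].append(y)
    return {pid: ols_slope(xs, ys) for pid, (xs, ys) in groups.items()}
```

The resulting per-participant slopes can then enter a second-level analysis, for example a test of whether the mean slope differs from zero.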
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Gusriani, N.; Firdaniza
2018-03-01
Outliers in multiple linear regression analysis cause the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it produces a model that does not represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on phytoplankton productivity that contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
Energy Technology Data Exchange (ETDEWEB)
Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)
2004-02-01
The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance with cmc values obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was gradual, discrepancies between the values obtained by LPRM and other methods were found. (orig.)
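Local polynomial regression fits a low-order polynomial around each evaluation point using kernel weights. Below is a minimal degree-1 (local linear) sketch with a Gaussian kernel and a fixed bandwidth `h`. The degree, kernel, and bandwidth are assumptions for illustration, not the paper's settings. The function also returns the local slope, since an abrupt change in the measured property with concentration shows up as a sharp change in that slope.

```python
import math


def local_linear(x0, xs, ys, h):
    """Local linear fit at x0 with Gaussian weights; returns (fitted value, local slope)."""
    ws = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]  # unnormalized kernel weights
    us = [x - x0 for x in xs]                                  # centre the regressor at x0
    s0 = sum(ws)
    s1 = sum(w * u for w, u in zip(ws, us))
    s2 = sum(w * u * u for w, u in zip(ws, us))
    t0 = sum(w * y for w, y in zip(ws, ys))
    t1 = sum(w * u * y for w, u, y in zip(ws, us, ys))
    det = s0 * s2 - s1 * s1                                    # weighted 2x2 normal equations
    if det <= 0:
        raise ValueError("singular local design; widen h or add points")
    intercept = (s2 * t0 - s1 * t1) / det
    slope = (s0 * t1 - s1 * t0) / det
    return intercept, slope
```

A local linear fit reproduces straight-line data exactly at any bandwidth, which makes it a convenient check that the weighted normal equations are solved correctly.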
2010-12-01
... Part III Department of Health and Human Services 45 CFR Part 158 Health Insurance Issuers... 0950-AA06 Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient... health insurance issuers under the Public Health Service Act, as added by the Patient Protection and...
2012-05-16
... DEPARTMENT OF HEALTH AND HUMAN SERVICES 45 CFR Part 158 [CMS-9998-IFC3] Health Insurance Issuers..., entitled ``Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient...) requirements for health insurance issuers under section 2718 of the Public Health Service Act, as added by the...
Fuzzy Linear Regression for the Time Series Data which is Fuzzified with SMRGT Method
Directory of Open Access Journals (Sweden)
Seçil YALAZ
2016-10-01
Our work on regression and classification provides a new contribution to the analysis of time series, which have been used in many areas for years. Because convergence could not be obtained with the methods used to correct autocorrelation in time series regression, either the analysis fails or the degree of the model must be changed. Changing the model degree may not be desirable in every situation. For such situations, our study fuzzifies the time series data using the simple membership function and fuzzy rule generation technique (SMRGT) and, to forecast future values, derives an equation by applying fuzzy least squares regression (FLSR), a simple linear regression method, to these data. Although SMRGT has been successful in determining flow discharge in open channels and can be used confidently for flow discharge modeling in open canals, as well as in pipe flow with some modifications, there is no evidence that this technique is successful in fuzzy linear regression modeling. Therefore, to address the lack of such modeling, a new hybrid model is described in this study. Finally, to demonstrate the efficiency of our methods, classical linear regression for time series data and linear regression for fuzzified time series data were applied to two different data sets, and the performance of the two approaches was compared using different measures.
An NCME Instructional Module on Data Mining Methods for Classification and Regression
Sinharay, Sandip
2016-01-01
Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti
2010-01-01
In their seminal paper, Edwards and Parry (1993) presented polynomial regression as a better alternative to difference scores in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
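For reference, the quadratic polynomial regression commonly used in congruence research, with outcome Z and component measures X and Y, is shown below together with the two difference-score models it nests as constrained special cases. The notation is ours, not the abstract's.

```latex
% Full quadratic polynomial regression
Z = b_0 + b_1 X + b_2 Y + b_3 X^2 + b_4 XY + b_5 Y^2 + e

% Algebraic difference score: Z = c_0 + c_1 (X - Y) + e
%   implies b_1 = c_1,\; b_2 = -c_1,\; b_3 = b_4 = b_5 = 0

% Squared difference score: Z = c_0 + c_1 (X - Y)^2 + e
%   = c_0 + c_1 X^2 - 2 c_1 XY + c_1 Y^2 + e
%   implies b_1 = b_2 = 0,\; b_3 = b_5 = c_1,\; b_4 = -2 c_1
```

Writing the difference scores this way makes their implicit constraints explicit. The polynomial model estimates the b coefficients freely, so those constraints can be tested rather than assumed.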
Statistical approach for selection of regression model during validation of bioanalytical method
Directory of Open Access Journals (Sweden)
Natalija Nakov
2014-06-01
The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of an adequate curve fit using nonparametric statistical tests: the one-sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one-sample rank test could not statistically justify the selection of linear vs. quadratic regression models, because only slight differences in the error (expressed through the relative residuals, RR) were obtained. The significance of the differences in the RR was estimated using the Wilcoxon signed rank test, with the linear and quadratic regression models treated as two independent groups. The application of this simple nonparametric statistical test provides statistical confirmation of the choice of an adequate regression model.
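As context for the weighted calibration models, the sketch below fits a weighted linear calibration line and computes relative residuals. It assumes the common 1/x² weighting and defines RR as (back-calculated minus nominal concentration) / nominal × 100%. Both the weighting and the RR definition are assumptions rather than details taken from the abstract, and the rank tests used in the paper are not shown.

```python
def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares line y = intercept + slope * x."""
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denom = sw * swxx - swx * swx
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept, slope


def relative_residuals(conc, response, intercept, slope):
    """Assumed RR definition: percent deviation of back-calculated from nominal concentration."""
    back_calc = [(r - intercept) / slope for r in response]
    return [100.0 * (b - c) / c for b, c in zip(back_calc, conc)]
```

Weighting by 1/x² down-weights the high-concentration standards, whose absolute errors tend to be larger under heteroscedasticity, so the low end of the curve is not sacrificed to fit the high end.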
Directory of Open Access Journals (Sweden)
ELİF BULUT
2013-06-01
Partial Least Squares Regression (PLSR) is a multivariate statistical method that combines partial least squares and multiple linear regression analysis. Explanatory variables X that exhibit multicollinearity are reduced to components that explain a large amount of the covariance between the explanatory and response variables. These components are few in number and are free of the multicollinearity problem. Multiple linear regression analysis is then applied to those components to model the response variable Y. There are various PLSR algorithms; in this study, the NIPALS and PLS-Kernel algorithms are studied and illustrated on a real data set.
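Here is a compact plain-Python sketch of the NIPALS algorithm for a single response (PLS1). It centres X and y, extracts weight, score, and loading vectors one component at a time, and deflates both blocks. Prediction reapplies the same deflation sequence to a new sample. It illustrates the general NIPALS idea only. The PLS-Kernel variant and any scaling choices from the study are not reproduced, and the function names are our own.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(u * v for u, v in zip(a, b))


def pls1_nipals(X, y, n_components):
    """Fit PLS1 with NIPALS; returns (W, P, Q, x_means, y_mean) for prediction."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centred X residuals
    f = [v - y_mean for v in y]                                 # centred y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # weights ~ E'f
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # y already explained; stop early
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                              # X scores
        tt = _dot(t, t)
        p_vec = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = _dot(f, t) / tt                                          # y loading
        E = [[E[i][j] - t[i] * p_vec[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                      # deflate y
        W.append(w)
        P.append(p_vec)
        Q.append(q)
    return W, P, Q, x_means, y_mean


def pls1_predict(model, x):
    """Predict y for one sample by replaying the component-wise deflation."""
    W, P, Q, x_means, y_mean = model
    e = [xj - m for xj, m in zip(x, x_means)]
    y_hat = y_mean
    for w, p_vec, q in zip(W, P, Q):
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_vec)]
    return y_hat
```

With as many components as the rank of the centred X, PLS1 reproduces the ordinary least-squares fit. Its benefit for collinear predictors comes from stopping at fewer components.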
The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies
O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.
2011-01-01
The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R² = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
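For reference, the Bland-Altman limits of agreement are the mean difference between paired measurements plus or minus 1.96 standard deviations of those differences. The minimal sketch below shows that computation, the method the paper critiques, and is not the authors' bias analysis.

```python
import statistics


def limits_of_agreement(measured, predicted, z=1.96):
    """Bland-Altman bias and approximate 95% limits of agreement for paired values."""
    diffs = [m - p for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd


def bland_altman_points(measured, predicted):
    """(mean, difference) pairs plotted on a Bland-Altman chart."""
    return [((m + p) / 2.0, m - p) for m, p in zip(measured, predicted)]
```

When one member of each pair is a regression prediction, the predictions are shrunk toward the mean, so the differences tend to correlate with the plotted means. The limits computed above do not account for that structure.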
Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.
1992-01-01
Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria ) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.
Treating experimental data of inverse kinetic method by unitary linear regression analysis
International Nuclear Information System (INIS)
Zhao Yusen; Chen Xiaoliang
2009-01-01
The theory of treating experimental data of the inverse kinetic method by unitary linear regression analysis is described. Not only the reactivity but also the effective neutron source intensity can be calculated by this method. A computer code was written based on the inverse kinetic method and unitary linear regression analysis. Data from the zero-power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of the inverse kinetic method with unitary linear regression analysis, and that the precision of the reactivity measurement is improved. The central element efficiency can be calculated from the reactivity. The results also show that the effect of the external neutron source on reactivity measurement should be considered when the reactor power is low and the intensity of the external neutron source is high. (authors)
Regression Methods for Virtual Metrology of Layer Thickness in Chemical Vapor Deposition
DEFF Research Database (Denmark)
Purwins, Hendrik; Barak, Bernd; Nagi, Ahmed
2014-01-01
The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (Virtual Metrology). In this paper, a survey on regression methods is given to predict...... average Silicon Nitride cap layer thickness for the Plasma Enhanced Chemical Vapor Deposition (PECVD) dual-layer metal passivation stack process. Process and production equipment Fault Detection and Classification (FDC) data are used as predictor variables. Various variable sets are compared: one most...... algorithm, and Support Vector Regression (SVR). On a test set, SVR outperforms the other methods by a large margin, being more robust towards changes in the production conditions. The method performs better on high-dimensional multivariate input data than on the most predictive variables alone. Process...
Statistical methods in regression and calibration analysis of chromosome aberration data
International Nuclear Information System (INIS)
Merkle, W.
1983-01-01
The method of iteratively reweighted least squares for the regression analysis of Poisson-distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves, methods for calculating confidence intervals on dose from aberration yield are described and compared, and, for the linear-quadratic model, a confidence interval is given. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view. (orig./MG)
Thompson, Russel L.
Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources and types of heteroscedasticity are discussed, and methods for correction are…
Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method
International Nuclear Information System (INIS)
Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui
1991-01-01
A multiple linear regression method was used to analyze the γ spectra of uranium ore samples and to calculate the contents of U, Ra, Th, and K. Compared with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th, and K are needed to obtain the response coefficients.
Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H
PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani
2016-01-01
An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Hu, L; Liang, M; Mouraux, A; Wise, R G; Hu, Y; Iannetti, G D
2011-12-01
Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLR(d)) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLR(d) method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLR(d) approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLR(d) effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLR(d) can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.
An improved partial least-squares regression method for Raman spectroscopy
Momenpour Tehran Monfared, Ali; Anis, Hanan
2017-10-01
It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve BVSPLS with a novel selection mechanism. The proposed method sorts the weighted regression coefficients and then evaluates the importance of each variable in the sorted list using the root mean square error of prediction (RMSEP) criterion at each iteration step. Our improved BVSPLS (IBVSPLS) method was applied to leukemia and heparin data sets and improved the limit of detection of Raman biosensing by 10% to 43% compared to PLS. IBVSPLS was also compared with the jack-knifing (simpler) and genetic algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and performed similarly to or better than the genetic algorithm.
Wang, Jiangbo; Liu, Junhui; Li, Tiantian; Yin, Shuo; He, Xinhui
2018-01-01
Monthly electricity sales forecasting is basic work for ensuring the safety of the power system. This paper presents a monthly electricity sales forecasting method that comprehensively considers the coupled factors of temperature, economic growth, electric power replacement, and business expansion. The mathematical model is constructed using a regression method. The simulation results show that the proposed method is accurate and effective.
International Nuclear Information System (INIS)
Shuke, Noriyuki
1991-01-01
In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters such as hepatic extraction or excretion rates, has been used for the quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using the nonlinear least squares regression method (NLS method), which requires iterative calculation and initial estimates for the unknown parameters. As a simple alternative to the NLS method, the direct integral linear least squares regression method (DILS method), which can determine model parameters by a simple calculation without initial estimates, is proposed, and its applicability to the analysis of hepatobiliary scintigraphy is tested. To see whether the DILS method could determine model parameters as well as the NLS method, and to determine an appropriate weight for the DILS method, simulated theoretical data based on prefixed parameters were fitted to a one-compartment model using both the DILS method with various weightings and the NLS method. The parameter values obtained were then compared with the prefixed values used for data generation. The effect of various weights on the error of the parameter estimates was examined, and the inverse of time was found to be the weight that minimized the error. With this weight, the DILS method gave parameter values close to those obtained by the NLS method, and both sets of parameter values were very close to the prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimates that are relatively insensitive to data noise. In conclusion, the DILS method could be used as a simple alternative to the NLS method, providing reliable parameter estimates. (author)
A different approach to estimate nonlinear regression model using numerical methods
Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.
2017-11-01
This research paper is concerned with computational methods, namely the Gauss-Newton method and gradient algorithm methods (the Newton-Raphson method, the steepest descent or steepest ascent algorithm, the method of scoring, and the method of quadratic hill-climbing), based on numerical analysis, for estimating the parameters of a nonlinear regression model in a different way. Principles of matrix calculus are used to discuss the gradient algorithm methods. Yonathan Bard [1] presented a comparison of gradient methods for the solution of nonlinear parameter estimation problems; this article, however, discusses an analytical approach to the gradient algorithm methods. This paper describes a new iterative technique, namely a Gauss-Newton method that differs from the iterative technique proposed by Gordon K. Smyth [2]. Hans Georg Bock et al. [10] proposed numerical methods for parameter estimation in differential algebraic equations (DAEs). Isabel Reis Dos Santos et al. [11] introduced a weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale nonsmooth convex minimization, the Hager and Zhang (HZ) conjugate gradient method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al. [12].
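As an illustration of the Gauss-Newton iteration for nonlinear least squares, the sketch below fits the two-parameter model y = a * exp(b * x). The model, starting values, and stopping rule are assumptions chosen for brevity. This is the textbook Gauss-Newton step, which solves (JᵀJ) δ = Jᵀr at each iteration, not the specific variant proposed in the paper.

```python
import math


def gauss_newton_exp(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Fit y = a * exp(b * x) by Gauss-Newton from the starting values (a, b)."""
    for _ in range(max_iter):
        # Residuals r_i = y_i - f(x_i) and Jacobian columns df/da, df/db.
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ei for y, ei in zip(ys, e)]
        ja = e
        jb = [a * x * ei for x, ei in zip(xs, e)]
        # Normal equations (J^T J) delta = J^T r, solved directly for the 2x2 case.
        s_aa = sum(u * u for u in ja)
        s_ab = sum(u * v for u, v in zip(ja, jb))
        s_bb = sum(v * v for v in jb)
        g_a = sum(u * ri for u, ri in zip(ja, r))
        g_b = sum(v * ri for v, ri in zip(jb, r))
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:
            raise ZeroDivisionError("singular J^T J; try different starting values")
        da = (s_bb * g_a - s_ab * g_b) / det
        db = (s_aa * g_b - s_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b
```

Unlike Newton-Raphson, Gauss-Newton drops the second-derivative terms of the model. It therefore needs only the Jacobian and converges fastest when the residuals at the solution are small.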
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
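The usual correction divides the observed slope by the reliability (attenuation) ratio λ = var(true)/var(observed). The sketch below estimates λ from a reliability study with two repeated measurements per person as cov(x1, x2)/var(x1). That estimator is one common choice and is our assumption, not necessarily the one implemented in the author's software tools.

```python
import statistics


def reliability_ratio(first, second):
    """Estimate lambda = var(true)/var(observed) as cov(x1, x2) / var(x1)."""
    m1, m2 = statistics.mean(first), statistics.mean(second)
    n = len(first)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / (n - 1)
    return cov / statistics.variance(first)


def corrected_slope(observed_slope, lam):
    """Regression-dilution-corrected slope: beta_observed / lambda."""
    if lam <= 0:
        raise ValueError("reliability ratio must be positive")
    return observed_slope / lam
```

Because λ is at most 1 in expectation, the correction can only inflate the slope magnitude. This matches the abstract's point that uncorrected estimates are biased toward zero.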
African Journals Online (AJOL)
model was further illustrated using various evaluation techniques: leave-one-out ... minimum energy conformation were obtained ..... The distribution of errors for the ... are distributed on both sides of the zero line, .... of systems in solution.
Directory of Open Access Journals (Sweden)
Yi-Ming Kuo
2011-06-01
Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005–2007.
Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming
2011-06-01
Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
Cooper, Paul D.
2010-01-01
A new procedure is described that uses a student-friendly least-squares multiple linear regression technique, built on a function within Microsoft Excel, to let students calculate molecular constants from the vibronic spectrum of iodine. This method is pedagogically advantageous because it calculates molecular constants for ground and excited…
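As a rough illustration of why this is a *multiple* linear regression, the sketch below assumes a single progression from the ground vibrational level (v'' = 0), so each band origin is modeled as ν(v') = c0 + ω'e(v'+½) − ω'ex'e(v'+½)², which is linear in the unknowns c0, ω'e and ω'ex'e. The record does not say which Excel function is used (LINEST is one such function); here NumPy's least-squares solver plays that role, and the band origins are synthetic, noise-free values chosen only for illustration.

```python
import numpy as np


def fit_vibronic_constants(v_upper, wavenumbers):
    """Fit nu(v') = c0 + we*(v'+1/2) - wexe*(v'+1/2)**2 by multiple linear regression.

    Returns (c0, we, wexe); c0 lumps together the electronic term energy and
    the ground-state (v''=0) offset, under the single-progression assumption.
    """
    x = np.asarray(v_upper, dtype=float) + 0.5
    y = np.asarray(wavenumbers, dtype=float)
    # Design matrix: intercept, linear and quadratic vibrational terms
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    c0, b1, b2 = coef
    return c0, b1, -b2  # the anharmonic term enters the model with a minus sign


# Synthetic, noise-free band origins in cm^-1 (illustrative values, not measured data)
v = np.arange(20, 40)
nu = 15600.0 + 125.0 * (v + 0.5) - 0.75 * (v + 0.5) ** 2
c0, we, wexe = fit_vibronic_constants(v, nu)
print(f"we' = {we:.3f} cm^-1, wexe' = {wexe:.4f} cm^-1")
```

With real spectra the same design matrix applies; only the residuals (and hence the standard errors of the constants) change.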
Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett
2009-01-01
Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Cox regression with missing covariate data using a modified partial likelihood method
DEFF Research Database (Denmark)
Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.
2016-01-01
Missing covariate values are a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM algorithm. It exploits the fact that the observed hazard function is multiplicative in the baseline hazard...
Convert a low-cost sensor to a colorimeter using an improved regression method
Wu, Yifeng
2008-01-01
Closed-loop color calibration is a process that maintains consistent color reproduction in color printers. To perform closed-loop color calibration, a pre-designed color target is printed and automatically measured by a color measuring instrument. Here, a low-cost sensor has been embedded in the printer to perform the color measurement, and a series of sensor calibration and color conversion methods have been developed. The purpose is to obtain accurate colorimetric measurements from the data measured by the low-cost sensor. To achieve high colorimetric accuracy, the sensor must be carefully calibrated and all possible errors during the color conversion must be minimized. After several classical color conversion methods were compared, a regression-based color conversion method was selected. Regression is a powerful method for estimating color conversion functions, but the main difficulty in using it is finding an appropriate function to describe the relationship between the input and output data. In this paper, we propose using 1D pre-linearization tables to improve the linearity between the input sensor data and the output colorimetric data. This increases the accuracy of the regression method and thereby the accuracy of the color conversion.
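The idea of pre-linearizing before regression can be sketched as follows, under loudly stated assumptions: the hypothetical sensor has a power-law response per channel (exponents chosen arbitrarily), each 1D table is a piecewise-linear lookup built from those known exponents (in practice it would come from measuring a ramp), and the regression is a 3×3 matrix plus offset fitted by least squares. None of these choices is taken from the paper; the point is only that a linear regression fits far better once each channel has been linearized.

```python
import numpy as np


def linearize_channels(raw, tables):
    """Apply one 1D pre-linearization table per channel (piecewise-linear lookup)."""
    return np.column_stack(
        [np.interp(raw[:, c], xp, fp) for c, (xp, fp) in enumerate(tables)]
    )


def fit_color_matrix(signals, xyz):
    """Least-squares fit of XYZ = [s1, s2, s3, 1] @ M, i.e. a 3x3 matrix plus offset."""
    design = np.column_stack([signals, np.ones(len(signals))])
    matrix, *_ = np.linalg.lstsq(design, xyz, rcond=None)
    return matrix  # shape (4, 3)


def apply_color_matrix(matrix, signals):
    return np.column_stack([signals, np.ones(len(signals))]) @ matrix


rng = np.random.default_rng(0)
gammas = np.array([2.2, 2.0, 2.4])            # hypothetical per-channel nonlinearity
linear_true = rng.uniform(0.0, 1.0, size=(200, 3))
raw = linear_true ** (1.0 / gammas)            # what the hypothetical sensor reports
true_matrix = np.array([[0.41, 0.21, 0.02],
                        [0.36, 0.72, 0.12],
                        [0.18, 0.07, 0.95]])
xyz = linear_true @ true_matrix                # synthetic colorimetric targets

# Tables built from the known exponents; a real system would measure a ramp instead
grid = np.linspace(0.0, 1.0, 256)
tables = [(grid, grid ** g) for g in gammas]
lin = linearize_channels(raw, tables)

M_raw = fit_color_matrix(raw, xyz)             # regression without pre-linearization
M_lin = fit_color_matrix(lin, xyz)             # regression after 1D tables
err_raw = np.abs(apply_color_matrix(M_raw, raw) - xyz).max()
err_lin = np.abs(apply_color_matrix(M_lin, lin) - xyz).max()
print(f"max |error| without tables: {err_raw:.4f}, with tables: {err_lin:.6f}")
```

Keeping the regression itself linear and pushing the nonlinearity into cheap per-channel tables is what makes the conversion function easy to choose.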
Sidik, S. M.
1975-01-01
Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
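The point about estimable functions near singularity can be seen numerically. The sketch below (synthetic data, an arbitrary choice of shrinkage values k) computes the ridge estimator (X'X + kI)⁻¹X'y for two nearly collinear predictors: the individual coefficients are poorly determined, while their sum, which lies along the well-conditioned direction of X'X, stays stable for both OLS (k = 0) and ridge.

```python
import numpy as np


def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k = 0 reduces to ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly collinear: X'X is close to singular
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)   # synthetic data, true coefficients (1, 1)

estimates = {k: ridge(X, y, k) for k in (0.0, 0.01, 1.0)}
for k, beta in estimates.items():
    # Individual coefficients can swing widely near singularity, but their sum
    # (an estimable function along the well-determined direction) stays near 2.
    print(f"k={k}: beta={beta.round(3)}, sum={beta.sum():.3f}, "
          f"norm={np.linalg.norm(beta):.3f}")
```

The norm of the ridge solution shrinks as k grows, which is the "constraint on the parameter space" that biased estimators introduce.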
Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J
2014-08-27
State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.
Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.
2006-01-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...
A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.
Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2018-06-12
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
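A closed-form Kronecker kernel ridge regression can be sketched with eigendecompositions, assuming the usual formulation in which the dual coefficients A (n × m) solve (G ⊗ K + λI) vec(A) = vec(Y), equivalently K A G + λA = Y, for a row-object kernel K and a column-object kernel G. Writing K = UΛUᵀ and G = VΣVᵀ turns this into an elementwise filter Ã_ij = Ỹ_ij / (λ_i σ_j + λ) on Ỹ = UᵀYV, so the nm × nm system never has to be formed. The RBF kernels and random data below are illustrative assumptions only.

```python
import numpy as np


def kronecker_krr(K, G, Y, lam):
    """Dual coefficients A solving K A G + lam * A = Y.

    Equivalent to (G kron K + lam I) vec(A) = vec(Y) with column-major vec.
    K (n x n) is the kernel over row objects, G (m x m) over column objects;
    both must be symmetric positive semidefinite.
    """
    lk, U = np.linalg.eigh(K)
    lg, V = np.linalg.eigh(G)
    Y_tilde = U.T @ Y @ V
    A_tilde = Y_tilde / (np.outer(lk, lg) + lam)  # spectral filter on eigenvalue products
    return U @ A_tilde @ V.T


def rbf(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))


rng = np.random.default_rng(0)
K = rbf(rng.normal(size=(6, 2)))   # 6 hypothetical "row" objects (e.g. drugs)
G = rbf(rng.normal(size=(4, 3)))   # 4 hypothetical "column" objects (e.g. targets)
Y = rng.normal(size=(6, 4))        # synthetic pair labels
lam = 0.1

A = kronecker_krr(K, G, Y, lam)
F = K @ A @ G                      # fitted values for all training pairs
```

The eigendecomposition costs O(n³ + m³) instead of O(n³m³) for the explicit Kronecker system, which is what makes the closed-form variants practical.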
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. To estimate the corresponding regression coefficients, many meteorological post-processing studies over the last decade have minimized the continuous ranked probability score (CRPS). In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield better-calibrated forecasts. Theoretically, both scoring rules, when used as optimization scores, should locate a similar and unknown optimum; discrepancies might result from a wrong distributional assumption about the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients, with the log-likelihood estimator being slightly more efficient. A real-world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
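To make the two estimation criteria concrete, here is a minimal sketch of both scores for a Gaussian predictive distribution N(μ, σ²), assuming the Gaussian case discussed above. The CRPS uses its well-known closed form σ[z(2Φ(z) − 1) + 2φ(z) − 1/√π] with z = (y − μ)/σ. In a non-homogeneous regression, μ and log σ would be linear functions of ensemble statistics, and the coefficients would be chosen to minimize the mean of one score or the other; the observations below are made up.

```python
import math


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y (lower is better)."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def gaussian_nll(mu, sigma, y):
    """Negative log-likelihood of y under N(mu, sigma^2) (lower is better)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * z * z


# Hypothetical observations scored against one fixed forecast N(0.9, 1.0)
obs = [1.2, -0.3, 0.8, 2.1]
mean_crps = sum(gaussian_crps(0.9, 1.0, y) for y in obs) / len(obs)
mean_nll = sum(gaussian_nll(0.9, 1.0, y) for y in obs) / len(obs)
print(f"mean CRPS = {mean_crps:.4f}, mean NLL = {mean_nll:.4f}")
```

Both scores are proper, so with a correct distributional assumption they share the same optimum; the NLL penalizes outlying observations quadratically in z, while the CRPS grows only linearly, which is one reason the two can diverge under misspecification.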
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary
Gillis, Nicolas; Luce, Robert
2018-01-01
A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self-dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with a self-dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
The aim is to compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach, the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In the second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategies, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparing the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, on regression constraints, and on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate significantly better performance for models including regression constraints than for models based only on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints. Our experiments show a comparable performance for methods
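The first evaluation measure, the concordance index, can be sketched as Harrell's C for right-censored data. Assumptions in this minimal version: a pair (i, j) counts as comparable only when subject i has an observed event and t_i < t_j, a *higher* score means *higher* risk (pass the negative of a prognostic index that grows with expected survival), ties in the score count as half-concordant, and tied times are simply skipped. The five subjects are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C: fraction of comparable pairs where the higher-risk subject fails first.

    A pair (i, j) is comparable when t_i < t_j and subject i had an observed event;
    ties in the risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 0, 1, 1, 0]          # 0 = censored
risk = [0.9, 0.3, 0.7, 0.1, 0.4]  # hypothetical risk scores, higher = worse prognosis
print(concordance_index(times, events, risk))
```

Because the index depends only on the ordering of the scores, a model trained purely with ranking constraints can optimize it directly, which is the motivation for the ranking-based SVM formulation.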
Directory of Open Access Journals (Sweden)
Giuliano de Oliveira Freitas
2013-10-01
PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one received the AcrySof® Toric intraocular lens (IOL) in both eyes, and the other received the AcrySof Natural IOL combined with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both the Alpins and Thibos methods. The ratio between the Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to the Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation was found between the ratio of post- and preoperative Thibos APV (APVratio) and the Alpins percentage of success (%Success) (Spearman's ρ = −0.93); the linear regression is given by %Success = (−APVratio + 1.00) × 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
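The reported regression can be applied as follows. This sketch assumes the standard Thibos power-vector components for a cylinder C at axis α, J0 = −(C/2)cos 2α and J45 = −(C/2)sin 2α, takes the APV as the magnitude of (J0, J45), and then plugs APVratio into the fitted equation %Success = (−APVratio + 1.00) × 100. The refraction values are hypothetical, and the equation is an empirical fit from this 62-eye series, not a general law.

```python
import math


def astigmatic_power_vector(cyl, axis_deg):
    """Thibos J0/J45 components and their magnitude for a cylinder (D) at an axis (deg)."""
    a = math.radians(2 * axis_deg)
    j0 = -cyl / 2 * math.cos(a)
    j45 = -cyl / 2 * math.sin(a)
    return j0, j45, math.hypot(j0, j45)


def predicted_success(pre_cyl, pre_axis, post_cyl, post_axis):
    """Apply the reported fit %Success = (-APVratio + 1.00) x 100."""
    apv_pre = astigmatic_power_vector(pre_cyl, pre_axis)[2]
    apv_post = astigmatic_power_vector(post_cyl, post_axis)[2]
    apv_ratio = apv_post / apv_pre
    return (1.00 - apv_ratio) * 100.0


# Hypothetical eye: -2.00 D x 90 before surgery, -0.50 D x 85 after
print(predicted_success(-2.00, 90, -0.50, 85))
```

An APVratio of 1 (no change in astigmatic magnitude) maps to 0 %Success and a ratio of 0 (astigmatism fully removed) maps to 100 %Success.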
Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings
International Nuclear Information System (INIS)
Chung, William
2012-01-01
Highlights: ► The fuzzy linear regression method is used to develop benchmarking systems. ► The systems can be used to benchmark the energy efficiency of commercial buildings. ► The resulting benchmarking model can be used by public users. ► The resulting benchmarking model can capture the fuzzy nature of input–output data. Abstract: Benchmarking systems built from a sample of reference buildings are needed to benchmark the energy efficiency of commercial buildings. However, not all benchmarking systems can be adopted by public users (i.e., owners of non-reference buildings) because of differences in the methods used to develop them. A previous work developed an approach for benchmarking the energy efficiency of commercial buildings that uses statistical regression analysis to normalize other factors, such as management performance. However, field data given by experts can be regarded as possibility distributions, so the previous work may not be adequate for handling such fuzzy input–output data. Consequently, a number of fuzzy structures cannot be fully captured by statistical regression analysis. This paper proposes using fuzzy linear regression analysis to develop a benchmarking process whose resulting model can be used by public users. An illustrative example is also given.
Directory of Open Access Journals (Sweden)
Fereshteh Shiri
2010-08-01
In the present work, support vector machines (SVMs) and multiple linear regression (MLR) were used for quantitative structure–property relationship (QSPR) studies of the retention time (tR) of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) in standardized liquid chromatography–UV–mass spectrometry, based on molecular descriptors calculated from the optimized 3D structures. After missing-value, zero and multicollinearity tests (with a cutoff value of 0.95) were applied, the most relevant descriptors were selected with a genetic algorithm for variable selection and used to build QSPR models with the MLR and SVM methods. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The predictions of the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measures, r² and q², are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of the different descriptors on the retention times are described.
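The leverage axis of a Williams plot can be sketched as below. Assumptions: the model is an MLR with an intercept column already included in the descriptor matrix, leverage is the hat-matrix diagonal h = x(XᵀX)⁻¹xᵀ, and the warning threshold is the commonly used h* = 3(p + 1)/n with p descriptors and n training compounds (the abstract does not state which cutoff was used). The descriptor values are random placeholders, not the mycotoxin data; the other axis of the plot, standardized residuals, is usually flagged beyond ±3.

```python
import numpy as np


def leverages(X_train, X_query=None):
    """Hat-matrix leverages h = x (X'X)^{-1} x' for rows of X_query (default: training rows).

    X matrices should already include an intercept column if the model has one.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = X_train if X_query is None else np.asarray(X_query, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, xtx_inv, X_query)


def warning_leverage(n_train, n_descriptors):
    """Common Williams-plot cutoff h* = 3(p + 1)/n, p = number of descriptors."""
    return 3 * (n_descriptors + 1) / n_train


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(67), rng.normal(size=(67, 4))])  # 67 compounds, 4 placeholder descriptors
h_train = leverages(X)
h_star = warning_leverage(n_train=67, n_descriptors=4)
query = np.array([[1.0, 6.0, -5.0, 4.0, 7.0]])  # far outside the training descriptor space
h_query = leverages(X, query)
print(f"h* = {h_star:.3f}, query leverage = {h_query[0]:.3f}, outside AD: {h_query[0] > h_star}")
```

Since the training leverages sum to the number of model parameters, their average is (p + 1)/n, so h* flags compounds with roughly three times the average influence.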
Directory of Open Access Journals (Sweden)
Massoud Tabesh
2011-07-01
Optimum operation of water distribution networks is one of the priorities of sustainable water resources development, given the need to increase efficiency and decrease water losses. Key subjects in the optimum operational management of water distribution systems include preparing rehabilitation and replacement schemes, predicting pipe break rates and evaluating pipe reliability. Several approaches to predicting pipe failure rates have been presented in recent years, each of which requires specific data sets. Deterministic models based on age, deterministic multivariable models and stochastic group models are examples of solutions that relate pipe break rates to parameters such as age, material and diameter. In this paper, besides these parameters, further factors such as pipe depth and hydraulic pressure are considered. Pipe burst rates are then predicted using a multivariable regression method, intelligent approaches (artificial neural network and neuro-fuzzy models) and the evolutionary polynomial regression (EPR) method. To evaluate the results of the different approaches, a case study was carried out on part of the Mashhad water distribution network. The results show the capability and advantages of the ANN and EPR methods for predicting pipe break rates, in comparison with the neuro-fuzzy and multivariable regression methods.
Kaneko, Hiromasa
2018-02-26
To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.
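A minimal sketch of AD-gated ensemble prediction, under stated assumptions: each submodel is an ordinary least-squares fit on a random subset of descriptors, its AD is the min–max box of its own training descriptors (a deliberately simple choice; k-NN distance or leverage-based ADs would also fit the description), and a query is predicted by averaging only the submodels whose AD contains it. The class name `ADEnsemble` and all data are hypothetical, not the author's implementation.

```python
import numpy as np


class ADEnsemble:
    """Hypothetical sketch: OLS submodels on random descriptor subsets, each of
    which only contributes for queries inside its own applicability domain.

    The AD of a submodel is the min-max box of its training descriptors (assumption).
    """

    def __init__(self, n_models=20, n_features=2, seed=0):
        self.n_models = n_models
        self.n_features = n_features
        self.rng = np.random.default_rng(seed)
        self.models = []

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        for _ in range(self.n_models):
            cols = self.rng.choice(X.shape[1], size=self.n_features, replace=False)
            Xs = X[:, cols]
            design = np.column_stack([np.ones(len(Xs)), Xs])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            self.models.append((cols, coef, Xs.min(axis=0), Xs.max(axis=0)))
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        preds = np.full(len(X), np.nan)
        for i, x in enumerate(X):
            votes = []
            for cols, coef, lo, hi in self.models:
                xs = x[cols]
                if np.all((xs >= lo) & (xs <= hi)):  # query inside this submodel's AD
                    votes.append(coef[0] + xs @ coef[1:])
            if votes:
                preds[i] = np.mean(votes)
        return preds  # NaN means no submodel's AD covers the query


rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(80, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 1.5]) + 0.05 * rng.normal(size=80)
model = ADEnsemble(n_models=20, n_features=2, seed=0).fit(X, y)

inside = X[:1]
partly_outside = X[:1].copy()
partly_outside[0, 0] = 5.0       # outside the range of descriptor 0 only
far = np.full((1, 5), 5.0)       # outside every submodel's AD
p = model.predict(np.vstack([inside, partly_outside, far]))
print(p)
```

The second query shows the mechanism described above: it lies outside the AD of any submodel that uses descriptor 0, yet submodels built on other descriptors still cover it, so the union of ADs is larger than any single model's AD.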
Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method
Energy Technology Data Exchange (ETDEWEB)
Lee, Myoung Keon [Agency for Defense Development, Daejeon (Korea, Republic of); Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)
2016-10-15
This paper provides compressive failure strength values for composite laminates, developed using a regression analysis method. The composite material considered is a carbon/epoxy unidirectional (UD) tape prepreg (Cycom G40-800/5276-1) cured at 350°F (177°C). The operating temperature range is −60°F to +200°F (−55°C to +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates laid up from standard-angle plies (0°, +45°, −45° and 90°). The tests followed the ASTM D6484 standard. The regression analysis was performed with the laminate ultimate fracture strength as the response variable and two ply orientations (0° and ±45°) as the regressor variables.
Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method
International Nuclear Information System (INIS)
Lee, Myoung Keon; Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon
2016-01-01
This paper provides compressive failure strength values for composite laminates, developed using a regression analysis method. The composite material considered is a carbon/epoxy unidirectional (UD) tape prepreg (Cycom G40-800/5276-1) cured at 350°F (177°C). The operating temperature range is −60°F to +200°F (−55°C to +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates laid up from standard-angle plies (0°, +45°, −45° and 90°). The tests followed the ASTM D6484 standard. The regression analysis was performed with the laminate ultimate fracture strength as the response variable and two ply orientations (0° and ±45°) as the regressor variables.
Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5
Ausati, Shadi; Amanollahi, Jamil
2016-10-01
Since Sanandaj is considered one of the polluted cities of Iran, predicting any type of pollution, especially suspended PM2.5 particles, which cause many diseases, could contribute to public health through timely announcements made before PM2.5 levels rise. To predict PM2.5 concentration in the Sanandaj air, hybrid models consisting of an ensemble empirical mode decomposition and general regression neural network (EEMD-GRNN), an adaptive neuro-fuzzy inference system (ANFIS), principal component regression (PCR), and a linear model, multiple linear regression (MLR), were used. In these models, PM2.5 concentration was the dependent variable, and the independent variables for the year 2014 in Sanandaj were air quality data, including PM2.5, PM10, SO2, NO2, CO and O3, and meteorological data, including average minimum temperature (Min T), average maximum temperature (Max T), average atmospheric pressure (AP), daily total precipitation (TP), daily relative humidity (RH) and daily wind speed (WS). Among the models used, EEMD-GRNN performed best, with R² = 0.90, root mean square error (RMSE) = 4.9218 and mean absolute error (MAE) = 3.4644 in the training phase and R² = 0.79, RMSE = 5.0324 and MAE = 3.2565 in the testing phase. It can be concluded that the hybrid models predict PM2.5 concentration more accurately than the linear model.
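The three comparison metrics can be computed as below. One assumption to flag: R² is taken here as the coefficient of determination 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation between observed and predicted values, and the abstract does not say which is used. The four PM2.5 values are invented for illustration.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination 1 - SS_res / SS_tot (one common definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


obs = [30.0, 42.0, 25.0, 51.0]   # hypothetical PM2.5 observations (ug/m3)
pred = [33.0, 40.0, 24.0, 47.0]  # hypothetical model predictions
print(f"RMSE={rmse(obs, pred):.4f}  MAE={mae(obs, pred):.4f}  R2={r2(obs, pred):.4f}")
```

RMSE weights large errors more heavily than MAE, so a model can rank differently on the two; reporting both, as the study does, exposes that.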
James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll
2003-01-01
This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...
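The replicate-based variant of regression calibration can be sketched for the simplest case, a linear outcome model with one error-prone covariate. Assumptions: classical additive measurement error W = X + U with equal error variance in both replicates, error variance estimated from replicate differences, and the calibrated covariate E[X | W̄] formed with the estimated reliability ratio. GLM outcomes follow the same pattern with a different final fit; this is a sketch, not the authors' software.

```python
import numpy as np


def regression_calibration_slope(w1, w2, y):
    """Naive and calibrated slopes of y on a covariate measured twice with error.

    Assumes W = X + U with classical, additive, non-differential error of equal
    variance in both replicates, and a linear outcome model.
    """
    w1, w2, y = (np.asarray(a, float) for a in (w1, w2, y))
    w_bar = (w1 + w2) / 2
    sigma_u2 = np.mean((w1 - w2) ** 2) / 2            # per-replicate error variance
    var_wbar = np.var(w_bar, ddof=1)
    reliability = (var_wbar - sigma_u2 / 2) / var_wbar  # var(X) / var(W_bar)
    x_hat = w_bar.mean() + reliability * (w_bar - w_bar.mean())  # E[X | W_bar]
    naive = np.polyfit(w_bar, y, 1)[0]
    calibrated = np.polyfit(x_hat, y, 1)[0]
    return naive, calibrated


rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                     # true covariate (never observed)
w1 = x + rng.normal(size=n)                # two error-prone replicate measurements
w2 = x + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
naive, calibrated = regression_calibration_slope(w1, w2, y)
print(f"naive slope {naive:.2f}, calibrated slope {calibrated:.2f}")
```

With these synthetic settings the reliability of W̄ is 1/(1 + 0.5) = 2/3, so the naive slope should be attenuated to roughly 4/3, while the calibrated slope should land near the true value of 2.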
Assessing the performance of variational methods for mixed logistic regression models
Czech Academy of Sciences Publication Activity Database
Rijmen, F.; Vomlel, Jiří
2008-01-01
Vol. 78, No. 8 (2008), pp. 765-779. ISSN 0094-9655. R&D Projects: GA MŠk 1M0572. Grant - others: GA MŠk(CZ) 2C06019. Institutional research plan: CEZ:AV0Z10750506. Keywords: Mixed models * Logistic regression * Variational methods * Lower bound approximation. Subject RIV: BB - Applied Statistics, Operational Research. Impact factor: 0.353, year: 2008
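The keywords mention a lower-bound approximation. One standard variational bound for logistic models, assumed here for illustration (the record does not confirm it is the one evaluated), is the Jaakkola–Jordan bound σ(x) ≥ σ(ξ) exp((x − ξ)/2 − λ(ξ)(x² − ξ²)) with λ(ξ) = tanh(ξ/2)/(4ξ). Because the exponent is quadratic in x, integrating it against a Gaussian random effect is tractable in closed form, which is what makes such bounds useful for mixed logistic models. The sketch only evaluates the bound.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def jj_lower_bound(x, xi):
    """Jaakkola-Jordan variational lower bound on sigmoid(x), tight at x = +/- xi.

    sigma(x) >= sigma(xi) * exp((x - xi)/2 - lam(xi) * (x**2 - xi**2)),
    with lam(xi) = tanh(xi/2) / (4 xi) and its limit 1/8 at xi = 0.
    """
    lam = 0.125 if xi == 0 else math.tanh(xi / 2) / (4 * xi)
    return sigmoid(xi) * math.exp((x - xi) / 2 - lam * (x * x - xi * xi))


for xi in (0.5, 2.0):
    gap = max(sigmoid(x / 4) - jj_lower_bound(x / 4, xi) for x in range(-40, 41))
    print(f"xi={xi}: largest gap on [-10, 10] = {gap:.4f}")
```

In a variational fit, ξ is optimized (per observation) to make the bound as tight as possible, yielding a lower bound on the marginal likelihood rather than the likelihood itself.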
Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting
Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD
2018-01-01
Heavy rainfall can cause disasters, so forecasts of rainfall intensity are needed. The main cause of flooding is high rainfall intensity, which pushes rivers beyond their capacity and floods the surrounding area. Because rainfall is a dynamic factor, it is an interesting subject of study. Methods ranging from artificial intelligence (AI) to statistics can be used to support rainfall forecasting. In this research, we used Adaline as the AI method and regression as the statistical method. The method that produces the more accurate forecasts is considered better for rainfall forecasting. By comparing these methods, we expect to identify which one is best for rainfall forecasting here.
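Adaline is a single linear unit trained with the Widrow–Hoff (LMS, or delta) rule, w ← w + η(y − w·x)x, so it minimizes the same squared error as multiple linear regression, only iteratively. The sketch below makes that connection explicit on synthetic, noise-free data; the two inputs stand in for hypothetical rescaled predictors such as lagged rainfall, and the learning rate and epoch count are arbitrary choices.

```python
import numpy as np


def train_adaline(X, y, lr=0.01, epochs=200, seed=0):
    """ADALINE: linear unit trained online with the Widrow-Hoff (LMS / delta) rule."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])  # bias input
    y = np.asarray(y, float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            error = y[i] - X[i] @ w   # linear output, no threshold during training
            w += lr * error * X[i]
    return w


rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))      # hypothetical rescaled predictors
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]   # synthetic noise-free target
w_adaline = train_adaline(X, y, lr=0.05, epochs=200)
w_mlr, *_ = np.linalg.lstsq(np.column_stack([np.ones(50), X]), y, rcond=None)
print(w_adaline.round(4), w_mlr.round(4))
```

On a genuinely linear problem both approaches converge to the same weights; differences in forecasting accuracy between them therefore come from training details (learning rate, stopping, noise) rather than from model form.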
Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.
2008-04-01
Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Correcting for cryptic relatedness by a regression-based genomic control method
Directory of Open Access Journals (Sweden)
Yang Yaning
2009-12-01
Background: The genomic control (GC) method is a useful tool for correcting for cryptic relatedness in population-based association studies. It was originally proposed to correct for the variance inflation of the Cochran-Armitage additive trend test using information from unlinked null markers, and was later generalized to other tests with the additional requirement that the null markers be matched with the candidate marker in allele frequency. However, matching allele frequencies limits the number of available null markers and thus limits the applicability of the GC method. On the other hand, errors in genotype/allele frequencies may cause further bias and variance inflation and thereby aggravate the effect of GC correction. Results: In this paper, we propose a regression-based GC method using null markers that are not necessarily matched in allele frequency with the candidate marker. Variation in the allele frequencies of the null markers is adjusted by a regression method. Conclusion: The proposed method can be readily applied to Cochran-Armitage trend tests other than the additive trend test, to Pearson's chi-square test and to other robust efficiency tests. Simulation results show that the proposed method is effective in controlling type I error in the presence of population substructure.
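For reference, the classical genomic control correction that the proposed method generalizes can be sketched as follows: the inflation factor λ is the median of the null-marker 1-df chi-square statistics divided by the median of a χ²₁ distribution (about 0.4549), commonly floored at 1, and the candidate statistic is divided by λ. This is the original allele-frequency-matched GC, not the paper's regression-based variant, whose regression model is not specified in the abstract.

```python
from statistics import NormalDist, median

# Median of a chi-square(1) variable: (Phi^{-1}(0.75))**2, about 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(null_stats, candidate_stat):
    """Classical GC: lambda = median(null chi-square) / 0.4549, floored at 1 (common convention)."""
    lam = max(1.0, median(null_stats) / CHI2_1DF_MEDIAN)
    return lam, candidate_stat / lam


# Hypothetical null-marker statistics inflated by cryptic relatedness
null_stats = [0.31, 0.62, 0.95, 1.40, 0.12, 0.70, 2.10]
lam, corrected = genomic_control(null_stats, candidate_stat=12.0)
print(f"lambda = {lam:.3f}, corrected statistic = {corrected:.3f}")
```

Matching null markers on allele frequency matters because λ estimated from markers with very different frequencies may not reflect the inflation at the candidate marker; the regression-based method replaces matching with an adjustment for that frequency dependence.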
A subagging regression method for estimating the qualitative and quantitative state of groundwater
Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young
2017-08-01
A subsample aggregating (subagging) regression (SBR) method is proposed for analyzing groundwater data, covering trend estimation and its associated uncertainty. The SBR method is validated on synthetic data in comparison with other conventional robust and non-robust methods. The results verify that the estimation accuracy of the SBR method is consistent and superior to that of the other methods, and that its uncertainties are reasonably estimated; the other methods have no uncertainty analysis option. For further validation, actual groundwater data are analyzed and compared with Gaussian process regression (GPR). In all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR, for both Gaussian and non-Gaussian skewed data. However, GPR is expected to be limited when applied to data severely corrupted by outliers, owing to its non-robustness. These implementations indicate that the SBR method has the potential to be developed further into an effective tool for anomaly detection or outlier identification in groundwater state data such as groundwater level and contaminant concentration.
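The core subagging idea can be sketched as repeated fitting on subsamples drawn *without* replacement (in contrast to bagging's bootstrap), followed by aggregation. Assumptions not taken from the abstract: the base learner is an ordinary least-squares linear trend, subsamples contain half the data, the aggregate is the median slope (for some robustness to outliers), and the uncertainty band is the 2.5–97.5 percentile range across subsamples. The groundwater series is synthetic.

```python
import numpy as np


def subagging_trend(t, y, n_subsamples=500, frac=0.5, seed=0):
    """Subagging estimate of a linear trend with a percentile uncertainty band.

    Each round fits OLS to a random subsample drawn without replacement;
    the aggregated slope is the median across rounds.
    """
    rng = np.random.default_rng(seed)
    t, y = np.asarray(t, float), np.asarray(y, float)
    m = max(2, int(frac * len(t)))
    slopes = np.empty(n_subsamples)
    for b in range(n_subsamples):
        idx = rng.choice(len(t), size=m, replace=False)
        slopes[b] = np.polyfit(t[idx], y[idx], 1)[0]
    return np.median(slopes), np.percentile(slopes, [2.5, 97.5])


rng = np.random.default_rng(3)
t = np.arange(120, dtype=float)                  # e.g. monthly observations (synthetic)
level = 50.0 - 0.02 * t + rng.normal(scale=0.1, size=t.size)
level[[10, 45, 90]] += 5.0                       # a few gross outliers
slope, (lo, hi) = subagging_trend(t, level)
print(f"trend {slope:.4f} per step, 95% band [{lo:.4f}, {hi:.4f}]")
```

The spread of subsample estimates gives an uncertainty band essentially for free, which is the property the abstract contrasts with methods that offer no uncertainty analysis option.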
Impact of regression methods on improved effects of soil structure on soil water retention estimates
Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim
2015-06-01
Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large-scale agro-hydrological modeling. Adding significant predictors (e.g., soil structure) and implementing more flexible regression algorithms are among the main strategies for improving PTFs. The aim of this study was to investigate whether the improvement from categorical soil structure information in estimating soil water content at various matric potentials, which has been reported in the literature, could be consistently captured by regression techniques other than the usually applied linear regression. Two data mining techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (kNN), which have recently been introduced as promising tools for PTF development, were used to test whether incorporating soil structure improves PTF accuracy when training data are rather limited. The results show that using descriptive soil structure information (massive, structured and structureless) as a grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the matric potential range of −6 to −33 kPa (average RMSE decreased by up to 0.005 m³ m⁻³ after grouping, depending on the matric potential). The improvement was primarily attributed to the better performance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study, in which the data set became limited in size after grouping. Since the choice of regression technique affects the improvement gained from incorporating qualitative soil structure information, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
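A small illustration of how explicit continuity constraints can be absorbed into the design matrix: with a truncated power basis, a segment that is linear before a knot κ and quadratic after it uses the columns 1, t, (t − κ)₊ and (t − κ)₊². The truncated terms vanish at the knot, so every coefficient vector gives a curve that is continuous there, and ordinary (unconstrained) fitting software can be used. This is one implicitly constrained parameterization, assumed for illustration; it covers only the fixed effects, and the knot, coefficients and data are synthetic.

```python
import numpy as np


def mixed_order_basis(t, knot):
    """Fixed-effects design for a spline that is linear before `knot`, quadratic after.

    Columns: 1, t, (t-knot)_+, (t-knot)_+^2. The truncated-power terms vanish at the
    knot, so the fitted curve is continuous there; the (t-knot)_+ column lets the slope
    change, and dropping it would additionally force a continuous first derivative.
    """
    t = np.asarray(t, float)
    tp = np.clip(t - knot, 0.0, None)
    return np.column_stack([np.ones_like(t), t, tp, tp ** 2])


t = np.linspace(0.0, 10.0, 41)
knot = 4.0
beta_true = np.array([1.0, 0.5, -0.8, 0.1])
y = mixed_order_basis(t, knot) @ beta_true             # synthetic noise-free profile
beta_hat, *_ = np.linalg.lstsq(mixed_order_basis(t, knot), y, rcond=None)
left, right = mixed_order_basis([knot - 1e-9, knot + 1e-9], knot) @ beta_hat
print(beta_hat.round(6), abs(left - right))
```

In a linear mixed model the same columns would enter the fixed-effects design, and a subset of them could also enter the random-effects design to let individual profiles bend at the knot.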
Prastuti, M.; Suhartono; Salehah, NA
2018-04-01
The demand for energy, especially electricity, in Indonesia has been increasing in recent years. Furthermore, high electricity usage that varies across different times leads to heteroscedasticity. Estimating an electricity supply that can fulfill the community's needs is very important, but heteroscedasticity often makes electricity forecasting difficult. Accurate forecasting of electricity consumption is one of the key challenges for energy providers seeking to improve resource and service planning and to take control actions that balance electricity supply and demand. In this paper, a hybrid ARIMAX Quantile Regression (ARIMAX-QR) approach is proposed to predict short-term electricity consumption in East Java. The method is compared with time series regression using the RMSE, MAPE, and MdAPE criteria. The data used in this research were half-hourly electricity consumption records from September 2015 to April 2016. The results show that the proposed approach can be a competitive alternative for forecasting short-term electricity consumption in East Java. ARIMAX-QR using lag values and dummy variables as predictors yields more accurate predictions for both in-sample and out-of-sample data. Moreover, both time series regression and ARIMAX-QR with lag values added as predictors capture the patterns in the data accurately, and hence produce better predictions than models that do not use additional lag variables.
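The three comparison criteria named above, plus the check (pinball) loss that quantile regression minimizes, can be sketched as follows. The definitions are the standard ones and are assumed here rather than taken from the paper; the percentage errors assume no actual value is zero, and the numbers are hypothetical.

```python
import math
import statistics


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def abs_pct_errors(actual, forecast):
    """Absolute percentage errors; assumes every actual value is non-zero."""
    return [abs((a - f) / a) * 100.0 for a, f in zip(actual, forecast)]


def mape(actual, forecast):
    """Mean absolute percentage error (%)."""
    return statistics.mean(abs_pct_errors(actual, forecast))


def mdape(actual, forecast):
    """Median absolute percentage error (%), less sensitive to a few large misses."""
    return statistics.median(abs_pct_errors(actual, forecast))


def pinball_loss(actual, forecast, tau):
    """Average check loss; minimizing it over a model targets the tau-th conditional quantile."""
    total = 0.0
    for a, f in zip(actual, forecast):
        r = a - f
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(actual)


# Hypothetical half-hourly loads and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(rmse(actual, forecast), mape(actual, forecast), mdape(actual, forecast))
```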
Yi, Jun; Yang, Wenhong; Sun, Wen-Hua; Nomura, Kotohiro; Hada, Masahiko
2017-11-30
The NMR chemical shifts of vanadium (⁵¹V) in (imido)vanadium(V) dichloride complexes with imidazolin-2-iminato and imidazolidin-2-iminato ligands were calculated by the density functional theory (DFT) method with GIAO. The calculated ⁵¹V NMR chemical shifts were analyzed by multiple linear regression (MLR) analysis (MLRA) with a series of calculated molecular properties. Some of the calculated NMR chemical shifts were incorrect when molecular geometries optimized from the X-ray structures were used. After the global minimum geometries of all the molecules were determined, the trend of the observed chemical shifts was well reproduced by the present DFT method. MLRA was performed to investigate the correlation between the ⁵¹V NMR chemical shift and the natural charge, band energy gap, and Wiberg bond index of the V═N bond. The ⁵¹V NMR chemical shifts obtained with the present MLR model were well reproduced, with a correlation coefficient of 0.97.
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions
Energy Technology Data Exchange (ETDEWEB)
Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium); Raisee, Mehrdad [School of Mechanical Engineering, College of Engineering, University of Tehran, P.O. Box: 11155-4563, Tehran (Iran, Islamic Republic of); Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium)
2017-03-01
Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes quickly becomes unaffordable, because the number of deterministic simulations required grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to this problem is to use sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools from probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR (least angle regression)-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. Repeating the analyses with random experimental designs also demonstrates the superior robustness of the method.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models, and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, those studies focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and algorithms often fail to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another, even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performance of the three methods across statistical packages via simulation, involving two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages with regard to sample size.
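To make the Gauss-Hermite option concrete: in a random-intercept logistic model, each cluster's likelihood is an integral over the random effect, and Gauss-Hermite quadrature replaces that integral with a weighted sum over fixed nodes. This sketch assumes a single normal random intercept and a fixed 3-point rule (software typically uses more points, often adaptively placed), so it does not cover the multiple correlated random effects studied in the paper; the function names are hypothetical.

```python
import math

# 3-point Gauss-Hermite rule for integrals of the form integral of exp(-x^2) f(x) dx.
GH_NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0]


def expect_normal(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) using the substitution b = sqrt(2) * sigma * x."""
    total = sum(w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def cluster_likelihood(ys, eta_fixed, sigma):
    """Marginal likelihood of one cluster in a random-intercept logistic model.

    ys: 0/1 outcomes; eta_fixed: fixed-effect linear predictors; sigma: SD of the random intercept.
    """
    def conditional(b):
        # Product of Bernoulli likelihoods given the random intercept b.
        lik = 1.0
        for y, eta in zip(ys, eta_fixed):
            p = 1.0 / (1.0 + math.exp(-(eta + b)))
            lik *= p if y == 1 else 1.0 - p
        return lik

    return expect_normal(conditional, sigma)


print(cluster_likelihood([1, 0, 1], [0.2, -0.4, 1.0], sigma=0.8))
```

A 3-point rule integrates polynomials up to degree 5 exactly against the normal density, which is why E[b²] comes out exactly σ²; the logistic integrand is not polynomial, so more nodes would be needed for accurate likelihoods.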
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models, and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, those studies focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and algorithms often fail to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another, even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performance of the three methods across statistical packages via simulation, involving two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages with regard to sample size. PMID:24288415
Directory of Open Access Journals (Sweden)
Guan Lian
2018-01-01
Accurate prediction of taxi-out time is a significant precondition for improving the operational efficiency of the departure process at an airport, as well as for reducing long taxi-out times, congestion, and excessive greenhouse gas emissions. Unfortunately, several traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods, namely the Generalized Linear Model, the Softmax Regression Model, and the Artificial Neural Network method, as well as two improved Support Vector Regression (SVR) approaches optimized by swarm intelligence algorithms, namely Particle Swarm Optimization (PSO) and the Firefly Algorithm. To improve the global search ability of the Firefly Algorithm, an adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as a significant factor at congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK) is tested with historical data. The performance measures show that the two proposed SVR approaches, especially the Improved Firefly Algorithm (IFA) optimization-based SVR method, not only achieve the best modelling measures and accuracy rate compared with the representative forecast models, but also achieve better predictive performance when dealing with abnormal taxi-out time states.
da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues
2015-01-01
This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L⁻¹. The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
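Weighted least squares calibration can be sketched as below. The abstract does not state the weighting scheme; the 1/x² weights, the calibration levels and the responses are assumptions chosen only for illustration (1/x and 1/s² are other common choices).

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x, minimizing sum_i w_i * (y_i - a - b*x_i)^2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def back_calculate(signal, intercept, slope):
    """Concentration implied by a measured signal on the calibration line."""
    return (signal - intercept) / slope


conc = [10.0, 20.0, 50.0, 100.0, 200.0]      # hypothetical calibration levels (ng/L)
signal = [52.0, 98.0, 260.0, 495.0, 1010.0]  # hypothetical instrument responses
weights = [1.0 / c ** 2 for c in conc]       # assumed 1/x^2 weighting
a, b = weighted_linear_fit(conc, signal, weights)
print(a, b, back_calculate(300.0, a, b))
```

With 1/x² weights the low-concentration standards carry most of the weight, which is the intended effect when the response variance grows with concentration.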
Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang
2017-05-01
In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method that embeds the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (feature-feature, response-response, and sample-sample relations), which preserve the similarity among the features, among the response variables, and among the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ2,1-norm regularization term, and then propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performance of both clinical score prediction and disease status identification, compared to state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.
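The ℓ2,1-norm is what turns the multi-task fit into feature selection. Assuming the coefficient matrix is arranged with rows as features and columns as tasks (an assumed layout), the norm sums the Euclidean norms of the rows, so penalizing it pushes whole rows to zero across all tasks at once. The sketch only evaluates the norm and reads off the surviving features; it does not implement the authors' relational regularizers or their optimization algorithm.

```python
import math


def row_norms(w):
    """Euclidean norm of each row (feature) across all columns (tasks)."""
    return [math.sqrt(sum(v * v for v in row)) for row in w]


def l21_norm(w):
    """l2,1-norm: sum of the row-wise l2 norms."""
    return sum(row_norms(w))


def selected_features(w, tol=1e-8):
    """Indices of features whose coefficient row is not (numerically) zero."""
    return [i for i, norm in enumerate(row_norms(w)) if norm > tol]


# Hypothetical 3-feature x 2-task coefficient matrix after sparse fitting.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l21_norm(W), selected_features(W))  # 6.0 [0, 2]
```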
Directory of Open Access Journals (Sweden)
Gholam Reza Sheykhzadeh
2017-02-01
Introduction: Penetration resistance is one of the criteria for evaluating soil compaction. It correlates with several properties such as vehicle trafficability, resistance to root penetration, seedling emergence, and soil compaction by farm machinery. Direct measurement of penetration resistance is time-consuming and difficult because of its high temporal and spatial variability. Therefore, many regression and artificial neural network pedotransfer functions have been proposed to estimate penetration resistance from readily available soil variables such as particle size distribution, bulk density (Db) and gravimetric water content (θm). The lands of Ardabil Province are one of the main potato production regions in Iran; thus, determining soil penetration resistance in these regions helps with the management of potato production. The objective of this research was to derive pedotransfer functions using regression and artificial neural networks to predict penetration resistance from some soil variables in the agricultural soils of the Ardabil plain, and to compare the performance of the artificial neural network with that of the regression models. Materials and methods: Disturbed and undisturbed soil samples (n = 105) were systematically taken from the 0-10 cm soil depth at intervals of nearly 3000 m in the agricultural lands of the Ardabil plain (lat 38°15' to 38°40' N, long 48°16' to 48°61' E). The contents of sand, silt and clay (hydrometer method), CaCO3 (titration method), bulk density (cylinder method), particle density (Dp) (pycnometer method), organic carbon (wet oxidation method), total porosity (calculated from Db and Dp), and saturated (θs) and field (θf) soil water contents (gravimetric method) were measured in the laboratory. The mean geometric diameter (dg) and standard deviation (σg) of soil particles were computed from the percentages of sand, silt and clay. Penetration resistance was measured in situ using a cone penetrometer (analog model at 10
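Two of the derived soil variables can be computed directly from the measured ones. This sketch assumes the common definition total porosity = 1 - Db/Dp and the Shirazi-Boersma (1984) formulas for dg and σg, which use the arithmetic mean diameters of the clay, silt and sand classes (0.001, 0.026 and 1.025 mm); the source does not state which formulas were used, and the sample values are hypothetical.

```python
import math

# Arithmetic mean diameters (mm) of the clay (<0.002), silt (0.002-0.05) and sand (0.05-2) classes.
MEAN_DIAMETER_MM = {"clay": 0.001, "silt": 0.026, "sand": 1.025}


def geometric_mean_diameter(percent):
    """Return (dg in mm, sigma_g) from class percentages that sum to 100."""
    a = 0.01 * sum(p * math.log(MEAN_DIAMETER_MM[k]) for k, p in percent.items())
    b_sq = 0.01 * sum(p * math.log(MEAN_DIAMETER_MM[k]) ** 2 for k, p in percent.items()) - a * a
    return math.exp(a), math.exp(math.sqrt(max(b_sq, 0.0)))  # guard against tiny negative rounding


def total_porosity(bulk_density, particle_density):
    """Fraction of the soil volume occupied by pores."""
    return 1.0 - bulk_density / particle_density


dg, sigma_g = geometric_mean_diameter({"sand": 30.0, "silt": 45.0, "clay": 25.0})
print(round(dg, 4), round(sigma_g, 2), total_porosity(1.35, 2.65))
```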
Landslide susceptibility mapping on a global scale using the method of logistic regression
Directory of Open Access Journals (Sweden)
L. Lin
2017-08-01
This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70% of the landslide and non-landslide points were randomly selected for logistic regression, and the rest were used for model validation. To evaluate the accuracy of the predictive models, this paper adopts several criteria, including the receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, the percentage correct in the confusion matrix of landslide classification was approximately 80% and the area under the curve (AUC) was nearly 0.87. During the validation process, these statistics were about 81% and 0.88, respectively. This result indicates that the model is robust and performs stably. The model found that, at a global scale, soil moisture can be dominant in the occurrence of landslides, and the topographic factor may be secondary.
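The random 70/30 split and the AUC reported above can be sketched without any library. The AUC here uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen landslide point receives a higher score than a randomly chosen non-landslide point. The helper names, the fixed seed and the scores are assumptions for illustration.

```python
import random


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pair comparisons; fine for a sketch
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_70_30(items, seed=0):
    """Random 70% / 30% split into calibration and validation sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical predicted susceptibilities and observed labels (1 = landslide).
print(roc_auc([0.9, 0.4, 0.7, 0.2, 0.6], [1, 0, 1, 0, 0]))
```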
Development of K-Nearest Neighbour Regression Method in Forecasting River Stream Flow
Directory of Open Access Journals (Sweden)
Mohammad Azmi
2012-07-01
Different statistical, non-statistical and black-box methods have been used in forecasting. Among statistical methods, the K-nearest neighbour (K-NN) non-parametric regression method is one of the recommended methods for forecasting because of its natural simplicity and mathematical basis. In this study, the K-NN method is explained in detail. In addition, development and improvement approaches such as best-neighbour estimation, data transformation functions, distance functions and a proposed extrapolation method are described. The K-NN method, together with these development approaches, is used for streamflow forecasting in the upper basin of the Zayandeh-Rud Dam. A comparison of the final results of the classic K-NN method and the modified K-NN (5 neighbours, Range Scaling transformation function, Mahalanobis distance function and the proposed extrapolation method) shows that the modified K-NN improved performance by 45%, 59% and 17%, respectively, in the criteria of goodness of fit, root mean square error, percentage of volume of error and correlation. These results confirm the necessity of applying the mentioned approaches to derive more accurate forecasts.
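A bare-bones version of the modified K-NN forecast can be sketched as below: range scaling of the predictors, Mahalanobis distance to the query, and the plain average of the k nearest historical targets. It assumes exactly two predictors (so the covariance inverse is a closed-form 2x2) and omits the paper's best-neighbour estimation and extrapolation steps; the data and names are hypothetical.

```python
def range_scale(rows):
    """Min-max (range) scaling of each column to [0, 1]; returns scaled rows and column bounds."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    scaled = [[(v - l) / (h - l) for v, l, h in zip(r, lo, hi)] for r in rows]
    return scaled, lo, hi


def inverse_covariance_2d(rows):
    """Inverse of the 2x2 sample covariance matrix of two-column data."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    det = s00 * s11 - s01 * s01
    return [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]


def mahalanobis_sq(a, b, inv):
    """Squared Mahalanobis distance between 2-vectors a and b."""
    d0, d1 = a[0] - b[0], a[1] - b[1]
    return d0 * (inv[0][0] * d0 + inv[0][1] * d1) + d1 * (inv[1][0] * d0 + inv[1][1] * d1)


def knn_forecast(features, targets, query, k=5):
    """Average target of the k historical cases closest to the query."""
    scaled, lo, hi = range_scale(features)
    q = [(v - l) / (h - l) for v, l, h in zip(query, lo, hi)]
    inv = inverse_covariance_2d(scaled)
    order = sorted(range(len(scaled)), key=lambda i: mahalanobis_sq(scaled[i], q, inv))
    return sum(targets[i] for i in order[:k]) / k


# Hypothetical predictors (e.g. two lagged flows) and the next-step flow for each case.
features = [[1, 10], [2, 30], [3, 20], [4, 50], [5, 40], [6, 60]]
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(knn_forecast(features, targets, [3.5, 30], k=3))
```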
Comparing the index-flood and multiple-regression methods using L-moments
Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.
In arid and semi-arid regions, record lengths are usually too short to ensure reliable quantile estimates. The main objective of this study was to compare index-flood and multiple-regression analyses based on L-moments. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward's cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity was tested using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods, as two regional flood frequency methods, were compared. The results of factor analysis showed that the length of the main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward's clustering method. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the GEV distribution was identified as the most robust of the five candidate distributions for all the proposed sub-regions of the study area; in general, it was concluded that the generalised extreme value distribution was the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied to evaluate the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gives more reliable estimates of flood magnitudes for different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin
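The sample L-moments behind the homogeneity tests and distribution selection can be computed from probability-weighted moments. This sketch uses the standard unbiased estimators b0, b1 and b2 (Hosking's formulation, assumed here) and returns l1, l2, the L-CV and the L-skewness; it does not implement the discordancy, heterogeneity or Z-statistic measures, and the flows are hypothetical.

```python
def sample_l_moments(data):
    """Return l1, l2, L-CV (t) and L-skewness (t3) from unbiased probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # j is the 0-based rank, so j/(n-1) matches (j-1)/(n-1) in the usual 1-based notation.
    b0 = sum(x) / n
    b1 = sum(j / (n - 1) * x[j] for j in range(n)) / n
    b2 = sum(j * (j - 1) / ((n - 1) * (n - 2)) * x[j] for j in range(n)) / n
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    return {"l1": l1, "l2": l2, "t": l2 / l1, "t3": l3 / l2}


# Hypothetical annual peak flows (m^3/s) at one site.
peaks = [120.0, 85.0, 240.0, 60.0, 150.0, 95.0, 310.0, 70.0]
print(sample_l_moments(peaks))
```

For symmetric data the L-skewness is zero, and a right-skewed flood record like the hypothetical one above gives a positive t3, which is what the L-moment ratio diagram uses to discriminate between candidate distributions such as the GEV.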
International Nuclear Information System (INIS)
Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao
2013-01-01
Highlights:
► The seasonal and trend items of the data series are forecasted separately.
► The seasonal item in the data series is verified by the Kendall τ correlation test.
► Different regression models are applied to forecast the trend item.
► The superiority of the combined models is examined by comparing quartile values.
► A paired-sample T test is used to confirm the superiority of the combined models.
Abstract: For an energy-limited economic system, it is crucial to forecast load demand accurately. This paper is devoted to a 1-week-ahead daily load forecasting approach in which the load demand series is predicted using information from preceding days that are similar to the forecast day. As in many nonlinear systems, seasonal and trend items coexist in load demand datasets. In this paper, the existence of the seasonal item in the load demand data series is first verified using the Kendall τ correlation testing method. Then, on the premise that forecasting the seasonal item and the trend item separately would improve forecasting accuracy, hybrid models combining the seasonal exponential adjustment method (SEAM) with regression methods are proposed, in which SEAM and the regression models are employed to forecast the seasonal and trend items, respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate that this forecasting technique can significantly improve accuracy, regardless of which of the eleven different models is applied to forecast the trend item. The superior performance of this separate forecasting technique is further confirmed by paired-sample T tests.
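A Kendall τ check for a weekly seasonal item might look like the sketch below. How the test is set up (correlating the series with itself shifted by one 7-day period) is an assumption, not the paper's exact procedure; τ is computed in its tau-a form, with tied pairs counted as neither concordant nor discordant, and the loads are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / (n(n-1)/2); tied pairs count as neither."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1
            elif prod < 0:
                s -= 1
    return s / (n * (n - 1) / 2)


def seasonal_lag_tau(series, period=7):
    """Kendall tau between the series and itself shifted by one seasonal period."""
    return kendall_tau(series[:-period], series[period:])


# Hypothetical daily loads: a fixed weekly profile plus a slow upward trend.
weekly = [5.0, 3.0, 8.0, 1.0, 9.0, 2.0, 7.0]
loads = [weekly[t % 7] + 0.01 * t for t in range(28)]
print(seasonal_lag_tau(loads, 7), seasonal_lag_tau(loads, 3))
```

A τ close to 1 at the seasonal lag, and clearly lower at other lags, is the kind of evidence that would justify modelling the weekly item separately from the trend.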
Directory of Open Access Journals (Sweden)
Bangyong Sun
2014-01-01
The polynomial regression method is employed to calculate the relationship between device color space and CIE color space for color characterization, and the performance of different expressions with specific parameters is evaluated. First, the polynomial equation for color conversion is established and the computation of the polynomial coefficients is analysed. Then, different forms of polynomial equations are used to calculate the CIE color values of RGB and CMYK devices, and the corresponding color errors are compared. Finally, an optimal polynomial expression is obtained by analysing several related parameters in color conversion, including the number of polynomial terms, the degree of the polynomial terms, the choice of CIE visual space, and linearization.
Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method
Asavaskulkiet, Krissada
2018-04-01
In this paper, we propose a new face hallucination technique that reconstructs face images in HSV color space with a semi-orthogonal multilinear principal component analysis (SO-MPCA) method. This hallucination technique operates directly on tensors via tensor-to-vector projection, imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from the FERET database to test our hallucination approach, and extensive experiments yield high-quality hallucinated color faces. The experimental results clearly demonstrate that photorealistic color face images can be generated using the SO-MPCA subspace with a linear regression model.
Real-time prediction of respiratory motion based on local regression methods
International Nuclear Information System (INIS)
Ruan, D; Fessler, J A; Balter, J M
2007-01-01
Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and prediction of target motion, which is necessary to accommodate system latencies. For image-guided radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumors. Predicting respiratory motion in real time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using the semi-periodicity of respiratory motion, and (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also applied commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering, to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths.
Local regression type methods applied to the study of geophysics and high frequency financial data
Mariani, M. C.; Basu, K.
2014-09-01
In this work we applied locally weighted scatterplot smoothing techniques (Lowess/Loess) to geophysical and high-frequency financial data. We first analyze and apply this technique to California earthquake geological data. A spatial analysis was performed to show that the estimate of earthquake magnitude at a fixed location is very accurate, to within a relative error of 0.01%. We also applied the same method to a high-frequency data set from the financial sector and obtained similarly satisfactory results. The application of this approach to two different data sets demonstrates that the overall method is accurate and efficient, and that the Lowess approach is much more desirable than the Loess method. Previous work studied time series analysis; in this paper, our local regression models perform a spatial analysis of the geophysics data, providing different information. For the high-frequency data, our models estimate the curve of best fit where the data depend on time.
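The core of LOWESS is a weighted linear fit around each target point, with tricube weights that fall to zero at the distance of the k-th nearest neighbour. This sketch shows only that step for one-dimensional x (no robustness reweighting iterations, no local quadratic option) and assumes distinct x values so the bandwidth is positive; the function names and data are hypothetical.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the unit interval, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_point(x, y, x0, frac=0.5):
    """Local linear estimate at x0 using the nearest frac*n points (bandwidth = k-th distance)."""
    n = len(x)
    k = max(3, int(round(frac * n)))
    h = sorted(abs(xi - x0) for xi in x)[k - 1]  # assumes h > 0 (distinct x values)
    w = [tricube((xi - x0) / h) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def lowess(x, y, frac=0.5):
    """Smoothed value at every observed x."""
    return [lowess_point(x, y, xi, frac) for xi in x]


xs = [float(i) for i in range(10)]
ys = [0.0, 1.2, 1.9, 3.4, 3.8, 5.1, 6.2, 6.8, 8.1, 9.0]  # hypothetical noisy data
print([round(v, 2) for v in lowess(xs, ys)])
```

Because each local fit is linear, data that lie exactly on a straight line are reproduced exactly, which makes a convenient sanity check.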
Geographically weighted regression based methods for merging satellite and gauge precipitation
Chao, Lijun; Zhang, Ke; Li, Zhijia; Zhu, Yuelong; Wang, Jingfeng; Yu, Zhongbo
2018-03-01
Real-time precipitation data with high spatiotemporal resolution are crucial for accurate hydrological forecasting. To improve the spatial resolution and quality of satellite precipitation, a three-step satellite and gauge precipitation merging method was formulated in this study: (1) bilinear interpolation is first applied to downscale coarser satellite precipitation to a finer resolution (PS); (2) (mixed) geographically weighted regression methods coupled with a weighting function are then used to estimate the biases of PS as functions of gauge observations (PO) and PS; and (3) the biases of PS are finally corrected to produce a merged precipitation product. Based on this framework, eight algorithms, combining two geographically weighted regression methods with four weighting functions, are developed to merge CMORPH (CPC MORPHing technique) precipitation with station observations on a daily scale in the Ziwuhe Basin of China. Geographical variables (elevation, slope, aspect, surface roughness, and distance to the coastline) and a meteorological variable (wind speed) were used in merging precipitation to avoid the artificial spatial autocorrelation that results from traditional interpolation methods. The results show that the combination of mixed geographically weighted regression (MGWR) and the bi-square function (MGWR-BI) has the best performance (R = 0.863 and RMSE = 7.273 mm/day) among the eight algorithms. The MGWR-BI algorithm was then applied to produce an hourly merged precipitation product. Compared to the original CMORPH product (R = 0.208 and RMSE = 1.208 mm/hr), the quality of the merged data is significantly higher (R = 0.724 and RMSE = 0.706 mm/hr). The developed merging method not only improves the spatial resolution and quality of the satellite product but is also easy to implement, which is valuable for hydrological modeling and other applications.
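Step (2) can be illustrated with a single local regression: gauges near the target location receive bi-square weights, and the satellite bias (PS minus PO) is regressed on PS by weighted least squares. This is a simplified, assumed version: it ignores the geographical and wind-speed covariates and the global (non-varying) terms of the mixed model, uses a fixed bandwidth, and all names and values are hypothetical.

```python
import math


def bisquare(d, bandwidth):
    """Bi-square kernel weight: (1 - (d/h)^2)^2 for d < h, else 0."""
    return (1.0 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0


def local_bias_fit(stations, target, bandwidth):
    """Weighted LS of bias = PS - PO on PS around one target location.

    stations: list of (x, y, ps, po) tuples; returns (intercept, slope) of the local bias model.
    """
    w, ps_vals, bias = [], [], []
    for sx, sy, ps, po in stations:
        w.append(bisquare(math.hypot(sx - target[0], sy - target[1]), bandwidth))
        ps_vals.append(ps)
        bias.append(ps - po)
    sw = sum(w)
    pbar = sum(wi * p for wi, p in zip(w, ps_vals)) / sw
    bbar = sum(wi * e for wi, e in zip(w, bias)) / sw
    spp = sum(wi * (p - pbar) ** 2 for wi, p in zip(w, ps_vals))
    spb = sum(wi * (p - pbar) * (e - bbar) for wi, p, e in zip(w, ps_vals, bias))
    slope = spb / spp
    return bbar - slope * pbar, slope


def merged_estimate(ps_target, intercept, slope):
    """Bias-corrected precipitation at the target: PS minus the locally predicted bias."""
    return ps_target - (intercept + slope * ps_target)


# Hypothetical gauges (x km, y km, PS mm/day, PO mm/day); the last one lies outside the bandwidth.
stations = [(0.0, 0.0, 10.0, 8.0), (1.0, 0.0, 20.0, 16.0), (0.0, 2.0, 5.0, 4.0),
            (3.0, 1.0, 30.0, 24.0), (50.0, 50.0, 10.0, 0.0)]
a, b = local_bias_fit(stations, target=(0.0, 0.0), bandwidth=10.0)
print(a, b, merged_estimate(15.0, a, b))
```

The distant gauge gets zero weight, so its very different bias does not leak into the local correction; that locality is what distinguishes the geographically weighted approach from a single global regression.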
Reporting quality of multivariable logistic regression in selected Indian medical journals.
Kumar, R; Indrayan, A; Chhabra, P
2012-01-01
Use of multivariable logistic regression (MLR) modeling has increased steeply in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. This study aimed to review the fulfillment of assumptions and the reporting quality of MLR in selected Indian medical journals using established criteria, through an analysis of the published literature. Medknow.com publishes 68 open-access Indian medical journals. Eight of these journals had at least five articles using MLR between 1994 and 2008. Articles from each of these journals were evaluated against previously established 10-point quality criteria for reporting and for testing the MLR model assumptions. Statistical analysis used SPSS 17 software and non-parametric tests (Kruskal-Wallis H, Mann-Whitney U, Spearman correlation). In total, 109 articles in the eight selected journals were found to use MLR for data analysis. The number of such articles gradually increased after 2003, but the quality score remained almost unchanged over time. The P value, odds ratio, and 95% confidence interval for MLR coefficients were reported in 75.2% of the articles, and sufficient cases (>10) per covariate of the limiting sample size were reported in 58.7%. No article reported the test for conformity of linear gradient for continuous covariates. The total score did not differ significantly across journals. However, involvement of a statistician or epidemiologist as a co-author significantly improved the average quality score (P = 0.014). Reporting of MLR in many Indian journals is incomplete. Among the 109 articles reviewed, only one scored 8 out of 10; all others scored lower. Appropriate guidelines in the instructions to authors, and pre-publication review of articles using MLR by a qualified statistician, may improve the quality of reporting.
Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three
Steinhardt, Charles L.; Jermyn, Adam S.
2018-02-01
Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem, and we provide a Python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.
Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs
Directory of Open Access Journals (Sweden)
Faqir Muhammad
2007-01-01
In this study, different sampling designs were compared using the HIES data of the North West Frontier Province (NWFP) for 2001-02 and 1998-99, collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators was also assessed using the bootstrap and the jackknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as Primary Sampling Units (PSUs). The sample PSUs are selected with probability proportional to size. Secondary Sampling Units (SSUs), i.e., households, are selected by systematic sampling with a random start. HIES uses a single study variable. We compared the HIES technique with several other designs: stratified simple random sampling, stratified systematic sampling, stratified ranked set sampling, and stratified two-phase sampling. Ratio and regression methods were applied with two study variables: income (y) and household size (x). The jackknife and the bootstrap were used for variance replication. Simple random sampling with sample sizes of 462 to 561 gave moderate variances under both the jackknife and the bootstrap. Systematic sampling gave a moderate variance with a sample size of 467. With the jackknife and systematic sampling, the variance of the regression estimator was greater than that of the ratio estimator for sample sizes of 467 to 631; at a sample size of 952, the variance of the ratio estimator became greater than that of the regression estimator. Ranked set sampling emerged as the most efficient design: with the jackknife and the bootstrap, it gave the minimum variance even with the smallest sample size (467). Two-phase sampling performed poorly. The multi-stage sampling applied by HIES gave large variances, especially when used with a single study variable.
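The ratio and regression estimators of mean income, together with a delete-one jackknife variance for either, can be sketched as follows. This assumes simple random sampling within a single stratum and a known population mean household size; it does not reproduce the HIES two-stage design or the other designs compared, and the data are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def ratio_estimate(y, x, x_pop_mean):
    """Ratio estimator of the population mean of y: (ybar / xbar) * Xbar."""
    return mean(y) / mean(x) * x_pop_mean


def regression_estimate(y, x, x_pop_mean):
    """Linear regression estimator: ybar + b (Xbar - xbar), with b the OLS slope of y on x."""
    xb, yb = mean(x), mean(y)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sum((xi - xb) ** 2 for xi in x)
    return yb + b * (x_pop_mean - xb)


def jackknife_variance(estimator, y, x, x_pop_mean):
    """Delete-one jackknife variance: (n-1)/n * sum (theta_(i) - theta_bar)^2."""
    n = len(y)
    reps = [estimator(y[:i] + y[i + 1:], x[:i] + x[i + 1:], x_pop_mean) for i in range(n)]
    rbar = mean(reps)
    return (n - 1) / n * sum((r - rbar) ** 2 for r in reps)


# Hypothetical sample: household size (x) and monthly income in thousands (y).
x = [3.0, 5.0, 4.0, 6.0, 7.0]
y = [9.5, 14.0, 12.5, 17.0, 20.5]
X_BAR = 5.5  # assumed known population mean household size
for est in (ratio_estimate, regression_estimate):
    print(est.__name__, est(y, x, X_BAR), jackknife_variance(est, y, x, X_BAR))
```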
Robust Methods for Moderation Analysis with a Two-Level Regression Model.
Yang, Miao; Yuan, Ke-Hai
2016-01-01
Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
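The Huber-type weighting can be illustrated in the simplest setting: a straight-line regression fitted by iteratively reweighted least squares (IRLS). This is not the authors' two-level moderation estimator or their R program; the tuning constant c = 1.345, the MAD scale estimate, and the helper names are common textbook choices assumed here.

```python
import statistics
from typing import Sequence, Tuple


def huber_weight(u: float, c: float = 1.345) -> float:
    """Huber-type weight: 1 for |u| <= c, c/|u| beyond (bounded influence)."""
    return 1.0 if abs(u) <= c else c / abs(u)


def weighted_line(x: Sequence[float], y: Sequence[float],
                  w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def huber_line(x: Sequence[float], y: Sequence[float], c: float = 1.345,
               max_iter: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """IRLS M-estimate of a line with Huber weights and a MAD residual scale."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(max_iter):
        r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        med = statistics.median(r)
        scale = statistics.median(abs(ri - med) for ri in r) / 0.6745  # MAD scale
        if scale == 0.0:
            break
        w = [huber_weight(ri / scale, c) for ri in r]
        nb0, nb1 = weighted_line(x, y, w)
        converged = abs(nb0 - b0) < tol and abs(nb1 - b1) < tol
        b0, b1 = nb0, nb1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [2.0 + 3.0 * xi + (0.1 if i % 2 == 0 else -0.1) for i, xi in enumerate(x)]
    y[9] += 50.0  # one gross outlier
    print("OLS:  ", weighted_line(x, y, [1.0] * 10))
    print("Huber:", huber_line(x, y))
```

The outlier keeps a nonzero but bounded influence (its weighted residual is capped at c times the scale), which is what distinguishes Huber weights from simply deleting points.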
Applications of Monte Carlo method to nonlinear regression of rheological data
Kim, Sangmo; Lee, Junghaeng; Kim, Sihyun; Cho, Kwang Soo
2018-02-01
In rheological studies, it is often necessary to determine the parameters of rheological models from experimental data. Because both rheological data and parameter values vary on a logarithmic scale and the number of parameters is quite large, conventional nonlinear regression methods such as the Levenberg-Marquardt (LM) method are usually ineffective. Gradient-based methods such as LM tend to become trapped in local minima that give unphysical parameter values whenever the initial guess is far from the global optimum. Although this problem could be solved by simulated annealing (SA), this Monte Carlo (MC) method needs adjustable parameters that must be determined in an ad hoc manner. We suggest a simplified version of SA, itself a kind of MC method, which yields effective parameter values for even the most complicated rheological models, such as the Carreau-Yasuda model of steady shear viscosity, the discrete relaxation spectrum, and zero-shear viscosity as a function of concentration and molecular weight.
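A generic simulated annealing loop applied to the Carreau-Yasuda model shows why searching on a log scale helps. This is a standard Metropolis-style SA sketch, not the authors' simplified variant; the synthetic data, search bounds, cooling schedule, step size, and function names are all assumptions chosen for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def carreau_yasuda(rate: float, eta0: float, eta_inf: float,
                   lam: float, a: float, n: float) -> float:
    """Carreau-Yasuda steady shear viscosity."""
    return eta_inf + (eta0 - eta_inf) * (1.0 + (lam * rate) ** a) ** ((n - 1.0) / a)


def log_misfit(p_log10: Sequence[float], rates: Sequence[float],
               etas: Sequence[float]) -> float:
    """Sum of squared log10 residuals; all five parameters are searched as log10 values."""
    eta0, eta_inf, lam, a, n = (10.0 ** p for p in p_log10)
    total = 0.0
    try:
        for g, eta in zip(rates, etas):
            model = carreau_yasuda(g, eta0, eta_inf, lam, a, n)
            if model <= 0.0:  # unphysical trial point
                return math.inf
            total += (math.log10(model) - math.log10(eta)) ** 2
    except OverflowError:  # guard against extreme trial parameters
        return math.inf
    return total


def anneal(loss: Callable[[List[float]], float], x0: Sequence[float],
           bounds: Sequence[Tuple[float, float]], steps: int = 5000,
           step_size: float = 0.1, t_start: float = 1.0, t_end: float = 1e-4,
           seed: int = 0) -> Tuple[List[float], float]:
    """Simulated annealing: Gaussian moves, Metropolis acceptance, geometric cooling."""
    rng = random.Random(seed)
    x, fx = list(x0), loss(list(x0))
    best_x, best_f = list(x), fx
    cool = (t_end / t_start) ** (1.0 / steps)
    t = t_start
    for _ in range(steps):
        cand = [min(hi, max(lo, xi + rng.gauss(0.0, step_size)))
                for xi, (lo, hi) in zip(x, bounds)]
        fc = loss(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = list(x), fx
        t *= cool
    return best_x, best_f


if __name__ == "__main__":
    true_p = [math.log10(v) for v in (1000.0, 1.0, 0.5, 2.0, 0.4)]  # hypothetical values
    rates = [10.0 ** (k / 4.0 - 2.0) for k in range(21)]  # 0.01 to 1000 1/s
    etas = [carreau_yasuda(g, *(10.0 ** p for p in true_p)) for g in rates]
    bounds = [(0.0, 6.0), (-3.0, 2.0), (-3.0, 3.0), (-1.0, 1.0), (-2.0, 0.0)]
    best, f = anneal(lambda p: log_misfit(p, rates, etas), [1.0, 1.5, 2.5, -0.5, -1.5], bounds)
    print("fitted parameters:", [round(10.0 ** p, 4) for p in best], "misfit:", f)
```

Because every parameter is perturbed in log10 space, one step size serves parameters of order 10^-2 and 10^3 alike, which is the scaling issue the abstract raises for gradient-based fitting.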
Logistic Regression and Path Analysis Method to Analyze Factors influencing Students’ Achievement
Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.
2018-04-01
Students' academic achievement cannot be separated from the influence of two kinds of factors, namely internal and external factors. The internal factors consist of intelligence (X1), health (X2), interest (X3), and motivation (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The subjects of this research were eighth-grade students in the 2016/2017 school year at SMPN 1 Jiwan Madiun, selected by simple random sampling. Primary data were obtained by distributing questionnaires. The method used in this study is binary logistic regression analysis, which aims to identify the internal and external factors that affect students' achievement and the direction of their effects. Path analysis was used to determine the factors that influence students' achievement directly, indirectly, or in total. According to the binary logistic regression, the variables that affect students' achievement are interest and motivation. According to the path analysis, the factors that have a direct impact on students' achievement are students' interest (59%) and students' motivation (27%), while the factors that have indirect influences on students' achievement are family environment (97%) and school environment (37%).
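A binary logistic regression of the kind described can be fitted by Newton-Raphson iterations on the log-likelihood. The sketch uses a single hypothetical predictor (for example, an interest score) and made-up 0/1 achievement outcomes; it is not the study's data or model, and `fit_logistic` is an illustrative name.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 25) -> List[float]:
    """Maximum-likelihood coefficients by Newton-Raphson; rows of X start with 1.0 (intercept)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        prob = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, prob)) for j in range(p)]
        info = [[sum(pi * (1.0 - pi) * row[j] * row[k] for row, pi in zip(X, prob))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)  # beta_new = beta + (X'WX)^(-1) X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return beta


if __name__ == "__main__":
    interest = [1, 2, 3, 4, 5, 6, 7, 8]          # hypothetical scores
    achieved = [0, 0, 1, 0, 1, 0, 1, 1]          # hypothetical binary outcome
    beta = fit_logistic([[1.0, float(v)] for v in interest], achieved)
    print("intercept, slope:", beta)
```

Newton's method requires that the outcomes are not perfectly separated by the predictors; with complete separation the maximum-likelihood estimate does not exist and the coefficients diverge.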
Wulandari, S. P.; Salamah, M.; Rositawati, A. F. D.
2018-04-01
Food security is the condition in which food needs are well met at every level, from the country down to the individual. Indonesia is one of the countries committed to making food security a main priority. However, meeting food needs is commonly treated without regard to nutrient standards or the health condition of family members, whereas fulfilling food needs should also take into account the diseases suffered by family members, one of which is pulmonary tuberculosis. For these reasons, this research identifies the factors that influence the food security status of households with a member suffering from pulmonary tuberculosis in the coastal area of Surabaya, using binary logistic regression. The binary logistic regression shows that the wife's latest education, house density, and house ventilation area significantly affect the food security status of these households. A household in which the wife's education level is university or equivalent, the house density meets the eligibility standard of 8 m2 per person, and the ventilation area is 10% of the floor area has a probability of 0.911089 of being food secure and a probability of 0.088911 of being food insecure. The model of household food security status for households with pulmonary tuberculosis in the coastal area of Surabaya fits adequately, with an overall classification accuracy of 71.8%.
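The two reported probabilities are complementary: a fitted binary logistic model gives P(food secure) = 1/(1 + e^(-η)), where η is the linear predictor for a given covariate profile, and P(food insecure) = 1 - P(food secure). The abstract does not report the coefficients, so this sketch only backs out the η implied by the reported probability.

```python
import math


def logistic(eta: float) -> float:
    """P(Y = 1) for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p: float) -> float:
    """Inverse of logistic: the log odds."""
    return math.log(p / (1.0 - p))


p_secure = 0.911089               # reported probability of being food secure
eta = logit(p_secure)             # linear predictor implied for that household profile
p_insecure = 1.0 - logistic(eta)  # complementary probability of being food insecure
print(f"eta = {eta:.4f}, P(insecure) = {p_insecure:.6f}")
```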
International Nuclear Information System (INIS)
Gupta, N
2008-01-01
3013 containers are designed in accordance with DOE-STD-3013-2004. These containers are qualified to store plutonium (Pu)-bearing materials such as PuO2 for 50 years. DOT shipping packages such as the 9975 are used to store the 3013 containers in the K-Area Material Storage (KAMS) facility at the Savannah River Site (SRS). DOE-STD-3013-2004 requires that a comprehensive surveillance program be set up to ensure that the 3013 container design parameters are not violated during long-term storage. To ensure the structural integrity of the 3013 containers, thermal analyses using finite element models were performed to predict the temperatures of the contents and components for different but well-defined parameters such as storage ambient temperature, PuO2 density, fill heights, weights, and thermal loading. Interpolation is normally used to calculate temperatures when the actual parameter values differ from the analyzed values. A statistical analysis technique using regression methods is proposed to develop simple polynomial relations that predict temperatures for the actual parameter values found in the containers. The analysis shows that regression analysis is a powerful tool for developing simple relations to assess component temperatures.
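Fitting a simple polynomial relation by least squares can be sketched with the normal equations. The single predictor (a hypothetical thermal loading in watts) and the component temperatures are made up for illustration; the report considers several parameters, and its actual polynomial forms are not given here.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def polyfit(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Coefficients c0..c_degree minimizing sum (y - sum_k c_k x^k)^2 via the normal equations."""
    rows = [[xi ** k for k in range(degree + 1)] for xi in x]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)] for i in range(degree + 1)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(degree + 1)]
    return solve(ata, aty)


def polyval(c: Sequence[float], x: float) -> float:
    """Evaluate c0 + c1*x + ... + c_d*x^d."""
    return sum(ck * x ** k for k, ck in enumerate(c))


if __name__ == "__main__":
    load_w = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]          # hypothetical
    temp_c = [30.1, 32.7, 35.6, 38.9, 42.3, 46.4, 50.4, 55.0, 59.5]  # hypothetical
    c = polyfit(load_w, temp_c, 2)
    print("coefficients:", c, "prediction at 3.5 W:", polyval(c, 3.5))
```

For higher degrees or several predictors, a QR- or SVD-based solver is better conditioned than the normal equations used in this sketch.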
Multi-step polynomial regression method to model and forecast malaria incidence.
Directory of Open Access Journals (Sweden)
Chandrajit Chatterjee
Malaria remains one of the most severe problems faced by the world today. Understanding causative factors such as age, sex, social factors, and environmental variability, as well as the underlying transmission dynamics of the disease, is important for epidemiological research on malaria and for its eradication. Developing a suitable modeling approach and methodology, based on the available data on disease incidence and related factors, is therefore of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of malaria Slide Positivity Rates (SPR); a shorter, one-year time series of deaths due to Plasmodium vivax; and spatial data (the zonal distribution of P. vivax deaths) for the city, along with climatic factors, population, and previous incidence of the disease. We performed variable selection by a simple correlation study, identified the initial relationships between variables through non-linear curve fitting, and used multi-step methods to introduce variables into the non-linear regression analysis, along with Gauss-Markov models and ANOVA for testing the prediction and validity and for constructing confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of the forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR value plays an influential role and improves prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at the zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations for the confidence intervals on the model weight and center of gravity coordinates, and two different error analyses of the model weight prediction, are also discussed in the appendices of the paper.
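The two-stage fit can be illustrated under a deliberately simplified, hypothetical load model: in wind-off data the balance normal force is assumed to be N = W cos θ and the pitching moment M = W (x_cg cos θ + z_cg sin θ), where θ is the pitch angle. The paper's actual balance equations, sign conventions, and data points are not reproduced; the sketch only shows the weight estimate from the first fit being reused as an input to the second.

```python
import math
from typing import Sequence, Tuple


def fit_weight(theta: Sequence[float], normal: Sequence[float]) -> float:
    """Least-squares W for the assumed model N_i = W * cos(theta_i)."""
    c = [math.cos(t) for t in theta]
    return sum(n * ci for n, ci in zip(normal, c)) / sum(ci * ci for ci in c)


def fit_cg(theta: Sequence[float], moment: Sequence[float],
           weight: float) -> Tuple[float, float]:
    """Least-squares (x_cg, z_cg) for M_i = W*(x_cg*cos(theta_i) + z_cg*sin(theta_i)),
    with W fixed at the estimate from the separate weight fit."""
    c = [math.cos(t) for t in theta]
    s = [math.sin(t) for t in theta]
    m = [mi / weight for mi in moment]  # divide out the already-estimated weight
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    smc = sum(mi * ci for mi, ci in zip(m, c))
    sms = sum(mi * si for mi, si in zip(m, s))
    det = scc * sss - scs * scs  # nonzero only if the pitch angles vary
    x_cg = (smc * sss - sms * scs) / det  # Cramer's rule on the 2x2 normal equations
    z_cg = (sms * scc - smc * scs) / det
    return x_cg, z_cg


if __name__ == "__main__":
    theta = [math.radians(d) for d in (-10, -5, 0, 5, 10)]  # hypothetical wind-off attitudes
    normal = [99.9, 99.6, 100.1, 99.7, 98.4]                 # hypothetical readings
    moment = [22.6, 24.7, 25.1, 24.5, 23.8]                  # hypothetical readings
    w = fit_weight(theta, normal)
    print("W =", w, "(x_cg, z_cg) =", fit_cg(theta, moment, w))
```

The determinant check reflects a design point echoed in the abstract's recommendations: the wind-off attitudes must be spread out enough for the center of gravity coordinates to be identifiable.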
Hwang, Kyu-Baek; Lee, In-Hee; Park, Jin-Ho; Hambuch, Tina; Choe, Yongjoon; Kim, MinHyeok; Lee, Kyungjoon; Song, Taemin; Neu, Matthew B; Gupta, Neha; Kohane, Isaac S; Green, Robert C; Kong, Sek Won
2014-08-01
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR or ensemble genotyping based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates. © 2014 WILEY PERIODICALS, INC.
A dynamic particle filter-support vector regression method for reliability prediction
International Nuclear Information System (INIS)
Wei, Zhao; Tao, Tao; ZhuoShu, Ding; Zio, Enrico
2013-01-01
Support vector regression (SVR) has been applied to time series prediction, and some works have demonstrated the feasibility of using it to forecast system reliability. For accurate reliability forecasting, the selection of SVR's parameters is important. Existing research on SVR parameter selection divides the example dataset into training and test subsets and tunes the parameters on the training data. However, these fixed parameters can lead to poor prediction capabilities if the data of the test subset differ significantly from those of the training subset. In contrast, the novel method proposed in this paper uses particle filtering to estimate the SVR model parameters according to the whole measurement sequence up to the last observation instance. By treating the SVR training model as the observation equation of a particle filter, our method allows the SVR model parameters to be updated dynamically when a new observation arrives. Because the parameters adapt to the dynamic data pattern, the new PF–SVR method has superior prediction performance over that of standard SVR. Results from four applications show that PF–SVR is more robust than SVR to a decrease in the number of training data and to changes in the initial SVR parameter values. Also, even if there are trends in the test data different from those in the training data, the method can capture the changes, correct the SVR parameters, and obtain good predictions. Highlights: • A dynamic PF–SVR method is proposed to predict system reliability. • The method can adjust the SVR parameters according to changes in the data. • The method is robust to the size of the training data and to initial parameter values. • Some cases based on both artificial and real data are studied. • PF–SVR shows superior prediction performance over standard SVR.
Directory of Open Access Journals (Sweden)
Assia Belhassan
2017-11-01
The ACD/ChemSketch, MarvinSketch, and ChemOffice programmes were used to calculate several molecular descriptors of 51 odorant molecules (15 alcohols, 11 aldehydes, 9 ketones and 16 esters). The best descriptors were selected to establish the Quantitative Structure-Property Relationship (QSPR) of the retention/release property of odorant molecules in pectin gels using Principal Components Analysis (PCA), Multiple Linear Regression (MLR), Multiple Non-linear Regression (MNLR) and Artificial Neural Network (ANN) methods. We propose a quantitative model based on these analyses. PCA has been used to select descriptors that exhibit high correlation with the retention/release property. The MLR method yielded correlation coefficients of 0.960 and 0.958 for the PG-0.4 (pectin concentration: 0.4% w/w) and PG-0.8 (pectin concentration: 0.8% w/w) media, respectively. Internal and external validations were used to determine the statistical quality of the QSPR of the two MLR models. The MNLR method, considering the relevant descriptors obtained from the MLR, yielded correlation coefficients of 0.978 and 0.975 for the PG-0.4 and PG-0.8 media, respectively. The applicability domain of the MLR models was investigated using simple and leverage approaches to detect outliers and outside compounds. The effects of different descriptors on the retention/release property are described, and these descriptors were used to study and design new compounds with higher and lower values of the property than the existing ones. Keywords: Odorant Molecules, Retention/Release, Pectin Gels, Quantitative Structure Property Relationship, Multiple Linear Regression, Artificial Neural Network
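The leverage approach to the applicability domain can be sketched as follows: the leverage of a compound is h = x^T (X^T X)^(-1) x, computed from the training descriptor matrix X with an intercept column, and compounds with h above the commonly used warning value h* = 3p'/n (p' model parameters, n training compounds) are flagged. The descriptor values and helper names are hypothetical, and the abstract does not state which threshold the authors used.

```python
from typing import List, Sequence


def invert(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def leverages(X_train: Sequence[Sequence[float]],
              X_query: Sequence[Sequence[float]]) -> List[float]:
    """h = x^T (X^T X)^(-1) x for each query row; rows start with 1.0 for the intercept."""
    p = len(X_train[0])
    xtx = [[sum(r[i] * r[j] for r in X_train) for j in range(p)] for i in range(p)]
    inv = invert(xtx)
    return [sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p)) for x in X_query]


def warning_leverage(n_train: int, n_descriptors: int) -> float:
    """Commonly used threshold h* = 3 p' / n with p' = descriptors + 1."""
    return 3.0 * (n_descriptors + 1) / n_train


if __name__ == "__main__":
    # Hypothetical training set: intercept plus two descriptors per odorant.
    X = [[1.0, float(i), float(i % 3)] for i in range(10)]
    h_star = warning_leverage(len(X), 2)
    new = [[1.0, 4.5, 1.0], [1.0, 100.0, 50.0]]
    for row, h in zip(new, leverages(X, new)):
        print(row, round(h, 3), "outside domain" if h > h_star else "inside domain")
```

A useful sanity check is that the training leverages sum to p', the trace of the hat matrix.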
Statistical learning method in regression analysis of simulated positron spectral data
International Nuclear Information System (INIS)
Avdic, S. Dz.
2005-01-01
Positron lifetime spectroscopy is a non-destructive tool for detecting radiation-induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machine (SVM) method to input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. Substantial data compression, to about 50% and 8% of the whole training set for spectra with two and three components, respectively, has been achieved while maintaining high accuracy of the spectra approximation. However, some parameters of the SVM approach, such as the insensitivity zone ε and the penalty parameter C, have to be chosen carefully to obtain good performance. (author)
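The insensitivity zone ε enters through Vapnik's ε-insensitive loss, which ignores residuals smaller than ε. Training points strictly inside this tube receive zero dual coefficients and so are not support vectors, which is one way SVR can compress a training set; that the reported 50% and 8% figures were obtained this way is an assumption. The residuals and helper names below are made up.

```python
from typing import List, Sequence


def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Zero inside the tube |r| <= eps, linear (|r| - eps) outside."""
    return max(0.0, abs(residual) - eps)


def support_vector_candidates(residuals: Sequence[float], eps: float) -> List[int]:
    """Indices on or outside the eps-tube; points strictly inside cannot be support vectors."""
    return [i for i, r in enumerate(residuals) if abs(r) >= eps]


if __name__ == "__main__":
    res = [0.02, -0.15, 0.08, 0.30, -0.01]  # hypothetical fit residuals
    eps = 0.1
    print([eps_insensitive_loss(r, eps) for r in res])
    kept = support_vector_candidates(res, eps)
    print(f"kept {len(kept)} of {len(res)} points")
```

A larger ε widens the tube, so fewer points become support vectors (more compression) at the cost of a coarser approximation, which is why ε has to be chosen carefully.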
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru
2010-08-01
The sole use of isoniazid during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage forms in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recoveries obtained by applying the PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. These results support the use of the methods for the quality control and routine analysis of marketed tablets containing PYR and ISO. Copyright © 2010 John Wiley & Sons, Ltd.
International Nuclear Information System (INIS)
Tsushima, Motoo; Fujii, Shigeki; Yutani, Chikao; Yamamoto, Akira; Naitoh, Hiroaki.
1990-01-01
We evaluated the wall thickening and stenosis rate (ASI), the calcification rate (ACI), and the wall thickening and calcification stenosis rate (SCI) of the lower abdominal aorta, calculated by the 12-sector method from plain or contrast-enhanced computed tomography. The intra-observer variation in the calculation of ASI was 5.7% and that of ACI was 2.4%. In 9 patients who underwent autopsy, ACI was significantly correlated with the ratio of the calcified area to the whole objective area of the abdominal aorta (r=0.856, p<0.01). However, there were no correlations between ASI and the surface involvement or the atherosclerotic index obtained by the point-counting method on the autopsy materials. In the analysis of 40 patients with atherosclerotic vascular diseases, ASI and ACI were also highly correlated with the percentage volume of the arterial wall relative to the whole volume of the observed artery (r=0.852, p<0.0001) and with the percentage calcification volume (r=0.913, p<0.0001), respectively, both calculated by the computerized method. The percentage of atherosclerotic vascular diseases increased in the group with both high ASI (over 10%) and high ACI (over 20%). We used SCI as a reliable index for assessing the progression and regression of atherosclerosis. Among patients with hypercholesterolemia (15 with familial hypercholesterolemia (FH) and 6 non-FH patients), the change in SCI (d-SCI) after treatment was significantly correlated with the change in total cholesterol concentration (d-TC) (r=0.466, p<0.05), and the change in right Achilles' tendon thickness (d-ATT) was also correlated with d-TC (r=0.634, p<0.005). However, no correlation between d-SCI and d-ATT was observed. In conclusion, CT indices of atherosclerosis were useful as a noninvasive quantitative diagnostic method, and we were able to use them to assess the progression and regression of atherosclerosis. (author)
Reflexion on linear regression trip production modelling method for ensuring good model quality
Suprayitno, Hitapriya; Ratnasari, Vita
2017-11-01
Transport modelling is important. In certain cases, the conventional model still has to be used, and then having a good trip production model is essential. A good model can only be obtained from a good sample. Two basic principles of good sampling are that the sample must represent the population characteristics and must produce an acceptable error at a given confidence level. These principles do not yet seem to be well understood or applied in trip production modelling. Therefore, it is necessary to investigate trip production modelling practice in Indonesia and to formulate a better modelling method that ensures model quality. The results of this research are as follows. Statistics provides a method, called the Confidence Interval of Predicted Value, for calculating the span of a predicted value at a given confidence level for linear regression. Common modelling practice uses R2 as the principal quality measure, and sampling practice varies and does not always conform to sampling principles. An experiment indicates that even a small sample can give an excellent R2 value and that sample composition can significantly change the model. Hence, a good R2 value does not always mean good model quality. This leads to three basic ideas for ensuring good model quality: reformulating the quality measure, the calculation procedure, and the sampling method. A quality measure is defined as having both a good R2 value and a good Confidence Interval of Predicted Value. The calculation procedure must incorporate the statistical calculation method and the appropriate statistical tests. A good sampling method must incorporate random, well-distributed stratified sampling with a certain minimum number of samples. These three ideas need further development and testing.
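The Confidence Interval of Predicted Value for a simple linear regression can be computed as ŷ0 ± t·s·sqrt(1/n + (x0 - x̄)²/Sxx), where s is the residual standard error with n - 2 degrees of freedom; adding 1 under the square root gives the wider interval for a single new observation. The household data are hypothetical, and the t critical value is passed in (2.306 is the two-sided 95% value for 8 degrees of freedom) to keep the sketch free of external libraries.

```python
import math
from typing import Sequence, Tuple


def predict_interval(x: Sequence[float], y: Sequence[float], x0: float,
                     t_crit: float, new_observation: bool = False) -> Tuple[float, float, float]:
    """(lower, prediction, upper) at x0 for an ordinary least-squares line."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    b1 = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y)) / sxx
    b0 = ym - b1 * xm
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # residual standard error, n - 2 df
    extra = 1.0 if new_observation else 0.0      # 1.0 turns the mean CI into a prediction interval
    half = t_crit * s * math.sqrt(extra + 1.0 / n + (x0 - xm) ** 2 / sxx)
    y0 = b0 + b1 * x0
    return y0 - half, y0, y0 + half


if __name__ == "__main__":
    size = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5]                       # hypothetical household size
    trips = [1.8, 3.1, 4.2, 5.1, 6.3, 7.0, 2.7, 4.5, 4.9, 6.1]  # hypothetical daily trips
    print("mean response CI:", predict_interval(size, trips, 3.5, 2.306))
    print("prediction interval:", predict_interval(size, trips, 3.5, 2.306, True))
```

The interval widens as x0 moves away from x̄ and shrinks only slowly with n, which is why a high R2 from a small or badly composed sample can still leave wide intervals on predicted values.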
Directory of Open Access Journals (Sweden)
Nina L. Timofeeva
2014-01-01
The article presents the methodological and technical bases for creating regression models that adequately reflect reality. The focus is on methods for removing residual autocorrelation in models. Algorithms for eliminating heteroscedasticity and autocorrelation of the regression residuals are given: the reweighted least squares method and the Cochrane-Orcutt method. A model of "pure" regression is built, as well as a standardized form of the regression equation, which makes it possible to compare the effects on the dependent variable of explanatory variables expressed in different units. A scheme of techniques for reducing heteroscedasticity and autocorrelation when creating regression models specific to the social and cultural sphere is developed.
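The Cochrane-Orcutt procedure for first-order (AR(1)) residual autocorrelation can be sketched for a single regressor: estimate ρ from the lag-1 residuals, regress the quasi-differenced series y_t - ρy_(t-1) on x_t - ρx_(t-1), recover the original intercept as a*/(1 - ρ), and repeat until ρ settles. This is a textbook version with hypothetical helper names, not necessarily the exact algorithm given in the article, and it assumes |ρ| < 1.

```python
from typing import List, Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of an ordinary least-squares line."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def quasi_difference(x: Sequence[float], y: Sequence[float],
                     rho: float) -> Tuple[List[float], List[float]]:
    """AR(1) transform z_t - rho * z_(t-1) for t = 1..n-1 (first observation dropped)."""
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    return xs, ys


def cochrane_orcutt(x: Sequence[float], y: Sequence[float], max_iter: int = 50,
                    tol: float = 1e-8) -> Tuple[float, float, float]:
    """Iterate residuals -> rho -> regression on quasi-differenced data; returns (b0, b1, rho)."""
    b0, b1 = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        den = sum(v * v for v in e[:-1])
        new_rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / den if den > 0.0 else 0.0
        xs, ys = quasi_difference(x, y, new_rho)
        a_star, b1 = ols(xs, ys)
        b0 = a_star / (1.0 - new_rho)  # transformed intercept equals b0 * (1 - rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return b0, b1, rho


if __name__ == "__main__":
    t = [float(k) for k in range(30)]
    y = [1.0 + 2.0 * k + 5.0 * 0.9 ** k for k in range(30)]  # hypothetical AR(1)-like errors
    print(cochrane_orcutt(t, y))
```

Dropping the first observation is the classic Cochrane-Orcutt choice; the Prais-Winsten variant keeps it with a rescaling.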
2017-12-01
Fig. 2 Simulation method; the process for one iteration of the simulation. It was repeated 250 times per combination of HR and FAR. Analysis was...stimuli. Simulations show that this regression method results in an unbiased and accurate estimate of target detection performance. The regression
Isa, Zakiah Mohd; Tawfiq, Omar Farouq; Noor, Norliza Mohd; Shamsudheen, Mohd Iqbal; Rijal, Omar Mohd
2010-03-01
In rehabilitating edentulous patients, selecting appropriately sized teeth in the absence of preextraction records is problematic. The purpose of this study was to investigate the relationships between some facial dimensions and widths of the maxillary anterior teeth to potentially provide a guide for tooth selection. Sixty full dentate Malaysian adults (18-36 years) representing 2 ethnic groups (Malay and Chinese), with well aligned maxillary anterior teeth and minimal attrition, participated in this study. Standardized digital images of the face, viewed frontally, were recorded. Using image analyzing software, the images were used to determine the interpupillary distance (IPD), inner canthal distance (ICD), and interalar width (IA). Widths of the 6 maxillary anterior teeth were measured directly from casts of the subjects using digital calipers. Regression analyses were conducted to measure the strength of the associations between the variables (alpha=.10). The means (standard deviations) of IPD, IA, and ICD of the subjects were 62.28 (2.47), 39.36 (3.12), and 34.36 (2.15) mm, respectively. The mesiodistal diameters of the maxillary central incisors, lateral incisors, and canines were 8.54 (0.50), 7.09 (0.48), and 7.94 (0.40) mm, respectively. The width of the central incisors was highly correlated to the IPD (r=0.99), while the widths of the lateral incisors and canines were highly correlated to a combination of IPD and IA (r=0.99 and 0.94, respectively). Using regression methods, the widths of the anterior teeth within the population tested may be predicted by a combination of the facial dimensions studied. (c) 2010 The Editorial Council of the Journal of Prosthetic Dentistry. Published by Mosby, Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
Jibo Yue
2018-01-01
Above-ground biomass (AGB) provides a vital link between solar energy consumption and yield, so its correct estimation is crucial to accurately monitor crop growth and predict yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index) and eight statistical regression techniques: artificial neural network (ANN), multivariable linear regression (MLR), decision-tree regression (DT), boosted binary regression tree (BBRT), partial least squares regression (PLSR), random forest regression (RF), support vector machine regression (SVM), and principal component regression (PCR), which are used to analyze hyperspectral data acquired with a field spectrophotometer. The vegetation indexes (VIs) determined from the spectra were first used to train the regression techniques for modeling and validation to select the best VI input, and white Gaussian noise was then added to them to study how remote sensing errors affect the regression techniques. Next, the VIs were divided into groups of different sizes by using various sampling methods for modeling and validation to test the stability of the techniques. Finally, the AGB was estimated by using leave-one-out cross-validation with these techniques. The results of the study demonstrate that, of the eight techniques investigated, PLSR and MLR perform best in terms of stability and are most suitable when high-accuracy and stable estimates are required from relatively few samples. In addition, RF is extremely robust against noise and is best suited to deal with repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise). Finally, the leave-one-out cross-validation method indicates that PLSR provides the highest accuracy (R2 = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV(RMSE) = 0.18); thus, PLSR is best suited for works requiring high
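Leave-one-out cross-validation refits the model n times, each time predicting the single held-out sample. The sketch uses a one-predictor least-squares line as a stand-in for the regression techniques compared in the study, with hypothetical index and biomass values; RMSE and MAE come from the held-out errors, while NRMSE and CV(RMSE) would further divide RMSE by a normalizing quantity such as the range or mean of the observations.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope (stand-in for any regression technique)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b1 = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sum((a - xm) ** 2 for a in x)
    return ym - b1 * xm, b1


def loocv_errors(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Held-out errors: refit without sample i, then predict sample i."""
    xs, ys = list(x), list(y)
    errors = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (b0 + b1 * xs[i]))
    return errors


def rmse(errors: Sequence[float]) -> float:
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors: Sequence[float]) -> float:
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    vi = [0.2, 0.35, 0.4, 0.5, 0.55, 0.6, 0.7, 0.8]   # hypothetical vegetation index
    agb = [2.1, 3.0, 3.6, 4.2, 4.1, 5.0, 5.6, 6.4]    # hypothetical AGB, t/ha
    errs = loocv_errors(vi, agb)
    print("LOOCV RMSE:", rmse(errs), "MAE:", mae(errs))
```

For least squares, each held-out error equals the in-sample residual divided by (1 - leverage), so the LOOCV RMSE can never be smaller than the in-sample RMSE; that gap is what makes it a more honest accuracy estimate.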
Delwiche, Stephen R; Reeves, James B
2010-01-01
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R2) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various
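Savitzky-Golay preprocessing fits a low-order polynomial within a moving window; for the smallest case mentioned (5 points, quadratic), the smoothing weights are (-3, 12, 17, 12, -3)/35 and the first-derivative weights are (-2, -1, 0, 1, 2)/10 per unit spacing. The sketch applies them to interior points only (edge handling varies between implementations and is omitted), and the short spectrum is hypothetical.

```python
from typing import List, Sequence

SMOOTH_W = (-3.0, 12.0, 17.0, 12.0, -3.0)  # 5-point quadratic smoothing, divide by 35
DERIV_W = (-2.0, -1.0, 0.0, 1.0, 2.0)      # 5-point quadratic first derivative, divide by 10


def _apply(y: Sequence[float], weights: Sequence[float], norm: float) -> List[float]:
    """Centered convolution over interior points (len(y) - 4 outputs for 5 weights)."""
    half = len(weights) // 2
    return [sum(w * y[i + k - half] for k, w in enumerate(weights)) / norm
            for i in range(half, len(y) - half)]


def sg_smooth(y: Sequence[float]) -> List[float]:
    return _apply(y, SMOOTH_W, 35.0)


def sg_first_derivative(y: Sequence[float], spacing: float = 1.0) -> List[float]:
    return [d / spacing for d in _apply(y, DERIV_W, 10.0)]


if __name__ == "__main__":
    absorbance = [0.41, 0.43, 0.48, 0.52, 0.50, 0.47, 0.46, 0.49, 0.55]  # hypothetical
    print("smoothed:", sg_smooth(absorbance))
    print("first derivative:", sg_first_derivative(absorbance, spacing=2.0))
```

Because the filter is exact for polynomials up to the fitted order, a quadratic signal passes through the smoother unchanged; what the filter removes or amplifies is everything else, which is the behaviour the abstract warns can mislead a calibration.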
Whole-genome regression and prediction methods applied to plant and animal breeding
de los Campos, G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method
Prahutama, Alan; Sudarno
2018-05-01
The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because infant mortality is an important element of a country’s economic development. A high infant mortality rate disrupts the stability of a country because it relates to the sustainability of its population. One regression model that can be used to analyze the relationship between a discrete dependent variable Y and independent variables X is the Poisson regression model. Regression models for discrete dependent variables include, among others, Poisson regression, negative binomial regression, and generalized Poisson regression. In this research, generalized Poisson regression gave a better (lower) AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable with the greatest influence on the infant mortality rate is average breastfeeding (X9).
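To make the Poisson versus generalized Poisson comparison concrete, the sketch below gives the log-probability of a count under a widely used generalized Poisson form with mean mu and dispersion alpha, plus the AIC used to compare models. Whether the study used exactly this parameterisation is an assumption; the abstract does not say. In a regression, mu would be linked to the covariates (for example log mu = x·beta) and the summed log-likelihood maximised, which is not shown here. Function names are hypothetical.

```python
import math


def gp_logpmf(y, mu, alpha):
    """Log-probability of count y under a generalized Poisson form with mean mu
    and dispersion alpha (alpha = 0 reduces to the ordinary Poisson; alpha > 0
    allows variance mu * (1 + alpha * mu) ** 2, i.e. over-dispersion)."""
    ratio = mu / (1.0 + alpha * mu)
    return (y * math.log(ratio)
            + (y - 1) * math.log(1.0 + alpha * y)
            - math.lgamma(y + 1)
            - mu * (1.0 + alpha * y) / (1.0 + alpha * mu))


def poisson_logpmf(y, mu):
    """Log-probability of count y under a Poisson distribution with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def aic(log_likelihood, n_params):
    """Akaike information criterion; the model with the lower value is preferred."""
    return 2 * n_params - 2 * log_likelihood
```

The generalized model spends one extra parameter (alpha), and AIC charges 2 per parameter, so it wins only when the gain in log-likelihood exceeds 1, which is the trade-off behind the reported comparison.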
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method
Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.
2017-04-01
Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) play a strategic role in assessing the effect of agronomic techniques and thus in improving soil management, especially in marginal environments. Selecting key soil variables that best represent soil status is a critical step in the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection in extracting relevant information from multivariate datasets. Principal component analysis (PCA) has mainly been used; however, supervised multivariate methods and regression techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least squares regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data were derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water-extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean-centered and variance-scaled data of the predictor (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
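The PLSR step above (autoscaled predictors and response, then VIP scores to rank soil variables) can be sketched in pure Python with a single-response NIPALS algorithm. This is an illustrative stand-in for the SAS/STAT PLS procedure, not a reproduction of it; the function names and the number of components are assumptions. A frequently used rule of thumb treats VIP > 1 as "important", and by construction the squared VIP scores average to 1 over the predictors.

```python
import math


def autoscale(columns):
    """Mean-centre each column (a list of floats) and scale it to unit variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def pls1_vip(x_columns, y, n_components):
    """NIPALS PLS1 on autoscaled data; returns one VIP score per predictor."""
    X = [list(row) for row in zip(*autoscale(x_columns))]  # n x p, row-major
    yr = autoscale([y])[0]
    n, p = len(X), len(X[0])
    weights, ss = [], []
    for _ in range(n_components):
        w = [sum(X[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to explain
            break
        w = [v / norm for v in w]
        t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by what this component captured.
        X = [[X[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        weights.append(w)
        ss.append(q * q * tt)  # y sum of squares explained by the component
    total = sum(ss)
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(ss, weights)) / total)
            for j in range(p)]
```

Usage with three hypothetical soil variables and a yield response: `pls1_vip([[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [1, 1, 2, 3, 5, 8]], [1.1, 1.9, 3.2, 3.8, 5.1, 6.3], 2)` returns three VIP scores whose squares sum to 3.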
Prophage insertions in Escherichia coli O157:H7 mlrA contribute to the low expression of curli fimbriae and biofilm observed in many clinical isolates. Varying levels of CsgD-dependent curli/biofilm expression are restored to strains bearing prophage insertions in mlrA by mutation of regulatory gene...
Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena
2013-01-01
The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), in the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results generally show better performance of AsNNs in terms of learning ability and prediction of antitubercular behavior when compared with all other methods. MLR has, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in the training set and 18 in the test set) were obtained with AsNNs using seven descriptors (test-set R² of 0.874 and RMSE of 0.437, against R² of 0.845 and RMSE of 0.472 for MLR). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From scrutiny of the weight levels in each CPNN and the information retrieved from the MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored owing to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, as widely used in microarray gene expression analysis, is highly unlikely to account sufficiently for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. For epistasis analysis with RNA-seq data, we develop a nonlinear functional regression model (FRGM) with functional responses, in which the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, in which genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as the basic unit of epistasis analysis: it tests for interaction between all possible pairs of genes and uses all the accessible information to collectively test interaction between all possible pairs of SNPs within two genomic regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis achieves the correct type I error rate and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction
Directory of Open Access Journals (Sweden)
Sergei Vladimirovich Varaksin
2017-06-01
Purpose. Construction of a mathematical model of the dynamics of childbearing in the Altai region in 2000–2016, and analysis of the dynamics of changes in birth rates for multiple age categories of women of childbearing age. Methodology. An auxiliary element of the analysis is the construction of linear mathematical models of the dynamics of childbearing using a fuzzy linear regression method based on fuzzy numbers. Fuzzy linear regression is considered an alternative to standard statistical linear regression for short time series and an unknown distribution law. The parameters of the fuzzy linear and standard statistical regressions for the childbearing time series were determined using an algorithm written in the built-in MATLAB language. The fuzzy linear regression method has not yet been used in sociological research. Results. Conclusions are drawn about the socio-demographic changes in society, the high efficiency of the demographic policy of the leadership of the region and the country, and the applicability of the fuzzy linear regression method to sociological analysis.
Pralle, R S; Weigel, K W; White, H M
2018-05-01
Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbott Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with the square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on the coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed
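The evaluation protocol above (5-fold cross-validation on the training set, a square-root-transformed BHB target, and a 1.2 mmol/L HYK cut-off) can be sketched as follows. The study ran this in JMP 12; the Python below is only an illustration, and both the random fold assignment and the choice to square predictions back before applying the cut-off are assumptions. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def is_hyk(bhb_mmol_per_l, threshold=1.2):
    """Hyperketonemia flag as defined in the study: blood BHB >= 1.2 mmol/L."""
    return bhb_mmol_per_l >= threshold


def hyk_from_sqrt_prediction(pred_sqrt_bhb, threshold=1.2):
    """Models predict sqrt(BHB); square back (clipping negatives) before the cut-off."""
    return max(pred_sqrt_bhb, 0.0) ** 2 >= threshold
```

In a cross-validation loop, each fold serves once as the validation set while a model is fitted on the remaining four, for example `for f, val in enumerate(folds): train = [i for g, fold in enumerate(folds) if g != f for i in fold]`.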
Zarei, Kobra; Atabati, Morteza; Ahmadi, Monire
2017-05-04
The bee algorithm (BA) is an optimization algorithm, inspired by the natural foraging behaviour of honey bees, that searches for an optimal solution and can be applied to feature selection. In this paper, shuffling cross-validation BA (CV-BA) was applied to select the descriptors that best describe the retention factor (log k) in biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were selected using BA and then used for model development with multiple linear regression (MLR). Descriptor selection was also performed using stepwise, genetic algorithm, and simulated annealing methods, with MLR applied for model development, and the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. A support vector machine (SVM) was also used for model development with the six descriptors selected by BA. The statistical results obtained with SVM were better than those obtained with MLR: for the whole data set (training and test), the root mean square error (RMSE) and correlation coefficient (R) were 0.1863 and 0.9426, respectively, for shuffling CV-BA-MLR, and 0.0704 and 0.9922, respectively, for shuffling CV-BA-SVM.
International Nuclear Information System (INIS)
Wang Weida; Xia Junding; Zhou Zhixin; Leung, P.L.
2001-01-01
Thermoluminescence (TL) dating using a saturating-exponential regression method in the pre-dose technique is described. Twenty-three porcelain samples from past dynasties of China were dated by this method. The results show that the TL ages are in reasonable agreement with archaeological dates within a standard deviation of 27%. An error of this size is acceptable in porcelain dating.
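The abstract does not give the functional form of the saturating exponential. As a hedged sketch, assume the TL response grows with added laboratory dose x as S(x) = A - B·exp(-x/D), and that the equivalent dose P is read where the fitted curve extrapolates back to zero (x = -P, so P = D·ln(A/B)). Because A and B enter linearly, the sketch solves for them by least squares for each trial D on a grid and keeps the D with the smallest residual sum of squares. The grid, the model form, and the function names are all assumptions, not the authors' procedure.

```python
import math


def fit_saturating_exponential(x, y, d_grid):
    """Fit y = A - B*exp(-x/D): A and B by linear least squares for each trial D,
    keeping the D with the smallest residual sum of squares."""
    best = None
    n = len(x)
    for d in d_grid:
        e = [-math.exp(-xi / d) for xi in x]  # regressor multiplying B
        # Normal equations for y = A*1 + B*e.
        se, see = sum(e), sum(v * v for v in e)
        sy, sey = sum(y), sum(ei * yi for ei, yi in zip(e, y))
        det = n * see - se * se
        if det == 0:
            continue
        a = (see * sy - se * sey) / det
        b = (n * sey - se * sy) / det
        sse = sum((yi - a - b * ei) ** 2 for yi, ei in zip(y, e))
        if best is None or sse < best[0]:
            best = (sse, a, b, d)
    _, a, b, d = best
    return a, b, d


def equivalent_dose(a, b, d):
    """Dose P at which A - B*exp(-x/D) reaches zero at x = -P, i.e. P = D*ln(A/B)."""
    return d * math.log(a / b)
```

Separating linear parameters (solved exactly) from the nonlinear one (searched) keeps the search one-dimensional; a finer grid or a one-dimensional optimiser would refine D further.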
The analysis of survival data in nephrology: basic concepts and methods of Cox regression
van Dijk, Paul C.; Jager, Kitty J.; Zwinderman, Aeilko H.; Zoccali, Carmine; Dekker, Friedo W.
2008-01-01
How much does the survival of one group differ from the survival of another group? How do differences in age in these two groups affect such a comparison? To obtain a quantity to compare the survival of different patient groups and to account for confounding effects, a multiple regression technique
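The abstract introduces Cox regression only in outline. As a minimal sketch, assuming one covariate (for example a 0/1 group indicator), no tied event times, and hypothetical data, the partial log-likelihood can be maximised by Newton-Raphson using its score and observed information; exp(beta) is then the estimated hazard ratio between the groups. The function names are hypothetical.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood for one
    covariate, assuming no tied event times."""
    score = info = 0.0
    for i, (ti, di) in enumerate(zip(times, events)):
        if not di:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j, tj in enumerate(times) if tj >= ti]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        score += x[i] - s1 / s0               # observed minus risk-weighted mean
        info += s2 / s0 - (s1 / s0) ** 2      # risk-weighted variance
    return score, info


def fit_cox(times, events, x, tol=1e-10, max_iter=50):
    """Newton-Raphson for the log hazard ratio, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = cox_score_info(beta, times, events, x)
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

Adding further covariates (for example age, to address the confounding mentioned above) turns beta into a vector and the information into a matrix, but the Newton-Raphson structure is unchanged.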
Estimating traffic volume on Wyoming low volume roads using linear and logistic regression methods
Directory of Open Access Journals (Sweden)
Dick Apronti
2016-12-01
Traffic volume is an important parameter in most transportation planning applications. Low volume roads make up about 69% of road miles in the United States. Estimating traffic on low volume roads is a cost-effective alternative to taking traffic counts, because traditional traffic counts are expensive and impractical for low priority roads. The purpose of this paper is to present the development of two alternative means of cost-effectively estimating traffic volumes for low volume roads in Wyoming and to make recommendations for their implementation. The study methodology involves reviewing existing studies, identifying data sources, and carrying out the model development. The utility of the models developed was then verified by comparing actual traffic volumes to those predicted by the models. The study resulted in two regression models that are inexpensive and easy to implement. The first was a linear regression model that used pavement type, access to highways, predominant land use types, and population to estimate traffic volume. In verifying the model, an R² value of 0.64 and a root mean square error of 73.4% were obtained. The second was a logistic regression model that identified the level of traffic on roads using five thresholds or levels. The logistic regression model was verified by estimating traffic volume thresholds and determining the percentage of roads that were accurately classified as belonging to the given thresholds. For the five thresholds, the percentage of roads classified correctly ranged from 79% to 88%. In conclusion, the verification indicated that both model types are useful for accurate and cost-effective estimation of traffic volumes for low volume Wyoming roads. The models developed were recommended for use in traffic volume estimation for low volume roads in pavement management and environmental impact assessment studies.
Wavelet regression model in forecasting crude oil price
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil price forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series at different scales using wavelet theory. Correlation analysis was conducted to assist in selecting the optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA), and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, the WMLR model appears to perform better than the other forecasting techniques tested in this study.
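The decomposition step of WMLR can be illustrated with the simplest wavelet, the orthonormal Haar DWT, which splits a series into a low-frequency approximation and a high-frequency detail, and can be repeated on the approximation to obtain sub-series at several scales. The wavelet family and number of levels used in the study are not stated in the abstract, so Haar is an assumption here, as are the function names and the sample prices. Note that a decimated DWT halves the length at each level, so sub-series would need to be aligned in time (or a non-decimated variant used) before serving as MLR regressors.

```python
import math


def haar_dwt(series):
    """One level of the orthonormal Haar DWT: approximation (low-frequency) and
    detail (high-frequency) sub-series, each half the input length."""
    if len(series) % 2:
        raise ValueError("series length must be even")
    r = math.sqrt(2.0)
    pairs = list(zip(series[0::2], series[1::2]))
    approx = [(a + b) / r for a, b in pairs]
    detail = [(a - b) / r for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one Haar level, restoring the original samples."""
    r = math.sqrt(2.0)
    out = []
    for s, d in zip(approx, detail):
        out.extend([(s + d) / r, (s - d) / r])
    return out


def haar_decompose(series, levels):
    """Repeatedly split the approximation; returns [d1, d2, ..., dJ, aJ]."""
    parts, approx = [], list(series)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        parts.append(detail)
    parts.append(approx)
    return parts
```

With eight hypothetical daily prices, `haar_decompose(prices, 2)` yields two detail series (lengths 4 and 2) and one approximation (length 2), and `haar_idwt` reconstructs the input exactly from one level.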
Energy Technology Data Exchange (ETDEWEB)
Keilacker, H; Becker, G; Ziegler, M; Gottschling, H D [Zentralinstitut fuer Diabetes, Karlsburg (German Democratic Republic)]
1980-10-01
In order to handle all types of radioimmunoassay (RIA) calibration curves obtained in the authors' laboratory in the same way, they tried to find a non-linear expression for their regression that allows calibration curves with different degrees of curvature to be fitted. Considering the two boundary cases of the incubation protocol, they derived a hyperbolic inverse regression function: $x = a_1 y + a_0 + a_{-1} y^{-1}$, where $x$ is the total concentration of antigen, the $a_i$ are constants, and $y$ is the specifically bound radioactivity. An RIA evaluation procedure based on this function is described, providing a fitted inverse RIA calibration curve and some statistical quality parameters. The latter are of an order that is normal for RIA systems. There is excellent agreement between fitted and experimentally obtained calibration curves with different degrees of curvature.
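Although the curve is non-linear in y, the hyperbolic inverse function x = a1·y + a0 + a(-1)/y is linear in its three constants, so it can be fitted by ordinary least squares on the basis functions [y, 1, 1/y]. The sketch below does this with the normal equations and a small Gaussian elimination; it is a hypothetical illustration of such a fit, not the authors' evaluation procedure, and the sample data are invented.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_inverse_ria(bound, conc):
    """Least-squares fit of x = a1*y + a0 + a_m1/y, where x is antigen concentration
    and y is specifically bound radioactivity; returns [a1, a0, a_m1]."""
    basis = [[yi, 1.0, 1.0 / yi] for yi in bound]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    atx = [sum(r[i] * xi for r, xi in zip(basis, conc)) for i in range(3)]
    return solve_linear(ata, atx)
```

Because the function is written in inverse form, an unknown sample's concentration is read off directly by evaluating the fitted expression at its measured bound radioactivity, with no root-finding.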
Directory of Open Access Journals (Sweden)
Long Jiao
2015-05-01
The quantitative structure-property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled using multivariate linear regression (MLR) and an artificial neural network (ANN). Leave-one-out cross validation and external validation were carried out to assess the prediction performance of the models developed. For the MLR method, the prediction root mean square relative error (RMSRE) of leave-one-out cross validation and external validation was 1.77 and 1.23, respectively. For the ANN method, the prediction RMSRE of leave-one-out cross validation and external validation was 1.65 and 1.16, respectively. A quantitative relationship between the MDEV index and Tb of PCDD/Fs was demonstrated. Both MLR and ANN are practicable for modeling this relationship, and the MLR and ANN models developed can be used to predict the Tb of PCDD/Fs. Accordingly, the Tb of each PCDD/F was predicted by the developed models.
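Leave-one-out validation and RMSRE, as used above, can be sketched for the simplest case. As assumptions: a single descriptor column stands in for the MDEV index (which may comprise several elements), a straight-line fit stands in for the full MLR, and RMSRE is expressed in percent because the abstract's values (1.16 to 1.77) read like percentages. Function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmsre_percent(observed, predicted):
    """Root mean square relative error, in percent."""
    rel = [(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * math.sqrt(sum(r * r for r in rel) / len(rel))


def loo_predictions(x, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(b0 + b1 * x[i])
    return preds
```

`rmsre_percent(tb_observed, loo_predictions(mdev, tb_observed))` then gives the leave-one-out RMSRE. Because each prediction comes from a model that never saw that compound, it is a fairer estimate of predictive error than the training fit.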
International Nuclear Information System (INIS)
Jafri, Y.Z.; Kamal, L.
2007-01-01
Various statistical techniques were used on five years of data (1998-2002) on average humidity, rainfall, and maximum and minimum temperatures. Regression analysis time series (RATS) relationships were developed to determine the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of the goodness of fit of our polynomial regression analysis time series (PRATS). Multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) correlations were also developed to decipher the interdependence of the weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our polynomial regression (PR) fit. The Breusch-Pagan test was applied to MLR and MLRATS and indicated homoscedasticity. We also applied Bartlett's test for homogeneity of variances to five years of rainfall and humidity data, which showed that the variances in the rainfall data were not homogeneous, while those in the humidity data were homogeneous. Our results on regression and regression analysis time series show the best fit for predictive modeling of the climatic data of Quetta, Pakistan. (author)
Directory of Open Access Journals (Sweden)
Jun Bi
2018-04-01
Battery electric vehicles (BEVs) reduce energy consumption and air pollution compared with conventional vehicles. However, the limited driving range and potentially long charging times of BEVs create new problems. Accurate prediction of BEV charging time helps drivers plan their travel and alleviates range anxiety during trips. This study proposed a combined model for charging time prediction, based on regression and time-series methods, using actual data from BEVs operating in Beijing, China. After data analysis, a regression model was established that uses the charged amount to predict charging time. A time-series method was then adopted to calibrate the regression model, which significantly improved its fitting accuracy. The parameters of the model were determined using the actual data. Verification results confirmed the accuracy of the model and showed that the model errors were small. The proposed model can accurately depict the charging time characteristics of BEVs in Beijing.
Smith, Paul F; Ganesh, Siva; Liu, Ping
2013-10-30
Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR) in predicting the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70, compared with 4/9 for the RFRs. Even the variables with the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Lusiana, Evellin Dewi
2017-12-01
The parameters of the binary probit regression model are commonly estimated using the Maximum Likelihood Estimation (MLE) method. However, the MLE method has a limitation if the binary data contain separation. Separation is the condition in which one or several independent variables exactly separate the categories of the binary response. As a result, the MLE estimates do not converge and cannot be used for modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims. First, to compare the chance of separation occurring in the binary probit regression model between the MLE method and Firth's approach. Second, to compare the performance of the binary probit regression estimators obtained by the MLE method and by Firth's approach using the RMSE criterion. Both are examined by simulation under different sample sizes. The results showed that for small sample sizes the chance of separation is higher with the MLE method than with Firth's approach. For larger sample sizes, the probability decreased and was nearly identical for the MLE method and Firth's approach. Meanwhile, Firth's estimators have smaller RMSE than MLEs, especially for smaller sample sizes; for larger sample sizes, the RMSEs are not much different. This indicates that Firth's estimators outperform the MLE estimator.
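Why separation breaks MLE can be seen numerically. In the hypothetical data below, a single covariate perfectly separates the two response categories (all negative x have y = 0, all positive x have y = 1), and the probit log-likelihood keeps rising towards 0 as the slope grows, so no finite maximiser exists. This sketch only illustrates the problem; it does not implement Firth's penalised correction, and the function name is hypothetical.

```python
import math


def probit_loglik(beta, x, y):
    """Log-likelihood of a no-intercept probit model, P(y = 1 | x) = Phi(beta * x)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        z = beta * xi
        # Phi(z) = erfc(-z/sqrt2)/2 and 1 - Phi(z) = erfc(z/sqrt2)/2, stable in the tails.
        if yi:
            p = 0.5 * math.erfc(-z / math.sqrt(2.0))
        else:
            p = 0.5 * math.erfc(z / math.sqrt(2.0))
        ll += math.log(p)
    return ll
```

With `x = [-2.0, -1.0, 1.0, 2.0]` and `y = [0, 0, 1, 1]`, evaluating at beta = 0.5, 1, 2, 4 gives strictly increasing values, all below zero, so an iterative MLE routine would drift towards an infinite slope.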
Regression to fuzziness method for estimation of remaining useful life in power plant components
Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.
2014-10-01
Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper, a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value ranges. Once this is completed, the operator's experience is modeled through fuzzy sets that span the entire parameter range. This model is then used synergistically with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equally well or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by adopting appropriate fuzzy sets for parameter representation.
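The data-based benchmark mentioned above, simple linear regression extrapolated to a failure point, can be sketched as follows. The fuzzy-set part of the authors' method is not reproduced, because the abstract does not specify its sets or rules. The degradation data and the function name are hypothetical; RUL follows the abstract's definition, the estimated failure time minus the time of the measurement.

```python
def linear_rul(times, degradation, failure_level):
    """Fit degradation = b0 + b1*t by least squares and return the remaining useful
    life: predicted failure time minus the time of the latest measurement."""
    n = len(times)
    mt, md = sum(times) / n, sum(degradation) / n
    b1 = (sum((t - mt) * (d - md) for t, d in zip(times, degradation))
          / sum((t - mt) ** 2 for t in times))
    b0 = md - b1 * mt
    if b1 <= 0:
        raise ValueError("no upward degradation trend; failure time not defined")
    t_fail = (failure_level - b0) / b1  # where the fitted line reaches failure
    return t_fail - max(times)
```

For example, a parameter rising by 0.5 units per time step from 0, observed up to t = 4, with failure at level 10, is predicted to fail at t = 20, giving an RUL of 16. With few measurements this extrapolation is fragile, which is the data-scarcity gap the fuzzy modelling of expert experience is meant to fill.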
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
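The accuracy indices named above can be computed as follows: sensitivity and specificity from thresholded predictions, and the area under the ROC curve from the predicted scores (for example, logistic-regression probabilities) using its Mann-Whitney interpretation. The data, threshold, and function names are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Proportion of true cases detected, and of true non-cases correctly cleared."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve: probability that a random case scores higher than
    a random non-case, with ties counting one half (Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With `y = [1, 1, 0, 0, 0]` and `scores = [0.9, 0.4, 0.5, 0.2, 0.4]`, the AUC is 4.5/6 = 0.75, and thresholding the scores at 0.45 gives sensitivity 0.5 and specificity 2/3. With a 5.5% PTB rate, AUC is useful because it does not depend on choosing a threshold.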
Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.
2017-01-01
Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
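Rubin's Rules for pooling a single parameter estimate across m imputed data sets can be written in a few lines: average the estimates, then combine the within-imputation variance W with the between-imputation variance B as T = W + (1 + 1/m)·B. The function name and numbers below are hypothetical. This pools one parameter at a time; it does not by itself give a joint test across the dummy variables of a categorical covariate with more than two levels, which is the situation the abstract goes on to address.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets with Rubin's Rules.
    Returns (pooled estimate, total variance, pooled standard error)."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                 # total variance
    return q_bar, t, math.sqrt(t)
```

For three imputations with log-odds estimates 1.0, 1.2 and 0.8 and variances 0.04, 0.05 and 0.03, the pooled estimate is 1.0 with total variance 0.04 + (4/3)·0.04 ≈ 0.093.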
Bias due to two-stage residual-outcome regression analysis in genetic association studies.
Demissie, Serkalem; Cupples, L Adrienne
2011-11-01
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, a residual outcome is first calculated from a regression of the outcome variable on covariates, and the relationship between the adjusted outcome and the SNP is then evaluated by a simple linear regression of the adjusted outcome on the SNP. In this article, we examine the performance of this two-stage analysis compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in a biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared correlation between the SNP and the covariate (r²). For example, for r² = 0, 0.1, and 0.5, two-stage analysis results in 0, 10, and 50% attenuation in the SNP effect, respectively. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR when r² = 0, the two-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. © 2011 Wiley Periodicals, Inc.
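The attenuation result can be checked directly. With noise-free hypothetical data y = 0.5·SNP + 1.0·covariate, the joint (MLR) fit recovers the SNP effect 0.5 exactly, while residualising y on the covariate first and then regressing the residual on the SNP returns 0.5·(1 − r²), where r is the sample SNP-covariate correlation. The function names are hypothetical.

```python
def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def slope(x, y):
    """Simple-regression slope of y on x."""
    return cov(x, y) / cov(x, x)


def two_stage_snp_effect(y, snp, covariate):
    """Stage 1: residualise y on the covariate. Stage 2: regress the residual on the SNP."""
    b_c = slope(covariate, y)
    # The stage-1 intercept is a constant shift and does not affect the stage-2 slope.
    resid = [yi - b_c * ci for yi, ci in zip(y, covariate)]
    return slope(snp, resid)


def mlr_snp_effect(y, snp, covariate):
    """SNP coefficient from the joint regression y ~ snp + covariate."""
    sgg, scc, sgc = cov(snp, snp), cov(covariate, covariate), cov(snp, covariate)
    sgy, scy = cov(snp, y), cov(covariate, y)
    return (scc * sgy - sgc * scy) / (sgg * scc - sgc ** 2)
```

The shortfall arises because stage 1 lets the covariate absorb the part of the SNP effect that it shares with the SNP, and stage 2 never gives it back; the factor (1 − r²) is exactly the 0, 10 and 50% attenuation pattern reported above.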
Determination of benzo(a)pyrene content in PM10 using regression methods
Directory of Open Access Journals (Sweden)
Jacek Gębicki
2015-12-01
The paper presents an attempt to apply multidimensional linear regression to estimate an empirical model describing the factors that influence the B(a)P content in suspended PM10 dust in the Olsztyn and Elbląg city regions between 2010 and 2013. During this period, the annual average concentration of B(a)P in PM10 exceeded the admissible level by a factor of 1.5-3. The investigations confirm that the increase in B(a)P concentration is caused by low-efficiency individual home heating stations or low-temperature heat sources, which are responsible for so-called low emission during the heating period. Dependences between the following quantities were analysed: concentration of PM10 dust in air, air temperature, wind velocity, and air humidity. The fit of the model to the actual B(a)P concentration in PM10 was measured by the coefficient of determination. Application of multidimensional linear regression yielded equations with high coefficients of determination, especially during the heating season; this parameter ranged from 0.54 to 0.80 during the analyzed period.
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other type of LIBS data is calibration that relates laboratory standards to unknowns examined in other settings and enables predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods, including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods, including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py), and k-nearest neighbor (kNN) regression, to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models, using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that using dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable, with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
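The sparse models compared above (lasso, elastic net) owe their variable-selecting behaviour to the L1 penalty. A minimal coordinate-descent sketch for the lasso objective (1/(2n))·||y − Xb||² + λ·||b||₁ is shown below, assuming centred columns and a centred response so that no intercept is needed. It is not the implementation used in the study, and the names are hypothetical. Each update soft-thresholds a coefficient, so weakly informative channels are set exactly to zero.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1.
    X is a list of rows; columns and y are assumed centred (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back in).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

With orthogonal columns the lasso solution is just the soft-thresholded least-squares coefficients, which makes the example easy to check: a penalty of 0.5 shrinks (2, 1) to (1.5, 0.5), and a penalty of 3 zeroes both. Adding a ridge term to the denominator of the update would give the elastic net.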
Directory of Open Access Journals (Sweden)
Xiangbing Zhou
2018-04-01
Rapidly growing collections of GPS (Global Positioning System) trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. To mine this hidden information and obtain better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed that reduces local information loss in the trajectory and avoids getting stuck in a local optimum. Using this method, we first define a new concept of trajectory clustering and construct a novel partitioning method (angle-based partitioning) for line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated into fuzzy C-means (FCM) clustering to maintain the stability and robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method are validated against real-world taxi GPS data. Compared with the partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates more reasonable trajectories.
A method to determine the necessity for global signal regression in resting-state fMRI studies.
Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang
2012-12-01
In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether global signal regression should be applied. We introduce a method to quantify global noise levels and show that it yields a criterion for global signal regression. Using this criterion, one can determine whether to include or exclude global signal regression so as to minimize errors in functional connectivity measures. Copyright © 2012 Wiley Periodicals, Inc.
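The abstract defines global signal regression only in words: the global average time course is treated as a nuisance and regressed out of each voxel. A minimal sketch of that preprocessing step, assuming a separate ordinary-least-squares fit with an intercept for each voxel; `global_signal_regression` is a hypothetical name, and the sketch does not implement the authors' global-noise criterion.

```python
from statistics import fmean


def global_signal_regression(timecourses):
    """Remove the global mean signal from each voxel time course.

    timecourses: list of equal-length lists, one list per voxel.
    Returns the OLS residuals of each voxel regressed on the global signal
    (with an intercept), i.e. the "cleaned" time courses.
    """
    n_t = len(timecourses[0])
    # Global signal: average over all voxels at each time point.
    g = [fmean(tc[t] for tc in timecourses) for t in range(n_t)]
    g_mean = fmean(g)
    g_c = [v - g_mean for v in g]  # centred global signal
    ss_g = sum(v * v for v in g_c)
    if ss_g == 0.0:
        raise ValueError("global signal is constant; nothing to regress out")

    cleaned = []
    for tc in timecourses:
        y_mean = fmean(tc)
        # OLS slope of this voxel on the global signal.
        beta = sum(gc * (y - y_mean) for gc, y in zip(g_c, tc)) / ss_g
        # Residual = observation minus fitted intercept-plus-slope term.
        cleaned.append([y - y_mean - beta * gc for y, gc in zip(tc, g_c)])
    return cleaned
```

By construction every cleaned time course has zero mean and is uncorrelated with the global signal, which is also why the procedure can induce the anticorrelations the abstract warns about.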
Kolasa-Wiecek, Alicja
2015-04-01
The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Among European Union countries, Poland occupies a leading position with regard to coal consumption. The Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere through a gradual decrease in the share of coal in the fuel mix and the development of renewable energy sources. Any evidence that adds to our knowledge of GHG emissions is a valuable source of information. This article presents the results of modeling the GHG emissions generated by the energy sector in Poland. To better understand the quantitative relationship between total primary energy consumption and greenhouse gas emissions, a multiple stepwise regression model was applied. The modeling results for CO2 emissions show a strong relationship (0.97) with the hard coal consumption variable. The fit of the model to the actual data is high, with an adjustment coefficient of 95%. For CH4 emissions, the backward stepwise regression model identified hard coal (0.66), peat and fuel wood (0.34), and solid waste fuels as well as other sources (-0.64) as the most important variables. The adjusted coefficient is satisfactory, with R2 = 0.90. For N2O emissions, the coefficient of determination is low, at 43%. A significant variable influencing the amount of N2O emitted is peat and wood fuel consumption. Copyright © 2015. Published by Elsevier B.V.
Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation
Directory of Open Access Journals (Sweden)
Ahmad Bilfarsah
2005-04-01
The Additive Spline Partial Least Squares method (ASPLS) is a generalization of the Partial Least Squares (PLS) method. ASPLS can accommodate nonlinearity and multicollinearity among the predictor variables. In principle, the ASPLS approach is characterized by two ideas. The first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, preserving the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.
Ennouri, Karim; Ben Ayed, Rayda; Triki, Mohamed Ali; Ottaviani, Ennio; Mazzarello, Maura; Hertelli, Fathi; Zouari, Nabil
2017-07-01
The aim of the present work was to develop a model that supplies accurate predictions of the yields of delta-endotoxins and proteases produced by B. thuringiensis var. kurstaki HD-1. Using available medium ingredients as variables, a mathematical method based on a Plackett-Burman (PB) design was employed to analyze and compare data generated by the Bootstrap method and processed by multiple linear regression (MLR) and artificial neural networks (ANN), including multilayer perceptron (MLP) and radial basis function (RBF) models. The predictive ability of these models was evaluated by comparing output data using the coefficient of determination (R2) and the mean square error (MSE). The results demonstrate that the yields of delta-endotoxin and protease were predicted more accurately by the ANN technique (determination coefficients of 87% and 89% for delta-endotoxin and protease, respectively) than by the MLR method (73.1% and 77.2%, respectively), suggesting that the proposed ANNs, especially MLP, are a suitable new approach for determining yields of bacterial products, allowing more appropriate predictions to be made in a shorter time and with less engineering effort.
Spady, Richard; Stouli, Sami
2012-01-01
We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...
Olive, David J
2017-01-01
This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
International Nuclear Information System (INIS)
Sun Zhong-Hua; Jiang Fan
2010-01-01
In this paper, a new continuous variable called the core-ratio is defined to describe the probability that a residue is in a binding site, replacing the previous binary (0/1) description of interface residues. This allows the support vector machine regression method to be used to fit the core-ratio value and predict protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites; combined with an averaging procedure, the new descriptors are more effective. Our tests show that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method. (rapid communication)
Directory of Open Access Journals (Sweden)
Liyun Su
2012-01-01
We extend local polynomial fitting to the linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that it avoids testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a nonparametric technique, the heteroscedastic function need not be known, so estimation precision can be improved when it is unknown. Furthermore, we focus on the comparison of parameters and reach an optimal fit. In addition, we verify the asymptotic normality of the parameters through numerical simulations. Finally, the approach is applied to an economic case, and the results indicate that the method is effective in finite-sample situations.
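A rough sketch of the two-stage idea (OLS, then a local polynomial estimate of the variance function from squared residuals, then weighted least squares with weights 1/variance), assuming a single predictor, a degree-1 local polynomial with a Gaussian kernel, and a fixed bandwidth `h`. The function names, the kernel, and the variance floor are illustrative choices, not the author's specification.

```python
import math


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ym - b1 * xm, b1


def local_linear(x, z, x0, h):
    """Local linear (degree-1 local polynomial) estimate of E[z | x = x0]
    using a Gaussian kernel with bandwidth h."""
    k = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    # Centre the design at x0 so the fitted intercept is the estimate at x0.
    intercept, _slope = weighted_line([xi - x0 for xi in x], z, k)
    return intercept


def two_stage_gls(x, y, h):
    """Stage 1: OLS residuals. Stage 2: estimate the variance function by
    local linear smoothing of squared residuals, then refit by weighted
    (feasible generalized) least squares with weights 1 / variance."""
    n = len(x)
    b0, b1 = weighted_line(x, y, [1.0] * n)
    r2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Local linear fits can dip below zero; clip to keep weights finite.
    floor = max(1e-3 * sum(r2) / n, 1e-12)
    var = [max(local_linear(x, r2, xi, h), floor) for xi in x]
    return weighted_line(x, y, [1.0 / v for v in var])
```

The clipping floor is a pragmatic safeguard for this sketch: a local linear smoother, unlike a local constant one, is not guaranteed to return non-negative variance estimates near the boundaries.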
Gilstrap, Donald L.
2013-01-01
In addition to qualitative methods presented in chaos and complexity theories in educational research, this article addresses quantitative methods that may show potential for future research studies. Although much in the social and behavioral sciences literature has focused on computer simulations, this article explores current chaos and…
Forecast daily indices of solar activity, F10.7, using support vector regression method
International Nuclear Information System (INIS)
Huang Cong; Liu Dandan; Wang Jingsong
2009-01-01
The 10.7 cm solar radio flux (F10.7), the value of the solar radio emission flux density at a wavelength of 10.7 cm, is a useful index of solar activity as a proxy for solar extreme ultraviolet radiation. Accurate prediction of F10.7 values matters for both long-term (months to years) and short-term (days) forecasting, since these values are often used as inputs to space weather models. This study applies a novel machine learning technique, support vector regression (SVR), to forecasting daily values of F10.7, with the aim of examining the feasibility of SVR for short-term F10.7 forecasting. The SVR-based approach reduces the dimension of the feature space in the training process by using a kernel-based learning algorithm. Thus, the computational complexity is lower and a small amount of training data is sufficient. The F10.7 time series from 2002 to 2006 is employed as the data set. The performance of the approach is estimated by calculating the normalized mean square error and the mean absolute percentage error. The results show that the approach performs well while using fewer training data points than a traditional neural network. (research paper)
Multivariate linear regression of high-dimensional fMRI data with multiple target variables.
Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia
2014-05-01
Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subjects' perception, cognition, or behavior. Here, we show that mapping informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high-dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Failing to account explicitly for other targets whose associated activity spatially overlaps with that of the target of interest may lead to predictive maps that are difficult to interpret. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interactions with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.
International Nuclear Information System (INIS)
Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong
2015-01-01
A new LIBS quantitative analysis method based on adaptive selection of analytical lines and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward to overcome the drawback of high dependence on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength, and width at half height. The analytical lines used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, an LIBS quantitative analysis method based on RVM is presented. The intensities of the analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probability distribution, which helps in evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness than methods based on partial least squares regression, artificial neural networks, and standard support vector machines. - Highlights: • Both training and testing samples are considered for analytical line selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with a confidence interval of a probability distribution.
Directory of Open Access Journals (Sweden)
Sara Mortaz Hejri
2013-01-01
Background: One of the methods used for standard setting is the borderline regression method (BRM). This study aims to assess the reliability of BRM when the pass-fail standard in an objective structured clinical examination (OSCE) was calculated by averaging the BRM standards obtained for each station separately. Materials and Methods: In nine stations of the OSCE with direct observation, the examiners gave each student a checklist score and a global score. Using a linear regression model for each station, we calculated the checklist score cut-off from the regression equation at a global-scale cut-off of 2. The OSCE pass-fail standard was defined as the average of all the stations' standards. To determine reliability, the root mean square error (RMSE) was calculated. The R2 coefficient and the inter-grade discrimination were calculated to assess the quality of the OSCE. Results: The mean total test score was 60.78. The OSCE pass-fail standard and its RMSE were 47.37 and 0.55, respectively. The R2 coefficients ranged from 0.44 to 0.79. The inter-grade discrimination score varied greatly among stations. Conclusion: The RMSE of the standard was very small, indicating that BRM is a reliable method of setting standards for an OSCE, with the advantage of providing data for quality assurance.
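The per-station procedure described under Materials and Methods can be sketched as follows, assuming the checklist score is regressed on the global rating by ordinary least squares and the cut-off is read at the global-scale value of 2. The function names and any example scores are hypothetical.

```python
from statistics import fmean


def station_cutoff(checklist, global_scores, global_cut=2.0):
    """Borderline regression for one station: regress checklist score on the
    global rating (OLS) and return the checklist score predicted at global_cut."""
    gm, cm = fmean(global_scores), fmean(checklist)
    sxx = sum((g - gm) ** 2 for g in global_scores)
    sxy = sum((g - gm) * (c - cm) for g, c in zip(global_scores, checklist))
    slope = sxy / sxx
    return cm + slope * (global_cut - gm)


def osce_standard(stations, global_cut=2.0):
    """OSCE pass-fail standard: mean of the per-station cut-offs.
    stations: list of (checklist_scores, global_scores) pairs."""
    return fmean(station_cutoff(c, g, global_cut) for c, g in stations)
```

For instance, a hypothetical station whose checklist scores lie exactly on `10 * global + 20` yields a cut-off of 40, and averaging it with a station on `5 * global + 40` (cut-off 50) gives an overall standard of 45.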
International Nuclear Information System (INIS)
Ballini, J.-P.; Cazes, P.; Turpin, P.-Y.
1976-01-01
Analysing the histogram of anode pulse amplitudes allows a discussion of the hypothesis that has been proposed to account for the statistical processes of secondary multiplication in a photomultiplier. In an earlier work, good agreement was obtained between experimental and reconstructed spectra, assuming a first-dynode distribution comprising two Poisson distributions with distinct mean values. This first approximation led to a search for a method that could give the weights of several Poisson distributions with distinct mean values. Three methods are briefly presented: classical linear regression, constrained regression (d'Esopo's method), and regression on variables subject to error. These methods yield an approximation of the frequency function that represents the dispersion of the local mean gain around the overall mean gain of the first dynode. Comparing this function with the one employed in the Polya distribution shows that the latter is inadequate to describe the statistical process of secondary multiplication. Numerous spectra obtained with two kinds of photomultiplier working under different physical conditions have been analysed. Two points are then discussed: does the frequency function represent the dynode structure and the interdynode collection process, and is the model (in which the multiplication process of every dynode except the first is Poissonian) valid whatever the photomultiplier and the operating conditions? (Auth.)
Borodachev, S. M.
2016-06-01
A simple derivation of the recursive least squares (RLS) equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to the multicollinearity problem.
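A minimal sketch of RLS written as a Kalman filter for a constant state (identity state transition, no process noise), assuming scalar observations `y = h·θ + v` with noise variance `r` and a diffuse prior covariance `p0·I`. This is the textbook recursion, not necessarily the derivation in the paper; `rls` and its parameters are illustrative names.

```python
def rls(observations, n_params, p0=1e6, r=1.0):
    """Recursive least squares as a Kalman filter for a constant state.

    observations: iterable of (h, y) pairs with y = h . theta + noise.
    p0: prior variance on each parameter (large = weak prior).
    r: observation-noise variance.
    Returns the final estimate theta and its covariance P.
    """
    n = n_params
    theta = [0.0] * n
    P = [[p0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for h, y in observations:
        # No prediction step: the state is constant (identity transition,
        # zero process noise), so only the measurement update is needed.
        Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        s = r + sum(h[i] * Ph[i] for i in range(n))  # innovation variance
        K = [v / s for v in Ph]                       # Kalman gain
        innovation = y - sum(h[i] * theta[i] for i in range(n))
        theta = [theta[i] + K[i] * innovation for i in range(n)]
        # Covariance update P <- P - K h^T P, using h^T P = (P h)^T for symmetric P.
        P = [[P[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return theta, P
```

The final estimate equals a ridge-regression solution with penalty `r / p0`, so a large `p0` reproduces batch least squares closely, while a smaller `p0` adds shrinkage that can stabilize estimates when predictors are nearly collinear.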
Huang, Lei
2015-01-01
Conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly. To solve this problem, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state variables. Unknown time-varying estimators of the observation noise are used to obtain the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy, so the required sample size is reduced. It can be applied to modeling gyro random noise wherever a fast and accurate ARMA modeling method is required. PMID:26437409
Bolarinwa, O A; Adeola, O
2012-12-01
Digestible and metabolizable energy contents of feed ingredients for pigs can be determined by direct or indirect methods. There are situations when only the indirect approach is suitable and the regression method is a robust indirect approach. This study was conducted to compare the direct and regression methods for determining the energy value of wheat for pigs. Twenty-four barrows with an average initial BW of 31 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g wheat/kg plus minerals and vitamins (sole wheat) for the direct method, corn (Zea mays)-soybean (Glycine max) meal reference diet (RD), RD + 300 g wheat/kg, and RD + 600 g wheat/kg. The 3 corn-soybean meal diets were used for the regression method and wheat replaced the energy-yielding ingredients, corn and soybean meal, so that the same ratio of corn and soybean meal across the experimental diets was maintained. The wheat used was analyzed to contain 883 g DM, 15.2 g N, and 3.94 Mcal GE/kg. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d total but separate collection of feces and urine. The DE and ME for the sole wheat diet were 3.83 and 3.77 Mcal/kg DM, respectively. Because the sole wheat diet contained 969 g wheat/kg, these translate to 3.95 Mcal DE/kg DM and 3.89 Mcal ME/kg DM. The RD used for the regression approach yielded 4.00 Mcal DE and 3.91 Mcal ME/kg DM diet. Increasing levels of wheat in the RD linearly reduced (P direct method (3.95 and 3.89 Mcal/kg DM) did not differ (0.78 < P < 0.89) from those obtained using the regression method (3.96 and 3.88 Mcal/kg DM).
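The two calculations can be sketched under simple assumptions. The direct method divides the sole-wheat diet value by the wheat inclusion (969 g/kg, i.e. 0.969), which reproduces the 3.95 and 3.89 Mcal/kg DM figures quoted above. For the regression method, the sketch assumes diet energy is regressed on wheat inclusion (as a proportion) and extrapolated to 100% wheat; the per-diet values are not given in the abstract, so any example inputs are hypothetical.

```python
def direct_method(diet_value, inclusion):
    """Direct method: scale the sole-ingredient diet's energy value
    (Mcal/kg DM) to a 100% ingredient basis; inclusion is a proportion."""
    return diet_value / inclusion


def regression_method(inclusions, diet_values):
    """Regression method (assumed form): OLS of diet energy on ingredient
    inclusion (proportion), extrapolated to inclusion = 1.0 (100% ingredient)."""
    n = len(inclusions)
    xm = sum(inclusions) / n
    ym = sum(diet_values) / n
    sxx = sum((x - xm) ** 2 for x in inclusions)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(inclusions, diet_values))
    slope = sxy / sxx
    return ym + slope * (1.0 - xm)
```

For example, `direct_method(3.83, 0.969)` is about 3.9525, which rounds to the reported 3.95; hypothetical diets at 0, 0.3 and 0.6 wheat inclusion with 4.00, 3.985 and 3.97 Mcal/kg DM would extrapolate to 3.95 Mcal/kg DM.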
Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.
2017-08-01
Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically
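One common way to turn per-electrode event times (for example, negative-peak times) into a traveling-wave velocity is to fit a plane t = a·x + b·y + c by least squares; speed is then 1/|∇t| and the propagation direction is ∇t/|∇t|. The abstract does not spell out the authors' estimators, so this is an assumed, illustrative formulation with hypothetical function names; a least-absolute-deviation fit, which the abstract reports as more robust to outliers, could replace the least-squares step.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(A, b):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = _det3(A)
    sol = []
    for col in range(3):
        m = [row[:] for row in A]
        for i in range(3):
            m[i][col] = b[i]
        sol.append(_det3(m) / d)
    return sol


def wave_velocity(positions, times):
    """Least-squares plane fit t = a*x + b*y + c to per-electrode event times.

    positions: list of (x, y) electrode coordinates (e.g. mm).
    times: event time at each electrode (e.g. ms).
    Returns (speed, (ux, uy)): speed = 1/|grad t| and the unit direction.
    """
    rows = [(x, y, 1.0) for x, y in positions]
    # Normal equations (X^T X) beta = X^T t.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * t for r, t in zip(rows, times)) for i in range(3)]
    a, b, _c = _solve3(A, rhs)
    slowness = math.hypot(a, b)  # time per unit distance
    return 1.0 / slowness, (a / slowness, b / slowness)
```

On a 3 x 3 grid with 1 mm spacing and times `0.5 * x + 5.0` ms, the sketch returns a speed of 2 mm/ms travelling along +x.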
Energy Technology Data Exchange (ETDEWEB)
Jabr, R.A. [Electrical, Computer and Communication Engineering Department, Notre Dame University, P.O. Box 72, Zouk Mikhael, Zouk Mosbeh (Lebanon)]
2006-02-15
This paper presents an implementation of the least absolute value (LAV) power system state estimator based on obtaining a sequence of solutions to the L1-regression problem using an iteratively reweighted least squares (IRLS-L1) method. The proposed implementation avoids reformulating the regression problem into standard linear programming (LP) form and consequently does not require the use of common methods of LP, such as those based on the simplex method or interior-point methods. It is shown that the IRLS-L1 method is equivalent to solving a sequence of linear weighted least squares (LS) problems. Thus, its implementation presents little additional effort since the sparse LS solver is common to existing LS state estimators. Studies on the termination criteria of the IRLS-L1 method have been carried out to determine a procedure for which the proposed estimator is more computationally efficient than a previously proposed non-linear iteratively reweighted least squares (IRLS) estimator. Indeed, it is revealed that the proposed method is a generalization of the previously reported IRLS estimator, but is based on more rigorous theory. (author)
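The reduction described above, L1 regression solved as a sequence of weighted least-squares problems, can be sketched for a straight-line fit. This assumes weights 1/max(|r|, eps) computed from the previous residuals; `eps`, the iteration count, and the function name are illustrative, and the sketch is not a power-system state estimator.

```python
def irls_lad_line(x, y, iters=200, eps=1e-8):
    """Least-absolute-value (L1) fit of y = b0 + b1*x by iteratively
    reweighted least squares: each pass solves a weighted LS problem with
    weights 1 / max(|residual|, eps) taken from the previous pass."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    b0 = b1 = 0.0
    for _ in range(iters):
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        b1 = sxy / sxx
        b0 = ym - b1 * xm
        # Reweight: w * r^2 with w = 1/|r| equals |r|, so the weighted
        # squared loss approximates the absolute loss; eps caps the weights.
        w = [1.0 / max(abs(yi - b0 - b1 * xi), eps) for xi, yi in zip(x, y)]
    return b0, b1
```

With one gross outlier among otherwise collinear points, the L1 fit recovers the line through the inliers, whereas ordinary least squares would be pulled toward the outlier.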
Energy Technology Data Exchange (ETDEWEB)
Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)]
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars data, or any other type of LIBS data, is calibration that relates laboratory standards to unknowns examined in other settings and enables predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py), and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations, averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models, using their predicted residual sum of squares (PRESS) for the comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that using dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved more generalizable, with better predictive performance. These results are attributed to the high
International Nuclear Information System (INIS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-01-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars data, or any other type of LIBS data, is calibration that relates laboratory standards to unknowns examined in other settings and enables predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py), and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations, averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models, using their predicted residual sum of squares (PRESS) for the comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that using dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved more generalizable, with better predictive performance. These results are attributed to the high dimensionality of the data (6144
Directory of Open Access Journals (Sweden)
Tamer Khatib
2014-01-01
In this research, an improved approach for sizing standalone PV (SAPV) systems is presented. This work improves on a method developed previously by the authors. That previous work is based on an analytical method, which raised concerns about the difficulty of finding the model's coefficients. Therefore, the approach proposed here combines an analytical method with a machine learning approach, a generalized regression neural network (GRNN). The GRNN helps predict the optimal size of a PV system from the geographical coordinates of the targeted site instead of from mathematical formulas. Employing the GRNN facilitates the use of the previously developed method and avoids some of its drawbacks. The approach has been tested using data from five Malaysian sites. According to the results, the proposed method can be used efficiently for SAPV sizing, and the proposed GRNN-based model predicts the sizing curves of the PV system accurately, with a prediction error of 0.6%. Moreover, hourly meteorological and load demand data are used in this research to account for the uncertainty of the solar energy and the load demand.
Comparison of Sparse and Jack-knife partial least squares regression methods for variable selection
DEFF Research Database (Denmark)
Karaman, Ibrahim; Qannari, El Mostafa; Martens, Harald
2013-01-01
The objective of this study was to compare two different techniques of variable selection, Sparse PLSR and Jack-knife PLSR, with respect to their predictive ability and their ability to identify relevant variables. Sparse PLSR is a method that is frequently used in genomics, whereas Jack-knife PL...
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
Sun, L.G.; De Visser, C.C.; Chu, Q.P.; Mulder, J.A.
2012-01-01
The optimality of the kernel number and kernel centers plays a significant role in determining the approximation power of nearly all kernel methods. However, the process of choosing optimal kernels is always formulated as a global optimization task, which is hard to accomplish. Recently, an
Physics constrained nonlinear regression models for time series
International Nuclear Information System (INIS)
Majda, Andrew J; Harlim, John
2013-01-01
A central issue in contemporary science is the development of data-driven statistical nonlinear dynamical models for time series of partial observations of nature or of a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can exhibit finite-time blow-up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics constrained multi-level quadratic regression models is introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantage of incorporating memory effects in time as well as the nonlinear noise from energy-conserving nonlinear interactions. Mathematical guidelines for the performance and behaviour of these physics constrained MLR models, as well as filtering algorithms for their implementation, are developed here. Data-driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics constrained quadratic MLR models are proposed as process models for Bayesian estimation, through Markov chain Monte Carlo algorithms, of low-frequency behaviour in complex physical data.
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was used, for the first time, to select sequence parameters that are significant for identifying β-turns, using a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were, in order, the percentages of Gly and Ser and the occurrence of Asn at position i+2. These parameters have the greatest effect on the formation of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Under a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with the results of other β-turn prediction methods. In conclusion, this study shows that combining the parameter selection ability of binary logistic regression with the prediction capability of neural networks can lead to more precise models for identifying β-turns in proteins.
Directory of Open Access Journals (Sweden)
Xiaoyan Yang
2018-04-01
The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1) ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2) despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography) and a relatively complex calibration process; (3) simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
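For a model y = b0 + b1·X + b2·Z + b3·X·Z, the simple slope of X at a chosen moderator value z is b1 + b3·z, with standard error sqrt(var(b1) + 2z·cov(b1, b3) + z²·var(b3)). The sketch below computes this for hypothetical coefficient estimates and (co)variances; it covers ordinary MLR only, not the multilevel or latent curve extensions the paper addresses.

```python
"""Simple slope of X at a given moderator value Z = z in an MLR model with an X*Z interaction."""
import math


def simple_slope(b1, b3, z, var_b1, var_b3, cov_b1_b3):
    """Return (slope, standard error) of X at Z = z for y = b0 + b1*X + b2*Z + b3*X*Z."""
    slope = b1 + b3 * z
    se = math.sqrt(var_b1 + 2.0 * z * cov_b1_b3 + z * z * var_b3)
    return slope, se


if __name__ == "__main__":
    # Hypothetical estimates and (co)variances taken from a fitted interaction model.
    for z in (-1.0, 0.0, 1.0):
        s, se = simple_slope(b1=0.5, b3=0.2, z=z, var_b1=0.04, var_b3=0.01, cov_b1_b3=-0.005)
        print(f"z={z:+.1f}: slope={s:.3f}, se={se:.3f}, t={s / se:.2f}")
```

Dividing the slope by its standard error gives a t statistic; the region of significance is the set of z values for which that t exceeds the critical value.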
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Directory of Open Access Journals (Sweden)
Hukharnsusatrue, A.
2005-11-01
The objective of this research is to compare methods for estimating multiple regression coefficients when multicollinearity exists among the independent variables. The estimation methods are the Ordinary Least Squares method (OLS), Restricted Least Squares method (RLS), Restricted Ridge Regression method (RRR) and Restricted Liu method (RL), both when the restrictions are true and when they are not. The study used Monte Carlo simulation, with the experiment repeated 1,000 times under each situation. The results are as follows. CASE 1: The restrictions are true. In all cases, the RRR and RL methods have a smaller Average Mean Square Error (AMSE) than the OLS and RLS methods, respectively. The RRR method provides the smallest AMSE when the level of correlation is high, and also provides the smallest AMSE for all levels of correlation and all sample sizes when the standard deviation is 5. However, the RL method provides the smallest AMSE when the level of correlation is low or medium, except that with a standard deviation of 3 and small sample sizes the RRR method provides the smallest AMSE. The AMSE varies directly with the level of correlation, the standard deviation and the number of independent variables (in decreasing order of influence), and inversely with the sample size. CASE 2: The restrictions are not true. In all cases the RRR method provides the smallest AMSE, except when the standard deviation is 1 and the error of restrictions is 5%: there, the OLS method provides the smallest AMSE when the level of correlation is low or medium and the sample size is large, while for small sample sizes the RL method provides the smallest AMSE. In addition, when the error of restrictions is increased, the OLS method provides the smallest AMSE for all levels of correlation and all sample sizes, except when the level of correlation is high and the sample sizes are small. Moreover, in the cases where the OLS method provides the smallest AMSE, the RLS method mostly has a smaller AMSE than
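The restricted estimators in the study (RLS, RRR, RL) impose linear restrictions on the coefficients. As a simplified illustration of why shrinkage helps under multicollinearity, the sketch below drops the restrictions and compares only unrestricted OLS with ordinary ridge regression in a Monte Carlo experiment of 1,000 replications. The two-predictor, no-intercept design, the correlation, σ, n and ridge constant k are all hypothetical choices.

```python
"""Monte Carlo comparison of OLS and ridge estimates under strong collinearity
(two predictors, no intercept); squared error is averaged over replications."""
import math
import random


def fit(x1, x2, y, k=0.0):
    """Solve (X'X + kI) b = X'y for the two-column design [x1, x2]; k = 0 gives OLS."""
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def amse(rho, sigma, n, k, beta=(1.0, 1.0), reps=1000, seed=1):
    """Average of ||b_hat - beta||^2 over `reps` simulated data sets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 has population correlation rho with x1
        x2 = [rho * a + math.sqrt(1 - rho * rho) * rng.gauss(0, 1) for a in x1]
        y = [beta[0] * a + beta[1] * b + rng.gauss(0, sigma) for a, b in zip(x1, x2)]
        b_hat = fit(x1, x2, y, k)
        total += (b_hat[0] - beta[0]) ** 2 + (b_hat[1] - beta[1]) ** 2
    return total / reps


if __name__ == "__main__":
    print("OLS   AMSE:", amse(rho=0.99, sigma=5.0, n=20, k=0.0))
    print("ridge AMSE:", amse(rho=0.99, sigma=5.0, n=20, k=1.0))
```

Using the same seed for both estimators means they see identical simulated data sets, so the difference in AMSE reflects the estimators rather than sampling noise.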
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
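UCODE_2014's MCMC option is built on DREAM, which runs multiple chains with differential-evolution proposals. As a much simpler illustration of the same Metropolis acceptance rule, here is a hypothetical single-chain random-walk Metropolis sampler for a one-dimensional posterior, plus an equal-tailed credible interval. It is not DREAM and does not use UCODE's interface; the target density, step size and burn-in are illustrative assumptions.

```python
"""Single-chain random-walk Metropolis sampler: a much simpler relative of DREAM."""
import math
import random


def metropolis(log_post, x0, step, n_samples, seed=0):
    """Sample a 1-D density known up to a constant through its log, log_post(x)."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_prop = log_post(proposal)
        # Accept with probability min(1, p(proposal) / p(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval read off the sorted samples."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical target: a standard normal posterior.
    chain, rate = metropolis(lambda v: -0.5 * v * v, x0=3.0, step=2.4, n_samples=20000)
    kept = chain[2000:]  # discard burn-in
    print(f"acceptance={rate:.2f}, mean={sum(kept) / len(kept):.3f}, 95% CI={credible_interval(kept)}")
```

The model runs counted in the abstract correspond to calls of `log_post`: one per iteration here, which is why MCMC intervals need far more runs than linear or nonlinear confidence intervals.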
CSIR Research Space (South Africa)
Gregor, Luke
2017-12-01
understanding with spatially integrated air–sea flux estimates (Fay and McKinley, 2014). Conversely, ocean biogeochemical process models are good tools for mechanistic understanding, but fail to represent the seasonality of CO2 fluxes in the Southern Ocean... of including coordinate variables as proxies of ΔpCO2 in the empirical methods. In the intercomparison study by Rödenbeck et al. (2015) proxies typically include, but are not limited to, sea surface temperature (SST), chlorophyll a (Chl a), mixed layer...
Consistency analysis of subspace identification methods based on a linear regression approach
DEFF Research Database (Denmark)
Knudsen, Torben
2001-01-01
Results can be found in the literature that claim consistency for the subspace method under certain quite weak assumptions. Unfortunately, a new result gives a counterexample showing inconsistency under these assumptions and then gives new, stricter sufficient assumptions, which, however, do not include important model structures such as Box-Jenkins. Based on a simple least squares approach, this paper shows the possible inconsistency under the weak assumptions and develops only slightly stricter assumptions that are sufficient for consistency and that include any model structure...
International Nuclear Information System (INIS)
Sambou, Soussou
2004-01-01
In flood forecasting modelling, large basins are often considered as hydrological systems with multiple inputs and one output. Inputs are hydrological variables such as rainfall, runoff and physical characteristics of the basin; the output is runoff. Relating inputs to output can be achieved using deterministic, conceptual, or stochastic models. Rainfall-runoff models generally lack accuracy. Models based on physical hydrological processes, whether deterministic or conceptual, are highly data-demanding and very complex. Stochastic multiple input-output models, which use only historical records of hydrological variables, particularly runoff, are therefore very popular among hydrologists for flood forecasting in large river basins. The approach is applied to the Senegal River upstream of Bakel, where the river is formed by the main branch, the Bafing, and two tributaries, the Bakoye and the Faleme, the Bafing being regulated by the Manantali Dam. A model with three inputs and one output has been used for flood forecasting at Bakel. The influence of the forecast lead time, and of the three inputs taken separately, in pairs, and all together, has been assessed using a dimensionless variance as the quality criterion. Discrepancies generally occur between model output and observations; to bring the model into better agreement with current observations, we compared four parameter-updating procedures (recursive least squares, Kalman filtering, the stochastic gradient method, and an iterative method) and an AR error-forecasting model. A combination of these updating procedures has been used in real-time flood forecasting. (Author)
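Of the four updating procedures compared, recursive least squares (RLS) is the easiest to show compactly. A minimal sketch, assuming a model that is linear in its parameters and a forgetting factor λ (both illustrative choices, not values from the paper), updating the parameters one observation at a time:

```python
"""Recursive least squares (RLS) with a forgetting factor, for on-line parameter updating."""
import math


class RecursiveLeastSquares:
    def __init__(self, n_params, forgetting=0.98, p0=1000.0):
        self.lam = forgetting
        self.theta = [0.0] * n_params
        # Covariance-like matrix P, initialised large to reflect little prior knowledge.
        self.P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]

    def update(self, phi, y):
        """Incorporate observation y with regressor vector phi; return the a-priori prediction error."""
        n = len(phi)
        p_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        error = y - sum(t * f for t, f in zip(self.theta, phi))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P <- (P - gain * phi' P) / lambda; phi' P equals p_phi' because P is symmetric.
        self.P = [[(self.P[i][j] - gain[i] * p_phi[j]) / self.lam for j in range(n)]
                  for i in range(n)]
        return error


if __name__ == "__main__":
    # Hypothetical two-input system y = 2*u1 - 0.5*u2, observed without noise.
    rls = RecursiveLeastSquares(2)
    for t in range(300):
        phi = [math.sin(t), math.cos(0.7 * t)]
        rls.update(phi, 2.0 * phi[0] - 0.5 * phi[1])
    print(rls.theta)
```

The forgetting factor discounts old observations geometrically, which lets the parameters track a system whose response drifts over time; λ = 1 gives approximately ordinary least squares on all data seen so far.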
High cycle fatigue test and regression methods of S-N curve
International Nuclear Information System (INIS)
Kim, D. W.; Park, J. Y.; Kim, W. G.; Yoon, J. H.
2011-11-01
The fatigue design curves in the ASME Boiler and Pressure Vessel Code Section III are based on the assumption that fatigue life is infinite after 10^6 cycles. This is because, until recent decades, standard fatigue testing equipment was limited to speeds below 200 cycles per second. Traditional servo-hydraulic machines work at a frequency of 50 Hz; servo-hydraulic machines working at 1000 Hz have been developed after 1997. These machines allow high frequencies, with displacements of up to ±0.1 mm and dynamic loads of ±20 kN guaranteed. Resonant fatigue test machines operate at 50–250 Hz, various forced-vibration-based systems work at 500 Hz or 1.8 kHz, and rotating bending machines allow testing frequencies of 0.1–200 Hz. The main advantage of ultrasonic fatigue testing at 20 kHz is performing… Although the S-N curve is determined by experiment, the fatigue strength corresponding to a given fatigue life should be determined by a statistical method that accounts for the scatter of fatigue properties. In this report, statistical methods for the evaluation of fatigue test data are investigated.
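One common way to summarise S-N data statistically is a straight-line fit on log-log axes, i.e. a Basquin-type relation S = A·N^b. The sketch below assumes that form and regresses log N on log S, because fatigue life is the scattered (random) variable, which is the convention of ASTM E739; the report itself may use other methods. The test data are hypothetical.

```python
"""Fitting a Basquin-type S-N curve, S = A * N**b, by least squares on log-log axes."""
import math


def fit_sn_curve(stress, cycles):
    """Regress log10(N) on log10(S), treating life as the dependent variable, then
    rewrite log10(N) = c + d*log10(S) in Basquin form: b = 1/d, A = 10**(-c/d)."""
    xs = [math.log10(s) for s in stress]
    ys = [math.log10(n) for n in cycles]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    d = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    c = my - d * mx
    return 10 ** (-c / d), 1.0 / d


def predicted_life(stress, a, b):
    """Invert S = A * N**b to get the fitted mean-curve life at a given stress amplitude."""
    return (stress / a) ** (1.0 / b)


if __name__ == "__main__":
    # Hypothetical test results (stress amplitude in MPa, cycles to failure).
    stress = [400.0, 350.0, 300.0, 260.0, 230.0]
    cycles = [2.0e4, 6.5e4, 2.4e5, 8.0e5, 2.1e6]
    a, b = fit_sn_curve(stress, cycles)
    print(f"A = {a:.1f} MPa, b = {b:.4f}, life at 280 MPa = {predicted_life(280.0, a, b):.3g} cycles")
```

A design curve would then be placed some number of standard deviations of log N below this mean curve, which is where the statistical treatment of scatter enters.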
Variable selection methods in PLS regression - a comparison study on metabolomics data
DEFF Research Database (Denmark)
Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach
The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions, with special attention to the metabolism of lignans, using an LC-MS based metabolomic approach. References: 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when… integrated approach. Due to the high number of variables in the data sets (both raw data and after peak picking), the selection of important variables in an explorative analysis is difficult, especially when different sets of metabolomics data need to be related. Variable selection (or removal of irrelevant… different strategies for variable selection on the PLSR method were considered and compared with respect to the selected subset of variables and the possibility of biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] were applied to the data in order to achieve variable selection prior…
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). The pros and cons of several different types of decision tree-based regression methods are discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. This work therefore assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. A comparison of models that used only molecular descriptors, in particular the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that using predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss with decision trees if prior feature selection is applied. The decision tree-based models presented in this work have an accuracy that is reasonable and similar to that of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data, and from flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Kim, T. W.; Park, G. H.
2014-12-01
Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within depth ranges of 50–1200 m were used to derive MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because they are associated with vertical mixing and with DIC (dissolved inorganic carbon) removal and release, all of which affect Ωarag in the water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and an MLR model was produced in each grid, giving a total of 145 independent MLR models over the NPO. The mean RMSE (root mean square error) and r² (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. The obtained MLR coefficients for each predictor variable and the intercept were then interpolated over the study area, making it possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data; the mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with the seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions at various depths. Large seasonal variability in Ωarag was found in the mid-latitude Western NPO (24–40°N, 130–180°E) and the low-latitude Eastern NPO (0–12°N, 115–150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~0.5 at 50 m) because it closely followed temperature variations in the 0–75 m layer. In contrast, remineralization of organic matter was the main cause for the seasonal
Current Mathematical Methods Used in QSAR/QSPR Studies
Directory of Open Access Journals (Sweden)
Peixun Liu
2009-04-01
This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. The mathematical methods applied to the regression of QSAR/QSPR models have recently been developing very fast, and new methods, such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR), have appeared on the QSAR/QSPR stage. At the same time, earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN), Support Vector Machine (SVM) and so on, are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in future QSAR/QSPR studies.
Principal component regression for crop yield estimation
Suryanarayana, T M V
2016-01-01
This book highlights the estimation of crop yield in Central Gujarat, especially the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as the dependent variable. It then compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables to a smaller number of uncorrelated variables called principal components (PCs). This book will be helpful to students and researchers starting work on climate and agriculture, mainly focusing on estimation models. The chapters take readers along a smooth path, from understanding climate and weather and the impact of climate change, gradually towards downscaling techniques and then finally towards development of ...
Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi
2018-04-01
Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation-coefficient-based method for evaluating the significance of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the degree of significance of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficients of each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of this method. With first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte Carlo experiments on the relationship between the correlation coefficient and the auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.
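For a first-order model x_t = φ·x_{t-1} + ε_t, the dependence component is φ·x_{t-1}, and its correlation with x_t equals the lag-1 autocorrelation, which is φ itself for 0 < φ < 1. This is consistent with the statement above that the correlation coefficient is governed by the autocorrelation coefficients. A small Monte Carlo sketch (the sample size, seed and burn-in are arbitrary choices; the paper's significance thresholds are not reproduced):

```python
"""Monte Carlo check: correlation between an AR(1) series x_t and its
dependence component phi * x_{t-1}."""
import math
import random


def simulate_ar1(phi, n, seed=0, burn_in=500):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal e_t, dropping a burn-in."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def dependence_correlation(phi, n=20000, seed=0):
    """Correlation of x_t with phi * x_{t-1}; undefined for phi = 0 (zero variance)."""
    x = simulate_ar1(phi, n, seed)
    dependence = [phi * v for v in x[:-1]]
    return pearson(x[1:], dependence)


if __name__ == "__main__":
    for phi in (0.2, 0.5, 0.8):
        print(phi, round(dependence_correlation(phi), 3))
```

For higher orders the dependence component is φ1·x_{t-1} + ... + φp·x_{t-p}, and the correlation then mixes the autocorrelations of orders 1 to p.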
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold: the first aim is to show the effect of outliers on the least squares regression estimator widely used in the social sciences; the second is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R²). For this purpose,…
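The contrast between least squares and an M-estimator can be sketched with a Huber estimator fitted by iteratively reweighted least squares (IRLS). The tuning constant c = 1.345 is the conventional choice for Huber's ψ; fixing the residual scale from the OLS fit is a simplification of this sketch, and the data (a line with one gross outlier) are hypothetical.

```python
"""Ordinary least squares versus a Huber M-estimator, fitted by iteratively
reweighted least squares (IRLS), on a straight line with one gross outlier."""
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares intercept and slope; equal weights give OLS."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, c=1.345, iters=50):
    """Huber M-estimate of a line. The residual scale is estimated once from the OLS
    residuals (median absolute residual / 0.6745) and then held fixed."""
    a, b = weighted_line(x, y, [1.0] * len(x))  # start from OLS
    scale = statistics.median(abs(yi - (a + b * xi)) for xi, yi in zip(x, y)) / 0.6745
    cutoff = c * scale  # assumes scale > 0
    for _ in range(iters):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Huber weights: 1 inside the cutoff, cutoff/|r| outside it.
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        a, b = weighted_line(x, y, w)
    return a, b


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    y = [1.0 + 2.0 * xi for xi in x]
    y[9] += 50.0  # one gross outlier at the end of the range (hypothetical data)
    print("OLS   (a, b):", weighted_line(x, y, [1.0] * len(x)))
    print("Huber (a, b):", huber_line(x, y))
```

Because the outlier sits at a high-leverage x value, even the Huber line is still pulled noticeably (slope about 3.1 against about 4.7 for OLS and 2 for the clean data); bounded-influence or redescending estimators go further.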
Directory of Open Access Journals (Sweden)
Faridah Hani Mohamed Salleh
2017-01-01
Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards demonstrating the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experimental datasets was far smaller than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade errors using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was very small. Apart from that, the Belsley collinearity test showed that multicollinearity greatly affected the datasets used in this experiment. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
Salleh, Faridah Hani Mohamed; Zainudin, Suhaila; Arif, Shereena M
2017-01-01
Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards demonstrating the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experimental datasets was far smaller than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade errors using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was very small. Apart from that, the Belsley collinearity test showed that multicollinearity greatly affected the datasets used in this experiment. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
Fatekurohman, Mohamat; Nurmala, Nita; Anggraeni, Dian
2018-04-01
The lungs are the most important organ of the respiratory system. Lung disorders are varied, e.g. pneumonia, emphysema, tuberculosis and lung cancer; of these, lung cancer is the most harmful. This research therefore applies survival analysis to identify the factors affecting the survival of lung cancer patients, comparing the exact, Efron and Breslow approximation methods for the hazard ratio in a stratified Cox regression model. The data are based on the medical records of lung cancer patients at Jember Paru-paru Hospital, East Java, Indonesia, in 2016. The factors considered as affecting the survival of lung cancer patients can be classified into several criteria, namely sex, age, hemoglobin, leukocytes, erythrocytes, blood sedimentation rate, therapy status, general condition and body weight. The results show that the exact method in the stratified Cox regression model is better than the others. In addition, patient survival is affected by age and general condition.
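The Breslow and Efron methods differ only in how tied event times enter the Cox partial likelihood. The sketch below writes both out for a single covariate; the exact (discrete) method, which sums over every way the tied events could be ordered, is omitted for brevity. A stratified model, as used in the study, adds these log-likelihoods across strata with a shared β. All data in the usage example are hypothetical.

```python
"""Cox partial log-likelihood for one covariate under the Breslow and Efron
approximations for tied event times, plus a stratified total."""
import math


def cox_loglik(beta, times, events, x, ties="efron"):
    total = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set R(t)
        dead = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]  # events D(t)
        d = len(dead)
        risk_sum = sum(math.exp(beta * xi) for xi in risk)
        dead_sum = sum(math.exp(beta * xi) for xi in dead)
        total += beta * sum(dead)
        if ties == "breslow":
            # Every tied event sees the full risk set.
            total -= d * math.log(risk_sum)
        elif ties == "efron":
            # Tied events are removed from the risk set gradually, on average.
            for l in range(d):
                total -= math.log(risk_sum - (l / d) * dead_sum)
        else:
            raise ValueError("ties must be 'breslow' or 'efron'")
    return total


def stratified_loglik(beta, strata, ties="efron"):
    """Baseline hazards differ by stratum, beta is shared, so log-likelihoods add."""
    return sum(cox_loglik(beta, t, e, x, ties) for t, e, x in strata)


if __name__ == "__main__":
    times = [1, 2, 2, 3, 4]
    events = [1, 1, 1, 0, 1]  # 0 = censored
    x = [0.5, 1.0, -1.0, 2.0, 0.0]
    for method in ("breslow", "efron"):
        print(method, cox_loglik(0.3, times, events, x, method))
```

With no tied event times the two approximations coincide; they diverge as ties become more frequent, which is when the choice of method matters.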
Directory of Open Access Journals (Sweden)
Mohd Faris Dziauddin
2017-07-01
This study estimates the effect of locational attributes on residential property values in Kuala Lumpur, Malaysia. Geographically weighted regression (GWR) allows local rather than global parameters to be estimated, with the results presented in map form. The results of this study reveal that residential property values are mainly determined by the property's physical (structural) attributes, but proximity to locational attributes also contributes marginally. The use of GWR in this study is considered a better approach than other methods for examining the effect of locational attributes on residential property values. GWR can produce meaningful results in which different locational attributes have differential spatial effects on residential property values across a geographical area. The method can also identify the factors on which price premiums depend, and in turn it can assist the government in taxation matters.
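GWR fits a separate weighted regression at each location, down-weighting observations by their distance from it. A minimal one-predictor sketch, assuming a Gaussian distance kernel with a hand-picked bandwidth (in practice the bandwidth is usually chosen by cross-validation or AICc, and hedonic price models have many predictors); the grid data are synthetic:

```python
"""Geographically weighted regression (GWR) at a single location, with one predictor."""
import math


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least-squares intercept and slope at `target`, weighting each
    observation by a Gaussian kernel of its distance to the target location."""
    w = [math.exp(-0.5 * (math.dist(c, target) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Synthetic 10 x 10 grid in which the effect of x strengthens from west to east.
    coords = [(i, j) for i in range(10) for j in range(10)]
    x = [float((7 * i + 3 * j) % 11) for i, j in coords]
    y = [1.0 + (1.0 + 0.2 * i) * xv for (i, _), xv in zip(coords, x)]
    for site in [(0, 5), (9, 5)]:
        print(site, gwr_local_fit(coords, x, y, site, bandwidth=1.5))
```

Repeating the local fit at every location and mapping the slopes is what produces the map-form results described above.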
Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.
2017-01-01
Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore the fact that read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
Directory of Open Access Journals (Sweden)
Ying-Hsin Chang
2013-01-01
Human estrogen receptor (ER) isoforms, ERα and ERβ, have long been an important focus in the field of biology. To better understand the structural features associated with the binding of ERα ligands to ERα and the modulation of its function, several QSAR models, including CoMFA, CoMSIA, SVR, and LR methods, have been employed to predict the inhibitory activity of 68 raloxifene derivatives. In the SVR and LR modeling, 11 descriptors were selected through feature ranking and sequential feature addition/deletion to generate equations predicting the inhibitory activity toward ERα. Among four descriptors that consistently appear in the various generated equations, two agree with the CoMFA and CoMSIA steric fields and another two can be correlated with a calculated electrostatic potential of ERα.
Directory of Open Access Journals (Sweden)
Corrado Dimauro
2010-01-01
Two methods of SNP pre-selection based on single-marker regression for the estimation of genomic breeding values (G-EBVs) were compared using simulated data provided by the XII QTL-MAS workshop: (i) Bonferroni correction of the significance threshold and (ii) a permutation test to obtain the reference distribution of the null hypothesis and identify significant markers at P<0.01 and P<0.001 significance thresholds. From the set of markers significant at P<0.001, random subsets of 50% and 25% of the markers were extracted to evaluate the effect of further reducing the number of significant SNPs on G-EBV predictions. The Bonferroni correction method identified 595 significant SNPs, which gave the best G-EBV accuracy in the prediction generations (82.80%). The permutation methods gave slightly lower G-EBV accuracies even though a larger number of SNPs were significant (2,053 and 1,352 for the 0.01 and 0.001 significance thresholds, respectively). Interestingly, halving or quartering the number of SNPs significant at P<0.001 resulted in only a slight decrease in G-EBV accuracy. The genetic structure of the simulated population, with few QTL carrying large effects, might have favoured the Bonferroni method.
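A minimal sketch of the Bonferroni pre-selection step, assuming each SNP already has a p-value from its own single-marker regression. The function name and the p-values are hypothetical; the point is only that the per-marker threshold is the nominal level divided by the number of markers tested.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of markers whose p-value is below the Bonferroni threshold alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [i for i, p in enumerate(p_values) if p < threshold]


# Hypothetical p-values from single-marker regressions on five SNPs
p_vals = [1e-6, 0.003, 0.02, 4e-5, 0.5]
print(bonferroni_select(p_vals, alpha=0.01))  # threshold 0.01 / 5 = 0.002 -> [0, 3]
```

The permutation alternative would instead take the threshold from the distribution of the smallest p-value obtained after repeatedly shuffling the phenotypes, which tends to be less conservative than alpha / m.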
Directory of Open Access Journals (Sweden)
Rachid Darnag
2017-02-01
Support vector machines (SVMs) represent one of the most promising machine learning (ML) tools that can be applied to develop predictive quantitative structure–activity relationship (QSAR) models using molecular descriptors. Multiple linear regression (MLR) and artificial neural networks (ANNs) were also used to construct quantitative linear and nonlinear models for comparison with the results obtained by SVM. The predictions are in good agreement with the experimental values of HIV activity, and the results reveal the superiority of the SVM over the MLR and ANN models. The contribution of each descriptor to the structure–activity relationships was evaluated.
Directory of Open Access Journals (Sweden)
Lüdtke Rainer
2008-08-01
Background: Regression to the mean (RTM) occurs in situations of repeated measurements when extreme values are followed by measurements in the same subjects that are closer to the mean of the basic population. In uncontrolled studies such changes are likely to be interpreted as a real treatment effect. Methods: Several statistical approaches have been developed to analyse such situations, including the algorithm of Mee and Chua, which assumes a known population mean μ. We extend this approach to the situation where μ is unknown and suggest varying it systematically over a range of reasonable values. Using differential calculus, we provide formulas to estimate the range of μ where treatment effects are likely to occur when RTM is present. Results: We successfully applied our method to three real-world examples, covering situations when (a) no treatment effect can be confirmed regardless of which μ is true, (b) a treatment effect must be assumed independently of the true μ, and (c) the results of uncontrolled studies are appraised. Conclusion: Our method can be used to separate the wheat from the chaff when one has to interpret the results of uncontrolled studies. In meta-analyses, health-technology reports or systematic reviews, this approach may help to clarify the evidence from uncontrolled observational studies.
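To make the idea of scanning over an unknown population mean concrete, the sketch below uses the textbook bivariate-normal result (equal variances at both time points, test-retest correlation rho, no treatment effect). Under that result, a group selected with baseline mean xbar is expected to change by (1 - rho)(mu - xbar) through RTM alone. This is not the Mee and Chua test or the authors' formulas, and the numbers are hypothetical.

```python
def expected_rtm_change(baseline_mean, mu, rho):
    """Expected baseline-to-follow-up change caused by RTM alone.

    Assumes a bivariate normal population with common mean mu, equal variances
    and test-retest correlation rho, and no treatment effect.
    """
    return (1.0 - rho) * (mu - baseline_mean)


def rtm_scan(baseline_mean, observed_change, rho, mu_values):
    """For each candidate mu, return (mu, RTM-only change, change left unexplained by RTM)."""
    rows = []
    for mu in mu_values:
        rtm = expected_rtm_change(baseline_mean, mu, rho)
        rows.append((mu, rtm, observed_change - rtm))
    return rows


# Hypothetical: subjects selected with mean baseline 160, mean observed change -15, rho = 0.6
for mu, rtm, residual in rtm_scan(160.0, -15.0, 0.6, [120.0, 130.0, 140.0]):
    print(f"mu={mu:.0f}  RTM-only change={rtm:.1f}  unexplained={residual:.1f}")
```

With these made-up numbers, a true mean near 120 would let RTM account for the whole observed drop, while a mean of 140 would leave about 7 units unexplained. This is the kind of dependence on mu that motivates scanning over a plausible range.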
Regression analysis by example
Chatterjee, Samprit
2012-01-01
Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." (Journal of the American Statistical Association) Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
International Nuclear Information System (INIS)
Schierle, C.; Otto, M.
1992-01-01
A two-layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained with emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via calculation of the pseudoinverse S⁺ of the sensitivity matrix S. The present paper shows that there are close relations between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals of the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
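For the two-element case, the MLR step can be sketched as the least-squares solution c = S⁺y = (SᵀS)⁻¹Sᵀy, where each row of S holds the Cd and As sensitivities at one wavelength channel and y is the measured mixture spectrum. The sensitivities and intensities below are simulated for illustration only. For more than two analytes, `numpy.linalg.pinv` computes S⁺ directly.

```python
def estimate_concentrations(S, y):
    """Least-squares estimate c = S+ y = (S^T S)^-1 S^T y for two analytes.

    S: list of rows, one per wavelength channel, each [sens_Cd, sens_As].
    y: measured mixture intensities, one per channel.
    """
    # Entries of the 2x2 normal matrix S^T S and of the vector S^T y
    a = sum(r[0] * r[0] for r in S)
    b = sum(r[0] * r[1] for r in S)
    d = sum(r[1] * r[1] for r in S)
    u = sum(r[0] * yi for r, yi in zip(S, y))
    v = sum(r[1] * yi for r, yi in zip(S, y))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("S^T S is singular: the two lines cannot be separated")
    # Closed-form inverse of [[a, b], [b, d]] applied to [u, v]
    return ((d * u - b * v) / det, (a * v - b * u) / det)


# Simulated sensitivities at three channels and a noise-free mixture with c = (2, 3)
S = [[1.0, 0.2], [0.5, 0.5], [0.1, 1.0]]
y = [2.6, 2.5, 3.2]
print(estimate_concentrations(S, y))  # approximately (2.0, 3.0)
```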
International Nuclear Information System (INIS)
Arsenault, Louis-François; Millis, Andrew J; Neuberg, Richard; Hannah, Lauren A
2017-01-01
We present a supervised machine learning approach to the inversion of Fredholm integrals of the first kind as they arise, for example, in the analytic continuation problem of quantum many-body physics. The approach provides a natural regularization for the ill-conditioned inverse of the Fredholm kernel, as well as an efficient and stable treatment of constraints. The key observation is that the stability of the forward problem permits the construction of a large database of outputs for physically meaningful inputs. Applying machine learning to this database generates a regression function of controlled complexity, which returns approximate solutions for previously unseen inputs; the approximate solutions are then projected onto the subspace of functions satisfying relevant constraints. Under standard error metrics the method performs as well as or better than the Maximum Entropy method for low input noise and is substantially more robust to increased input noise. We suggest that the methodology will be similarly effective for other problems involving a formally ill-conditioned inversion of an integral operator, provided that the forward problem can be efficiently solved. (paper)
Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M
2018-05-11
The continual emergence of new β-lactam antibiotics creates a need for suitable analytical methods that accelerate and facilitate their analysis. A face-centered central composite experimental design was adopted, using different levels of phosphate buffer pH and of acetonitrile percentage at zero time and after 15 min in a gradient program, to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models, using the conventional forward selection (FS) and the advanced nature-inspired firefly algorithm (FFA) for descriptor selection, coupled with multiple linear regression (MLR). The obtained models showed high performance in both internal and external validation, indicating their robustness and predictive ability. The Williams-Hotelling test and Student's t-test showed no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models are due to a significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality at both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We conclude that, in some cases, a simple conventional feature selection algorithm can generate robust and predictive models comparable to those generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Golusin, Mirjana [Educons University, Vojvode Putnika st. bb, 21013 Sremska Kamenica (RS)]; Ivanovic, Olja Munitlak [Faculty of Business in Services, Vojvode Putnika st. bb, 21013 Sremska Kamenica (RS)]; Teodorovic, Natasa [Faculty of Entrepreneurial Management, Modene st. 5, 21000 Novi Sad (RS)]
2011-01-15
The need for preservation and adequate management of environmental quality requires the development of new methods and techniques by which the achieved degree of sustainable development can be defined, as well as the laws governing the relationship among its subsystems. The main objective of the research is to point out the strong contradiction between the development of the ecological and economic subsystems. To improve on previous research, this study suggests the use of linear evaluation, by which it is possible to determine the exact degree of contradiction between these two subsystems and to define the regularities as well as the deviations. The authors present the essential steps that were used. Conducted by the method of linear regression, this research shows a significant negative correlation between ecological and economic subsystem indicators, and its R² value of 0.58 confirms the expected contradiction between the two subsystems. By treating sustainable development as a two-dimensional system that includes ecological and economic indicators, the authors propose a methodology for modelling the relationship between economic and ecological development as the orthogonal distance between the degree of the current state, measured by the relation between the economic and ecological indicators of sustainable development, and the degree obtained in the traditional way. The method used in this research proved to be extremely suitable for modelling the relationship between the ecological and economic subsystems of sustainable development. The research was conducted on a repeated sample of South East European countries, also including data for France and Germany, two of the most developed countries in the European Union. (author)
Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression
Directory of Open Access Journals (Sweden)
Ali Asghar Besalatpour
2016-02-01
by 20-m digital elevation model (DEM). The data set was divided into training and testing subsets. The training subset was randomly chosen from 70% of the total data set, and the remaining samples (30% of the data) were used as the testing set. The correlation coefficient (r), mean square error (MSE), and error percentage (ERROR%) between the measured and predicted GMD values were used to evaluate the performance of the models. Results and Discussion: The descriptive statistics showed that there was little variability in the sample distributions of the variables used to develop the GMD prediction models and that their values were all normally distributed. The constructed SVM model predicted GMD better than the traditional multiple linear regression model. The MSE and r values of the developed SVM model for soil aggregate stability prediction were 0.005 and 0.86, respectively. The ERROR% for soil aggregate stability prediction was 10.7% with the SVM model and 15.7% with the regression model. The scatter plots also showed that the SVM model estimated GMD more accurately than the MLR model, since its predicted GMD values were in closer agreement with the measured values for most of the samples. The poorer performance of the MLR model might be due to the larger amount of data required to develop a reliable regression model compared with intelligent systems. Furthermore, linear models can extract only the linear effects of the predictors on the dependent variable, whereas in many cases the effects may not be linear. Meanwhile, the SVM model is suitable for modelling nonlinear relationships, and its major advantage is that it can be developed without knowing the exact form of the analytical function on which the model should be built. All these indicate that the SVM approach would be a better choice for predicting soil aggregate
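The evaluation protocol described in this abstract (a random 70/30 split, then r and MSE between measured and predicted GMD) can be sketched as follows. ERROR% is left out because the abstract does not define it. The function names, seed, and values are hypothetical.

```python
import math
import random


def train_test_split(samples, train_frac=0.7, seed=0):
    """Randomly assign train_frac of the samples to training and the rest to testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def pearson_r(obs, pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def mse(obs, pred):
    """Mean square error between measured and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


train, test = train_test_split(list(range(20)))
print(len(train), len(test))  # 14 6
measured = [0.42, 0.55, 0.61, 0.38]   # hypothetical GMD values (mm)
predicted = [0.45, 0.52, 0.60, 0.41]
print(round(pearson_r(measured, predicted), 3), round(mse(measured, predicted), 5))
```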
Energy Technology Data Exchange (ETDEWEB)
Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan [Dept. of Biosystems Machinery Engineering, Chungnam National University, Daejeon (Korea, Republic of); Kim, Moon Sung [United States Department of Agriculture Agricultural Research Service, Washington (United States); Lee, Soo Hee [Life and Technology Co.,Ltd., Hwasung (Korea, Republic of)
2014-08-15
This study was conducted to develop a non-destructive method for detecting adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powders, which are used as natural seasonings and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch at concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performance evaluated using cross-validation. The R²c and SEC of the optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After VIP-based pre-selection, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.
Akita, Yasuyuki; Baldasano, Jose M; Beelen, Rob; Cirach, Marta; de Hoogh, Kees; Hoek, Gerard; Nieuwenhuijsen, Mark; Serre, Marc L; de Nazelle, Audrey
2014-04-15
In recognition that intraurban exposure gradients may be as large as between-city variations, recent air pollution epidemiologic studies have become increasingly interested in capturing within-city exposure gradients. In addition, because of the rapidly accumulating health data, recent studies also need to handle large study populations distributed over large geographic domains. Even though several modeling approaches have been introduced, a consistent modeling framework capturing within-city exposure variability and applicable to large geographic domains is still missing. To address these needs, we proposed a modeling framework based on the Bayesian Maximum Entropy method that integrates monitoring data and outputs from existing air quality models based on Land Use Regression (LUR) and Chemical Transport Models (CTM). The framework was applied to estimate the yearly average NO2 concentrations over the region of Catalunya in Spain. By jointly accounting for the global scale variability in the concentration from the output of CTM and the intraurban scale variability through LUR model output, the proposed framework outperformed more conventional approaches.
International Nuclear Information System (INIS)
Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan; Kim, Moon Sung; Lee, Soo Hee
2014-01-01
This study was conducted to develop a non-destructive method for detecting adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powders, which are used as natural seasonings and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch at concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performance evaluated using cross-validation. The R²c and SEC of the optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After VIP-based pre-selection, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP score technique, gave good predictions for the adulterated garlic and ginger powder samples.
Matson, Johnny L.; Kozlowski, Alison M.
2010-01-01
Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…
International Nuclear Information System (INIS)
Zhao, Yongsheng; Zhao, Jihong; Huang, Ying; Zhou, Qing; Zhang, Xiangping; Zhang, Suojiang
2014-01-01
Highlights:
• A comprehensive database on the toxicity of ionic liquids (ILs) was established.
• The relationship between the structure and toxicity of ILs was analyzed qualitatively.
• Two new QSAR models were developed for predicting the toxicity of ILs to IPC-81.
• The accuracy of the proposed nonlinear SVM model is much higher than that of the linear MLR model.
• The established models can be exploited in designing novel green agents.
Abstract: A comprehensive database on the toxicity of ionic liquids (ILs) is established. The database includes over 4000 data points. Based on the database, the relationship between the structure of ILs and their toxicity has been analyzed qualitatively. Furthermore, quantitative structure–activity relationship (QSAR) models are developed to predict the toxicities (EC50 values) of various ILs toward the rat leukemia cell line IPC-81. Four parameters selected by the heuristic method (HM) are used to build multiple linear regression (MLR) and support vector machine (SVM) models. For the training sets, the squared correlation coefficients (R²) of the MLR and SVM models are 0.918 and 0.959, and the root mean square errors (RMSE) are 0.258 and 0.179, respectively. For the test sets, the prediction R² and RMSE are 0.892 and 0.329 for the MLR model and 0.958 and 0.234 for the SVM model. The nonlinear SVM model clearly outperforms MLR, which indicates that the SVM model is more reliable for predicting the toxicity of ILs. This study shows that increasing the relative number of O atoms in the molecules leads to a decrease in the toxicity of ILs.
Gómez-Valent, Adrià; Amendola, Luca
2018-04-01
In this paper we present new constraints on the Hubble parameter H0 using: (i) the available data on H(z) obtained from cosmic chronometers (CCH); (ii) the Hubble rate data points extracted from the Type Ia supernovae (SnIa) of the Pantheon compilation and the Hubble Space Telescope (HST) CANDELS and CLASH Multi-Cycle Treasury (MCT) programs; and (iii) the local HST measurement of H0 provided by Riess et al. (2018), H0HST = (73.45 ± 1.66) km/s/Mpc. Various determinations of H0 using the Gaussian processes (GPs) method and the most updated list of CCH data have recently been provided by Yu, Ratra & Wang (2018). Using the Gaussian kernel, they find H0 = (67.42 ± 4.75) km/s/Mpc. Here we extend their analysis to also include the most recently released and complete set of SnIa data, which allows us to reduce the uncertainty by a factor of ~3 with respect to the result obtained by considering only the CCH information. We obtain H0 = (67.06 ± 1.68) km/s/Mpc, which again favors the lower range of values for H0 and is in tension with H0HST. The tension reaches the 2.71σ level. We also complete the GP determination by taking into account the error propagation of the kernel hyperparameters when the CCH data with and without H0HST are used in the analysis. In addition, we present a novel method to reconstruct functions from data, which consists of a weighted sum of polynomial regressions (WPR). We apply it from a cosmographic perspective to reconstruct H(z) and estimate H0 from CCH and SnIa measurements. The result obtained with this method, H0 = (68.90 ± 1.96) km/s/Mpc, is fully compatible with the GP results. Finally, a more conservative GPs+WPR value is also provided, H0 = (68.45 ± 2.00) km/s/Mpc, which is still almost 2σ away from H0HST.
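The quoted tension levels can be checked with the usual measure for two independent Gaussian measurements, |m1 - m2| / sqrt(s1² + s2²). The assumption that the two results are uncorrelated is ours; the function name is illustrative.

```python
import math


def gaussian_tension(m1, s1, m2, s2):
    """Tension, in units of sigma, between two independent Gaussian measurements."""
    return abs(m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)


# H0^HST versus the CCH+SnIa GP result, then versus the conservative GPs+WPR value
print(round(gaussian_tension(73.45, 1.66, 67.06, 1.68), 2))  # 2.71
print(round(gaussian_tension(73.45, 1.66, 68.45, 2.00), 2))  # 1.92
```

Both numbers reproduce the abstract: 2.71σ for the GP determination, and "almost 2σ" (about 1.92σ) for the combined GPs+WPR value.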
Cao, M H; Adeola, O
2016-02-01
The energy values of poultry byproduct meal (PBM) and an animal-vegetable oil blend (A-V blend) were determined in 2 experiments with 288 broiler chickens from d 19 to 25 post hatching. The birds were fed a starter diet from d 0 to 19 post hatching. In each experiment, 144 birds were grouped by weight into 8 replicate cages with 6 birds per cage. There were 3 diets in each experiment, consisting of one reference diet (RD) and 2 test diets (TD). The TD contained 2 levels of PBM (Exp. 1) or A-V blend (Exp. 2) that replaced the energy sources in the RD at 50 or 100 g/kg (Exp. 1) or 40 or 80 g/kg (Exp. 2), so that the same ratio among energy ingredients was maintained across experimental diets. The ileal digestible energy (IDE), ME, and MEn of PBM and A-V blend were determined by the regression method. The DM contents of PBM and A-V blend were 984 and 999 g/kg, and their gross energies were 5,284 and 9,604 kcal/kg of DM, respectively. Addition of PBM to the RD in Exp. 1 linearly decreased (P blend to the RD linearly increased (P blend as follows: IDE = 10,616x + 7.350, r² = 0.96; ME = 10,121x + 0.447, r² = 0.99; MEn = 10,124x + 2.425, r² = 0.99. These data indicate that the IDE, ME, and MEn values (kcal/kg of DM) are 3,537, 3,805, and 3,278 for PBM and 10,616, 10,121, and 10,124 for the A-V blend. © 2015 Poultry Science Association Inc.
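One common form of the regression method regresses the test ingredient's energy contribution (kcal) on its DM intake (kg). The ordinary least-squares slope is then the ingredient's energy value in kcal/kg DM, which is why the slopes of the equations above match the reported A-V blend values. The sketch below assumes that form. The intake and energy numbers are hypothetical and are not taken from either study.

```python
def regression_energy_value(ingredient_dmi_kg, ingredient_energy_kcal):
    """OLS slope of ingredient energy contribution (kcal) on ingredient DM intake (kg).

    The slope is the estimated energy value of the test ingredient in kcal/kg DM.
    """
    n = len(ingredient_dmi_kg)
    mx = sum(ingredient_dmi_kg) / n
    my = sum(ingredient_energy_kcal) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(ingredient_dmi_kg, ingredient_energy_kcal))
    sxx = sum((x - mx) ** 2 for x in ingredient_dmi_kg)
    return sxy / sxx


# Hypothetical: reference diet (0 kg test ingredient) plus two graded inclusion levels
dmi = [0.0, 0.25, 0.5]
energy = [0.0, 875.0, 1750.0]
print(regression_energy_value(dmi, energy))  # 3500.0
```

In practice each replicate (cage or animal) contributes one point, so the slope is estimated from scattered data rather than three exact values.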
Choi, Giehae; Bell, Michelle L.; Lee, Jong-Tae
2017-04-01
The land-use regression (LUR) approach to estimating levels of ambient air pollutants is becoming popular because of its high validity in predicting small-area variations. However, only a few studies have been conducted in Asian countries, and much less research has compared the performance and applied estimates of different exposure assessments, including LUR. The main objectives of the current study were to conduct nitrogen dioxide (NO2) exposure assessment with four methods, including LUR, in the Republic of Korea, to compare the model performances, and to estimate the empirical NO2 exposures of a cohort. The study population was defined as the year-2010 participants of a government-supported cohort established for bio-monitoring in Ulsan, Republic of Korea. The annual ambient NO2 exposures of the 969 study participants were estimated with LUR, nearest station, inverse distance weighting, and ordinary kriging. Modeling was based on the annual NO2 average, traffic-related data, land-use data, and altitude of the 13 regularly monitored stations. The final LUR model indicated that area of transportation, distance to residential area, and area of wetland were important predictors of NO2. The LUR model explained 85.8% of the variation observed at the 13 monitoring stations in 2009. The LUR model outperformed the others in leave-one-out cross-validation, based on correlations and root-mean-square error. All NO2 estimates ranged from 11.3 to 18.0 ppb, with the LUR estimates having the widest range. The NO2 exposure levels of the residents differed by demographics. However, the average was below the national annual guideline of the Republic of Korea (30 ppb). The LUR models showed high performance in an industrial city in the Republic of Korea, despite the small sample size and limited data. Our findings suggest that the LUR method may be useful in similar settings in Asian countries where the target region is small and availability of data is
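Two of the comparators named here, inverse distance weighting and leave-one-out cross-validation, can be sketched together. Each station is predicted from all the others, and the squared errors are pooled into an RMSE. The power of 2 is a common default and is our assumption, since the abstract does not give it. Station coordinates and NO2 values are hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) tuples."""
    num = den = 0.0
    for sx, sy, val in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return val  # target coincides with a station
        w = 1.0 / d ** power
        num += w * val
        den += w
    return num / den


def loocv_rmse(stations, power=2.0):
    """Leave-one-out CV: predict each station from all the others, return the RMSE."""
    sq_errors = []
    for i, (sx, sy, val) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        sq_errors.append((idw(others, sx, sy, power) - val) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical stations: (x km, y km, annual NO2 ppb)
stations = [(0.0, 0.0, 15.2), (3.0, 1.0, 17.8), (1.0, 4.0, 12.4), (5.0, 5.0, 14.1)]
print(round(loocv_rmse(stations), 2))
```

The same LOOCV loop can wrap any of the four estimators, which is what makes the comparison between them like-for-like.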
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly applied in atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in Hong Kong in 2000-2015. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. The results showed that the MLR models worked better at suburban and rural sites than at urban sites, and better in winter than in summer. The QR models performed better in summer for the 99th and 90th percentiles and in autumn and winter for the 10th percentile. The QR models also performed better in suburban and rural areas for the 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations changed with season and region but were frequently drawn from six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain why monthly average ozone concentrations peak in October in Hong Kong. We also found that the effect of solar radiation is enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone showed no significant change before and after the 2010 Asian Games.
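What distinguishes QR from MLR is the loss function. Instead of squared error, QR minimises the asymmetric check (pinball) loss at quantile level tau. The sketch below shows only the intercept-only special case, where the minimiser is a sample quantile. It is not the meteorological models of this study. Fitting QR with real predictors requires a linear-programming solver, such as the `QuantReg` class in statsmodels. The data values are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss used by quantile regression at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def intercept_only_qr(values, tau):
    """Fit y = b0 by minimising total pinball loss.

    For this piecewise-linear objective a minimiser always lies at a data point,
    so searching over the observed values is sufficient.
    """
    return min(values, key=lambda b: sum(pinball_loss(v - b, tau) for v in values))


print(intercept_only_qr([1, 2, 3, 4, 100], 0.5))  # 3 (the median, unaffected by the outlier)
```

Raising tau moves the fitted level toward the upper tail. This is why the 90th and 99th percentile models can pick out the meteorology of pollution episodes that a mean (MLR) model averages away.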
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is widely used in fields ranging from pharmaceutics to the food industry for analyzing the chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include the physical interpretability of spectral data, the nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many other factors; selection of the wavelengths that contribute useful information; and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects, and many comparative studies exist for an individual aspect or some combinations. However, there is still a lack of comparative studies on the interactions among these three aspects, which can shed light on the role each aspect plays in calibration and on how to combine the methods of each aspect to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); and four popular regression methods, namely partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant
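EMSC extends the simpler multiplicative scatter correction (MSC) with extra terms, for example polynomial baselines. As a sketch of the underlying idea only, and not of EMSC itself, plain MSC regresses each spectrum on a reference spectrum (here the mean spectrum) as x ≈ a + b·ref and then corrects it to (x - a)/b. This removes additive offsets and multiplicative scaling caused by light scattering. The spectra below are synthetic.

```python
def msc(spectra):
    """Plain multiplicative scatter correction against the mean spectrum.

    Each spectrum x is fitted as x ~ a + b * ref by least squares and
    corrected to (x - a) / b.
    """
    n_points = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    r_mean = sum(ref) / n_points
    srr = sum((r - r_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = sum(x) / n_points
        b = sum((r - r_mean) * (xi - x_mean) for r, xi in zip(ref, x)) / srr
        a = x_mean - b * r_mean
        corrected.append([(xi - a) / b for xi in x])
    return corrected


# Synthetic: the second spectrum is the first one offset by 2 and scaled by 3
base = [1.0, 2.0, 4.0, 3.0]
spectra = [base, [2.0 + 3.0 * v for v in base]]
for row in msc(spectra):
    print([round(v, 6) for v in row])  # both rows collapse onto the mean spectrum
```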
Bolarinwa, O A; Adeola, O
2016-02-01
Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced ( direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different (0.3 direct method-derived DE and ME of sorghum were not different (0.5 direct method- and regression method-derived DE (3,953 and 3
Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan
2018-02-01
Knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depths under different climate conditions in Iran. The results were compared with those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and the periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and atmospheric pressure (reduced to sea level), collected from five synoptic stations, Kerman, Ahvaz, Tabriz, Saghez, and Rasht, located respectively in hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climates. According to the results, both the MLR and SVR models performed quite well in the surface layer, i.e., at 10-cm depth. However, SVR performed better than MLR in estimating soil temperature in deeper layers, especially at 100 cm depth. Moreover, both models performed better under humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component to the modeling process considerably improved the models' performance, especially in the case of SVR.
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques for interpolating soil properties at the field scale, namely ordinary kriging (OK), regression kriging with a multiple linear regression drift model (RK + MLR), and regression kriging with a principal component regression drift model (RK + PCR), were examined. The results of the study were compiled into an algorithm for choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of the target variable was strong, the OK method gave more accurate interpolation results, and the inclusion of auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced and applying PCR becomes unjustified. In that case, multiple linear regression should be used instead.
Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran
2018-03-01
This paper applies multivariate regression, as a mineral prospectivity mapping (MPM) method, to build a mathematical model for iron skarn exploration in the Sarvian area, central Iran. The main aim is to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two multivariate regression models based on two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the p values, coefficients of determination (R²) and adjusted coefficients of determination (R²adj), the second regression model (which consisted of multiple UIVs) fitted better than the other models. The accuracy of the model was confirmed by the iron outcrop map and geological observations. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).
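Because adding ASTER bands can only raise R², the adjusted R² used in the comparison penalizes the number of predictors. The minimal sketch below uses the standard formula R²adj = 1 - (1 - R²)(n - 1)/(n - p - 1) for a linear model with p predictors plus an intercept. The function name is hypothetical.

```python
import numpy as np

def r2_and_adjusted(y, y_hat, n_predictors):
    """Coefficient of determination and adjusted R^2 for a linear model
    with `n_predictors` explanatory variables plus an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = y.size
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, r2_adj
```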
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare the results of the assay under validation with results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), with R² as the primary metric of assay agreement. However, R² alone does not adequately quantify the constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman analysis and an expanded interpretation of linear regression methods, can be used to compare data from quantitative molecular assays more thoroughly. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of the performance characteristics of quantitative molecular assays prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
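The two methods can be sketched as follows. Bland-Altman reports the mean difference (bias, a constant error) and the 95% limits of agreement. Deming regression allows measurement error in both assays, so a slope away from 1 suggests proportional error and an intercept away from 0 suggests constant error. This is a minimal sketch, assuming `delta` (the ratio of the y-error variance to the x-error variance) is known, with `delta=1` giving orthogonal regression. The function names are hypothetical.

```python
import numpy as np

def bland_altman(a, b):
    """Bias (mean difference a - b) and 95% limits of agreement."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def deming(x, y, delta=1.0):
    """Deming regression of y on x; delta = var(error in y) / var(error in x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = ym - slope * xm
    return slope, intercept
```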
Understanding Poisson regression.
Hayat, Matthew J; Higgins, Melinda
2014-04-01
Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically treated count data either as if the counts were continuous and normally distributed or by dichotomizing the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including the addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented using an application from the ENSPIRE study, including regression modeling of comorbidity data. Copyright 2014, SLACK Incorporated.
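To make the model concrete, a log-link Poisson regression can be fitted by iteratively reweighted least squares (IRLS). The Pearson chi-square divided by the residual degrees of freedom is a common overdispersion check: values well above 1 point toward adding a dispersion parameter or switching to negative binomial regression. This is a minimal sketch, assuming the first column of `X` is the intercept. The function names are hypothetical, and the code is not taken from the article.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson GLM by IRLS (X must include an intercept in column 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())              # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu             # working response
        XtW = X.T * mu                      # working weights = mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def pearson_dispersion(X, y, beta):
    """Pearson chi-square / residual df; >> 1 suggests overdispersion."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = np.exp(X @ beta)
    return np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])
```

For a single binary covariate, exp(intercept) recovers the mean count of the reference group and exp(slope) the rate ratio between the groups.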
Antropov, K M; Varaksin, A N
2013-01-01
This paper describes Land Use Regression (LUR) modeling and the results of its application in a study of nitrogen dioxide air pollution in Ekaterinburg. It discusses the difficulties of modeling air pollution caused by motor vehicle exhaust and ways to address these challenges. To create an LUR model of NO2 air pollution in Ekaterinburg, NO2 concentrations were measured, data on factors affecting air pollution were collected, and a statistical analysis of the data was performed. A statistical model of NO2 air pollution (coefficient of determination R2 = 0.70) and a pollution map were created.
Directory of Open Access Journals (Sweden)
Kamlendra Singh Bhadoriya
2016-09-01
TRPM8 is now best known as a cold- and menthol-activated channel implicated in thermosensation. TRPM8 is specifically expressed in a subset of pain- and temperature-sensing neurons and plays a major role in the sensation of cold and of cooling substances. TRPM8 is a potential new target for the treatment of painful conditions. Thus, TRPM8 antagonists represent a novel and potentially useful strategy to treat various disease states such as urological disorders, asthma, COPD, prostate and colon cancers, and painful conditions related to cold, such as cold allodynia and cold hyperalgesia. Better tools, such as potent and specific TRPM8 antagonists, are needed to address this high unmet medical need. To this end, quantitative structure–activity relationship (QSAR) studies were carried out on a series of 25 benzimidazole-containing TRPM8 antagonists to investigate the structural requirements for their inhibitory activity against cTRPM8. The statistically best 2D-QSAR model, developed by SW-MLR, had a correlation coefficient r² = 0.88, a cross-validated squared correlation coefficient q² = 0.64 and an external predictive ability pred_r2 = 0.69. Physico-chemical descriptors such as polarizabilityAHP, kappa2, XcompDipole, +vePotentialSurfaceArea and XKMostHydrophilic were found to correlate significantly with biological activity in the benzimidazole derivatives. Molecular field analysis was used to construct the best 3D-QSAR model using the SW-kNN method, which showed good correlative and predictive capabilities in terms of q² = 0.81 and pred_r2 = 0.55. The developed kNN-MFA model highlighted the importance of molecular shape, i.e., steric and electrostatic descriptors at the grid points S_774 and E_1024, for TRPM8 receptor binding. These models (2D and 3D) were found to yield reliable clues for further optimization of the benzimidazole derivatives in the data set. The information rendered by 2D- and 3D
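The three statistics reported for the 2D model can be sketched for an ordinary least squares model. The cross-validated q² here is taken as leave-one-out, 1 - PRESS / SS_total. For pred_r2, one common definition measures external deviations from the training-set mean, which is the definition assumed here. The software used in the study may differ in detail, and the helper names are hypothetical.

```python
import numpy as np

def _ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def _predict(beta, X):
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]

def q2_loo(X, y):
    """Leave-one-out q^2 = 1 - PRESS / SS_total for an OLS model (X is n x p)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta = _ols(X[keep], y[keep])          # refit without compound i
        press += (y[i] - _predict(beta, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def pred_r2(X_train, y_train, X_test, y_test):
    """External predictive r^2 with deviations from the training-set mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    beta = _ols(np.asarray(X_train, dtype=float), y_train)
    resid = y_test - _predict(beta, X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)
```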
Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H
2016-01-01
BACKGROUND: Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and
Freund, Rudolf J; Sa, Ping
2006-01-01
The book provides complete coverage of the classical methods of statistical analysis. It is designed to give students an understanding of the purpose of statistical analyses, to allow them to determine, at least to some degree, the correct type of statistical analysis to be performed in a given situation, and to give them some appreciation of what constitutes good experimental design.
More, Anand Govind; Gupta, Sunil Kumar
2018-03-24
A bioelectrochemical system (BES) is a novel, self-sustaining metal removal technology that uses the chemical energy of organic matter with the help of microorganisms. Experimental trials of a two-chambered BES reactor were conducted with varying substrate concentrations using sodium acetate (500 mg/L to 2000 mg/L COD) and different initial chromium concentrations (Cr i) (10-100 mg/L) at different cathode pH values (pH 1-7). In the current study, mathematical models based on multiple linear regression (MLR) and non-linear regression (NLR) approaches were developed from laboratory experimental data to determine the chromium removal efficiency (CRE) in the cathode chamber of the BES. Substrate concentration, rate of substrate consumption, Cr i, pH, temperature and hydraulic retention time (HRT) were the reactor operating parameters considered in developing the proposed models. MLR showed a better correlation coefficient (0.972) than NLR (0.952). Validation of the models by t-test analysis showed that both models were unbiased, since the critical t value (2.04) exceeded the absolute calculated t values for MLR (-0.708) and NLR (-0.86). The root-mean-square errors (RMSE) for MLR and NLR were 5.06 % and 7.45 %, respectively. Comparison of the two models suggested MLR as the better-suited model for predicting chromium removal behavior with BES technology under a specified set of operating conditions. Modelling the behavior of CRE will be helpful for scaling up BES technology to the industrial level. Copyright © 2018 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
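The validation step can be sketched as two functions: RMSE, and a paired t statistic testing whether the mean residual (predicted minus observed) is zero. A model is treated as unbiased when |t| is below the two-sided critical value for n - 1 degrees of freedom. The abstract does not give the exact form of the authors' t-test, so the paired-residual form below is an assumption, and the function names are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((obs - pred) ** 2))

def bias_t_statistic(obs, pred):
    """t statistic for H0: mean(pred - obs) = 0 (paired t-test on residuals)."""
    d = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
```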
Shabri, Ani; Samsudin, Ruhaidah
2014-01-01
Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating the wavelet transform and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose the original time series into several subseries at different scales. Principal component analysis (PCA) is then applied to the subseries data used in the MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, daily West Texas Intermediate (WTI) crude oil market data were used as the case study. The time series prediction performance of the WMLR model is compared with that of the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666
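The decompose-then-regress idea can be sketched with one substitution: a causal Haar "à trous" decomposition replaces the Mallat transform, because its subseries add back exactly to the original series and are aligned in time. The sketch also omits the PCA and PSO steps and regresses next-day price on the current subseries values. This is a minimal illustration under those assumptions, not the authors' WMLR, and the function names are hypothetical.

```python
import numpy as np

def haar_a_trous(x, levels):
    """Split x into `levels` detail subseries plus one smooth subseries.
    Each smooth is the mean of the previous smooth and its value 2**(j-1)
    steps earlier (start padded with the first value); parts sum to x."""
    c_prev = np.asarray(x, dtype=float)
    details = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        lagged = np.concatenate([np.full(shift, c_prev[0]), c_prev[:-shift]])
        c = 0.5 * (c_prev + lagged)
        details.append(c_prev - c)
        c_prev = c
    return details + [c_prev]

def wmlr_fit_predict(x, levels=3):
    """Regress x[t+1] on the subseries values at time t; forecast one step ahead."""
    comps = np.column_stack(haar_a_trous(x, levels))        # n x (levels + 1)
    A = np.column_stack([np.ones(len(comps) - 1), comps[:-1]])
    y = np.asarray(x, dtype=float)[1:]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    next_pred = np.concatenate([[1.0], comps[-1]]) @ beta
    return beta, next_pred
```

Because the filter only looks backward in time, the subseries at time t use no future prices, which matters when the fitted model is used for forecasting.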
Lee, Mi Hee; Lee, Soo Bong; Eo, Yang Dam; Kim, Sun Woong; Woo, Jung-Hun; Han, Soo Hee
2017-07-01
Landsat optical images have sufficient spatial and spectral resolution to analyze vegetation growth characteristics. However, clouds and water vapor often degrade image quality, which limits the number of usable images for time-series measurement of vegetation vitality. To overcome this shortcoming, simulated images are used as an alternative. In this study, a weighted average method, the spatial and temporal adaptive reflectance fusion model (STARFM) method, and multilinear regression analysis were tested for producing simulated Landsat normalized difference vegetation index (NDVI) images of the Korean Peninsula. The weighted average method produced the images most similar to the actual images, provided that images were available within 1 month before and after the target date. The STARFM method gives good results when the input image date is close to the target date. Careful regional and seasonal consideration is required in selecting input images; in summer, clouds make it very difficult to obtain images close enough to the target date. Multilinear regression analysis gives meaningful results even when the input image date is not close to the target date. Average R² values for the weighted average method, STARFM, and multilinear regression analysis were 0.741, 0.70, and 0.61, respectively.
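The abstract does not specify the weighting, but a temporal weighted average is often built by weighting an earlier and a later image inversely to their distance in days from the target date. That scheme is assumed below, alongside the standard NDVI formula (NIR - Red) / (NIR + Red). With only two images, inverse-distance weighting reduces to linear interpolation in time. The function names are hypothetical.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def weighted_average_ndvi(ndvi_before, ndvi_after, days_before, days_after):
    """Simulated NDVI for a target date from one earlier and one later image,
    each weighted inversely to its distance in days (assumed scheme)."""
    w_before = 1.0 / days_before
    w_after = 1.0 / days_after
    return (w_before * np.asarray(ndvi_before, dtype=float)
            + w_after * np.asarray(ndvi_after, dtype=float)) / (w_before + w_after)
```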
Energy Technology Data Exchange (ETDEWEB)
Zhang, Yan-Feng; Dai, Shu-Gui [College of Environmental Science and Engineering, Nankai University, Key Laboratory for Pollution Process and Environmental Criteria of Ministry of Education, Tianjin (China); Ma, Yi [College of Chemistry, Nankai University, Institute of Elemento-Organic Chemistry, Tianjin (China); Gao, Zhi-Xian [Institute of Hygiene and Environmental Medicine, Tianjin (China)
2010-07-15
Immunoassays have been regarded as a possible alternative or supplement for measuring polycyclic aromatic hydrocarbons (PAHs) in the environment. Since there are too many potential cross-reactants for PAH immunoassays, it is difficult to determine all the cross-reactivities (CRs) by experimental tests. The relationship between CR and the physical-chemical properties of PAHs and related compounds was investigated using the CR data from a commercial enzyme-linked immunosorbent assay (ELISA) kit test. Two quantitative structure-activity relationship (QSAR) techniques, regression analysis and comparative molecular field analysis (CoMFA), were applied for predicting the CR of PAHs in this ELISA kit. Parabolic regression indicates that the CRs are significantly correlated with the logarithm of the octanol-water partition coefficient (log Kow) (r² = 0.643, n = 23, P < 0.0001), suggesting that hydrophobic interactions play an important role in the antigen-antibody binding and the cross-reactions in this ELISA test. The CoMFA model obtained shows that the CRs of the PAHs are correlated with the 3D structure of the molecules (cross-validated r² = 0.663, r² = 0.873, F(4,32) = 55.086). The contributions of the steric and electrostatic fields to CR were 40.4 and 59.6%, respectively. Both QSAR models satisfactorily predict the CR in this PAH immunoassay kit and help in understanding the mechanisms of antigen-antibody interaction. (orig.)
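A parabolic regression of CR on log Kow is an ordinary least squares fit of a second-degree polynomial. When the quadratic coefficient is negative, the vertex gives the log Kow at which predicted cross-reactivity peaks. The minimal sketch below uses NumPy's `polyfit`; the function name is hypothetical, and the test data are synthetic rather than from the study.

```python
import numpy as np

def parabolic_fit(log_kow, cr):
    """Fit CR = a*logKow^2 + b*logKow + c; return coefficients, r^2 and vertex."""
    x = np.asarray(log_kow, dtype=float)
    y = np.asarray(cr, dtype=float)
    coeffs = np.polyfit(x, y, deg=2)              # highest power first: [a, b, c]
    y_hat = np.polyval(coeffs, x)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    x_opt = -coeffs[1] / (2.0 * coeffs[0])        # peak of the parabola when a < 0
    return coeffs, r2, x_opt
```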
Directory of Open Access Journals (Sweden)
Mehmet Das
2018-01-01
In this study, an air heated solar collector (AHSC) dryer was designed to determine the drying characteristics of pear. Flat pear slices 10 mm thick were used in the experiments. The pears were dried both in the AHSC dryer and under the sun. Panel glass temperature, panel floor temperature, panel inlet temperature, panel outlet temperature, drying cabinet inlet temperature, drying cabinet outlet temperature, drying cabinet temperature, drying cabinet moisture, solar radiation, pear internal temperature, air velocity and mass loss of pear were measured at 30 min intervals. Experiments were carried out in June 2017 in Elazig, Turkey. Each day's experiment started at 8:00 a.m. and continued until 18:00, and the experiments were continued until the weight of the pear slices stopped changing. Wet basis moisture content (MCw), dry basis moisture content (MCd), adjustable moisture ratio (MR), drying rate (DR), and convective heat transfer coefficient (hc) were calculated from both the AHSC dryer and the open sun drying experiment data. The values of hc in both drying systems were found to range between 12.4 and 20.8 W/m²·°C. Three different kernel models were used in support vector machine (SVM) regression to construct a predictive model of the calculated hc values for both systems. Mean absolute error (MAE), root mean squared error (RMSE), relative absolute error (RAE) and root relative absolute error (RRAE) analyses were performed to indicate the predictive model's accuracy. The drying rate of pear was examined for both systems, and the pear was observed to dry earlier in the AHSC drying system. A predictive model was obtained using SVM regression for the calculated hc values for pear in the AHSC drying system. The normalized polynomial kernel was determined to be the best kernel model in SVM for estimating the hc values.
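The drying quantities named above follow from a mass-loss record and the dry-matter mass. The sketch below uses the usual definitions:

- wet-basis moisture content is water divided by total mass;
- dry-basis moisture content is water divided by dry matter;
- the moisture ratio is (M - Me) / (M0 - Me);
- the drying rate is the drop in dry-basis moisture per unit time.

Setting the equilibrium moisture Me to 0 is a simplifying assumption, and the 30 min default matches the measurement interval in the abstract. The function name is hypothetical.

```python
import numpy as np

def drying_metrics(mass, dry_mass, dt_min=30.0, m_eq=0.0):
    """Moisture contents, moisture ratio and drying rate from a mass-loss record.
    mass: sample mass at each reading; dry_mass: bone-dry solids mass;
    m_eq: equilibrium moisture content (dry basis), 0 as a simplification."""
    mass = np.asarray(mass, dtype=float)
    water = mass - dry_mass
    mc_wet = water / mass                     # kg water per kg wet sample
    mc_dry = water / dry_mass                 # kg water per kg dry matter
    mr = (mc_dry - m_eq) / (mc_dry[0] - m_eq)
    dr = -np.diff(mc_dry) / dt_min            # per minute, between consecutive readings
    return mc_wet, mc_dry, mr, dr
```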
Gross, Samuel M; Tibshirani, Robert
2015-04-01
We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA in which one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem that can be implemented with off-the-shelf solvers, and illustrate their use on simulated and real data. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Tokunaga, Makoto; Watanabe, Susumu; Sonoda, Shigeru
2017-09-01
Multiple linear regression analysis is often used to predict the outcome of stroke rehabilitation. However, its predictive accuracy may not be satisfactory. The objective of this study was to determine the predictive accuracy of calculating the motor Functional Independence Measure (mFIM) at discharge from the mFIM effectiveness predicted by multiple regression analysis. The subjects were 505 patients with stroke who were hospitalized in a convalescent rehabilitation hospital. The formula "mFIM at discharge = mFIM effectiveness × (91 points - mFIM at admission) + mFIM at admission" was used. Substituting the mFIM effectiveness predicted by multiple regression analysis into this formula gave the predicted mFIM at discharge (A). Multiple regression analysis was also used to predict mFIM at discharge directly (B). The correlation between the predicted and measured values of mFIM at discharge was compared between A and B. The correlation coefficients were .916 for A and .878 for B. Calculating mFIM at discharge from the mFIM effectiveness predicted by multiple regression analysis gave higher predictive accuracy for mFIM at discharge than predicting it directly. Copyright © 2017 National Stroke Association. Published by Elsevier Inc. All rights reserved.
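The quoted formula and its inverse can be written directly. Here 91 is the mFIM ceiling (13 motor items scored 1 to 7), and effectiveness is the fraction of the possible gain actually achieved. Method A feeds a regression-predicted effectiveness into the first function. The function names are hypothetical.

```python
def predicted_discharge_mfim(effectiveness, mfim_admission, ceiling=91):
    """mFIM at discharge = effectiveness * (91 - mFIM at admission) + mFIM at admission."""
    return effectiveness * (ceiling - mfim_admission) + mfim_admission

def mfim_effectiveness(mfim_admission, mfim_discharge, ceiling=91):
    """Inverse of the formula: fraction of the possible motor gain achieved."""
    return (mfim_discharge - mfim_admission) / (ceiling - mfim_admission)
```

For example, a patient admitted at 41 points with a predicted effectiveness of 0.5 would be predicted to reach 66 points at discharge.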
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
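Stepwise selection of equation-of-state terms can be sketched as greedy forward selection. At each step the candidate term with the largest partial F statistic is added, and selection stops when no candidate exceeds an F-to-enter threshold. The threshold of 4.0 and the column names are illustrative assumptions. The report's own procedure may also remove terms (true stepwise rather than forward-only), which this sketch does not do.

```python
import numpy as np

def forward_stepwise(X, y, names, f_enter=4.0):
    """Greedy forward selection by partial F statistic; returns selected names."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    selected = []

    def rss(cols):
        # Residual sum of squares of an OLS fit with intercept plus `cols`
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    current = rss(selected)
    while True:
        best, best_f = None, f_enter
        for c in range(X.shape[1]):
            if c in selected:
                continue
            new = rss(selected + [c])
            df_resid = n - len(selected) - 2      # intercept + selected + candidate
            f = (current - new) / (new / df_resid) if new > 0 else np.inf
            if f > best_f:
                best, best_f = c, f
        if best is None:
            return [names[c] for c in selected]
        selected.append(best)
        current = rss(selected)
```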
Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward
2017-08-01
Urban air quality is an important determinant of the quality of urban life. Land use regression (LUR) modelling of air quality is essential for conducting health impact assessments, but it is more challenging in a mountainous, high-density urban setting because of the complexity of the urban environment. In this study, a total of 21 LUR models were developed in Hong Kong for seven air pollutants (the gaseous pollutants CO, NO2, NOx, O3 and SO2, and the particulate pollutants PM2.5 and PM10) for three time periods (summertime, wintertime and annual average), using 5 years of long-term hourly monitoring data from the local air quality monitoring network. For this mountainous, high-density urban setting, we improved the traditional LUR modelling method by incorporating wind availability information based on surface geomorphometrical analysis. As a result, 269 independent variables were examined to develop the LUR models using the "ADDRESS" independent variable selection method and stepwise multiple linear regression (MLR). Cross validation was performed for each resultant model. The results show that wind-related variables are included as statistically significant independent variables in most of the resultant models. Compared with the traditional method, incorporating wind-related variables into LUR model development improved the prediction performance for annual average NO2 concentration by up to 20%. Copyright © 2017 Elsevier Inc. All rights reserved.
Jović, Ozren; Smrečki, Neven; Popović, Zora
2016-04-01
A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is applied to six FTIR data sets, two UV-vis data sets and one DSC data set. The results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for p[…]oil, a well-known health-beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that, using iRR on FTIR and UV-vis data, each particular oil can be quantified very successfully (in all 8 cases RMSEP[…]oil (R² > 0.99)). Copyright © 2015 Elsevier B.V. All rights reserved.
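The interval idea can be sketched as follows:

1. Split the spectral variables into contiguous intervals of equal width.
2. Fit a closed-form ridge regression on each interval.
3. Keep the interval with the lowest cross-validated RMSE.

The equal-width intervals, the fixed ridge penalty and the contiguous k-fold split are simplifying assumptions; the published iRR may choose intervals and penalties differently. The function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = X.mean(axis=0), y.mean()
    Xc = X - xm
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - ym))
    return ym - xm @ beta, beta

def cv_rmse(X, y, lam, k=5):
    """k-fold cross-validated RMSE of ridge regression (contiguous folds)."""
    n = len(y)
    sq = 0.0
    for test in np.array_split(np.arange(n), k):
        train = np.setdiff1d(np.arange(n), test)
        b0, b = ridge_fit(X[train], y[train], lam)
        sq += np.sum((y[test] - (b0 + X[test] @ b)) ** 2)
    return np.sqrt(sq / n)

def best_interval(X, y, n_intervals, lam=1.0):
    """Column range (start, stop) whose ridge model has the lowest CV RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(0, X.shape[1], n_intervals + 1).astype(int)
    scores = [(cv_rmse(X[:, a:b], y, lam), (a, b)) for a, b in zip(edges[:-1], edges[1:])]
    return min(scores)[1]
```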
Setiawan; Suhartono; Ahmad, Imam Safawi; Rahmawati, Noorgam Ika
2015-12-01
Bank Indonesia (BI), as the central bank of the Republic of Indonesia, has a single overarching objective: to establish and maintain rupiah stability. This objective can be achieved by monitoring the inflow and outflow of currency. Inflow and outflow are related to the stock and distribution of currency across Indonesian territory, and they affect economic activity. As a Muslim-majority country, Indonesia's economic activity is closely tied to the Islamic (lunar) calendar, which differs from the Gregorian calendar. This research aims to forecast currency inflow and outflow at the Representative Office (RO) of BI in Semarang, Central Java. The analysis shows that currency inflow and outflow are influenced by calendar variation effects, namely the day of Eid al-Fitr (a Muslim holiday), as well as by seasonal patterns. In addition, the particular week in which Eid al-Fitr falls also affects the increase in currency inflow and outflow. Based on the smallest root mean square error (RMSE), the best model for the inflow data is an ARIMA model, while the best model for predicting the outflow data at the RO of BI Semarang is the ARIMAX model or time series regression, since the two yield the same model. The 2015 forecasts show an increase in currency inflow in August and an increase in currency outflow in July.
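A calendar-variation time series regression can be sketched as a linear trend, one dummy per calendar month, and a dummy for the month containing Eid al-Fitr. Because Eid moves about 11 days earlier each Gregorian year, its dummy is not collinear with the month dummies. The sketch covers only the regression part; ARIMAX would additionally model ARIMA errors. The function name and the Eid positions used in the test are hypothetical.

```python
import numpy as np

def calendar_design(n_months, start_month, eid_months):
    """Design matrix: linear trend, 12 monthly dummies, and an Eid al-Fitr dummy.
    start_month: calendar month (1-12) of the first observation.
    eid_months: 0-based indices of observations that contain Eid al-Fitr."""
    t = np.arange(n_months)
    month = (start_month - 1 + t) % 12          # 0 = January
    seasonal = np.eye(12)[month]                # full dummy set, so no separate intercept
    eid = np.zeros(n_months)
    eid[list(eid_months)] = 1.0
    return np.column_stack([t, seasonal, eid])
```

Fitting `numpy.linalg.lstsq(X, y)` on this matrix gives the trend, the twelve seasonal levels and the Eid effect as the last coefficient.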
Directory of Open Access Journals (Sweden)
Matthias Schmid
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
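The classical maximum likelihood approach mentioned above maximizes the beta regression log-likelihood. In the common mean-precision parameterization, the mean μ follows a logit link and y ~ Beta(μφ, (1 - μ)φ). The sketch below only evaluates that log-likelihood, which any numerical optimizer could maximize; it is not the boosted estimator proposed in the paper, and the function name is hypothetical.

```python
import math
import numpy as np

def beta_reg_loglik(params, X, y):
    """Log-likelihood of a mean-precision beta regression with logit link.
    params = (regression coefficients..., log of precision phi); 0 < y < 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, log_phi = np.asarray(params[:-1], dtype=float), params[-1]
    phi = math.exp(log_phi)
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))          # inverse logit
    a, b = mu * phi, (1.0 - mu) * phi               # Beta(a, b) shape parameters
    ll = 0.0
    for ai, bi, yi in zip(a, b, y):
        ll += (math.lgamma(phi) - math.lgamma(ai) - math.lgamma(bi)
               + (ai - 1.0) * math.log(yi) + (bi - 1.0) * math.log(1.0 - yi))
    return ll
```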
Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.
2016-01-04
This report documents the development of statistical tools used to quantify the hazard presented by the response of sea-level elevation to natural or anthropogenic changes in climate and ocean circulation. A hazard is a physical process (or processes) that, when combined with vulnerability (or susceptibility to the hazard), results in risk. This study presents the development and comparison of new and existing sea-level analysis methods, exploration of the strengths and weaknesses of the methods using synthetic time series, and when appropriate, synthesis of the application of the method to observed sea-level time series. These reports are intended to enhance material presented in peer-reviewed journal articles where it is not always possible to provide the level of detail that might be necessary to fully support or recreate published results.
International Nuclear Information System (INIS)
Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom
2014-01-01
Highlights: • We develop a data-driven method for battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years' continuous cycling data verify the effectiveness of the method. - Abstract: The reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as highly important by a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure that Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over its lifetime. This paper presents a data-driven method for estimating the capacity of a Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to find the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years' continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of a Li-ion battery throughout its whole lifetime.
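Point (iii) relies on kNN regression with per-feature weights in the distance metric; PSO would search over those weights to minimize CV error. The sketch below shows only the weighted kNN predictor, with PSO omitted. Inverse-distance averaging of the neighbours stands in for the paper's kernel, which the abstract does not specify. The function name is hypothetical.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_query, k=3, feature_weights=None):
    """kNN regression with per-feature weights in the Euclidean distance.
    Prediction is the inverse-distance-weighted mean of the k neighbours' targets."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    w = (np.ones(X_train.shape[1]) if feature_weights is None
         else np.asarray(feature_weights, dtype=float))
    preds = []
    for q in X_query:
        d = np.sqrt((((X_train - q) ** 2) * w).sum(axis=1))
        idx = np.argsort(d)[:k]
        if d[idx[0]] == 0.0:                 # exact match: return its target
            preds.append(y_train[idx[0]])
            continue
        inv = 1.0 / d[idx]
        preds.append(np.sum(inv * y_train[idx]) / np.sum(inv))
    return np.array(preds)
```

A weight of zero removes a feature from the distance entirely, so the weight search doubles as a soft feature selection.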
International Nuclear Information System (INIS)
Briggs, D.J.; De Hoogh, C.; Elliot, P.; Gulliver, J.; Wills, J.; Kingham, S.; Smallbone, K.
2000-01-01
Accurate, high-resolution maps of traffic-related air pollution are needed both as a basis for assessing exposures as part of epidemiological studies and to inform urban air-quality policy and traffic management. This paper assesses the use of a GIS-based regression mapping technique to model spatial patterns of traffic-related air pollution. The model, developed using data from 80 passive sampler sites in Huddersfield as part of the SAVIAH (Small Area Variations in Air Quality and Health) project, uses data on traffic flows and land cover in the 300-m buffer zone around each site, together with the altitude of the site, as predictors of NO2 concentrations. It was tested here by application in four urban areas in the UK: Huddersfield (for the year following that used for initial model development), Sheffield, Northampton, and part of London. In each case, a GIS was built in ArcInfo, integrating relevant data on road traffic, urban land use and topography. Monitoring of NO2 was undertaken using replicate passive samplers (in London, data were obtained from surveys carried out as part of the London network). In Huddersfield, Sheffield and Northampton, the model was first calibrated by comparing modelled results with monitored NO2 concentrations at 10 randomly selected sites; the calibrated model was then validated against data from a further 10-28 sites. In London, where data for only 11 sites were available, validation was not undertaken. Results showed that the model performed well in all cases. After local calibration, the model gave estimates of mean annual NO2 concentrations within a factor of 1.5 of the actual mean approximately 70-90% of the time, and within a factor of 2 between 70 and 100% of the time. r² values between modelled and observed concentrations are in the range of 0.58-0.76. These results are comparable to those achieved by more sophisticated dispersion models. The model also has several advantages over dispersion modelling. It is able, for example, to
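The "within a factor of 1.5" and "within a factor of 2" figures are the share of sites whose predicted-to-observed ratio lies in [1/f, f]. The minimal sketch below computes that share; the function name is hypothetical, and the test values are synthetic.

```python
import numpy as np

def fraction_within_factor(predicted, observed, factor=2.0):
    """Share of sites where predicted/observed lies in [1/factor, factor]."""
    ratio = np.asarray(predicted, dtype=float) / np.asarray(observed, dtype=float)
    return np.mean((ratio >= 1.0 / factor) & (ratio <= factor))
```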
Robust modelling of solubility in supercritical carbon dioxide using Bayesian methods.
Tarasova, Anna; Burden, Frank; Gasteiger, Johann; Winkler, David A
2010-04-01
Two sparse Bayesian methods were used to derive predictive models of solubility of organic dyes and polycyclic aromatic compounds in supercritical carbon dioxide (scCO(2)), over a wide range of temperatures (285.9-423.2K) and pressures (60-1400 bar): a multiple linear regression employing an expectation maximization algorithm and a sparse prior (MLREM) method and a non-linear Bayesian Regularized Artificial Neural Network with a Laplacian Prior (BRANNLP). A randomly selected test set was used to estimate the predictive ability of the models. The MLREM method resulted in a model of similar predictivity to the less sparse MLR method, while the non-linear BRANNLP method created models of substantially better predictivity than either the MLREM or MLR based models. The BRANNLP method simultaneously generated context-relevant subsets of descriptors and a robust, non-linear quantitative structure-property relationship (QSPR) model for the compound solubility in scCO(2). The differences between linear and non-linear descriptor selection methods are discussed. (c) 2009 Elsevier Inc. All rights reserved.
Silva, João Paulo Santos; Mônaco, Luciana da Mata; Paschoal, André Monteiro; Oliveira, Ícaro Agenor Ferreira de; Leoni, Renata Ferranti
2018-05-16
Arterial spin labeling (ASL) is an established magnetic resonance imaging (MRI) technique that is finding broader applications in functional studies of the healthy and diseased brain. To improve cerebral blood flow (CBF) signal specificity, many algorithms and imaging procedures, such as subtraction methods, have been proposed to eliminate or at least minimize noise sources. This study therefore addressed how CBF functional connectivity (FC), in terms of resting brain network (RBN) identification and correlations between regions of interest (ROI), is changed by different subtraction methods and by removal of residual motion artifacts and global signal fluctuations (RMAGSF). Twenty young healthy participants (13 M/7 F, mean age = 25 ± 3 years) underwent an MRI protocol with a pseudo-continuous ASL (pCASL) sequence. Perfusion-based images were obtained using simple, sinc and running subtraction. RMAGSF removal was applied to all CBF time series. Independent Component Analysis (ICA) was used for RBN identification, while Pearson's correlation was used for ROI-based FC analysis. Temporal signal-to-noise ratio (tSNR) was higher in CBF maps obtained by sinc subtraction, although RMAGSF removal had a significant effect on maps obtained with simple and running subtraction. Neither the subtraction method nor RMAGSF removal directly affected the identification of RBNs. However, the number of correlated and anti-correlated voxels varied across subtraction and filtering methods. At the ROI-to-ROI level, changes were prominent in FC values and their statistical significance. Our study showed that both RMAGSF filtering and the subtraction method may influence resting-state FC results, especially at the ROI level, consequently affecting FC analysis and its interpretation. Taking our results and the whole discussion together, we understand that for an exploratory assessment of the brain, one could avoid removing RMAGSF to
Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif
2017-09-01
In the present study, NIR spectroscopy coupled with PLS regression was developed, for the first time, as a rapid alternative method to quantify Keto-β-Boswellic Acid (KBA) in different plant parts of Boswellia sacra and in the resin exudates of the trunk. NIR spectra of KBA standards and B. sacra samples were measured in absorption mode over the wavelength range 700-2500 nm. A PLS regression model was built from the spectral data using 70% of the KBA standards (training set), covering 0.1 ppm to 100 ppm. The resulting model had an R² of 0.98 and a correlation coefficient of 0.99, and predicted well, with an RMSEP of 3.2 and a correlation of 0.99. It was then used to quantify KBA in the B. sacra samples. The results indicated that the MeOH extract of the resin has the highest concentration of KBA (0.6%), followed by the essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to obtain sub-fractions eluted with organic solvents of different polarity. The sub-fraction eluted with 4% MeOH/CHCl3 contained the highest percentage of KBA (4.1%), followed by the sub-fraction eluted with 2% MeOH/CHCl3 (2.2%). The present results also indicated that KBA is present only in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed by HPLC analysis; it is therefore concluded that NIRS coupled with PLS regression is a rapid alternative method for quantifying KBA in Boswellia sacra. It is non-destructive, rapid and sensitive, and requires only simple sample preparation.
Polynomial regression analysis and significance test of the regression function
International Nuclear Information System (INIS)
Gao Zhengming; Zhao Juan; He Shengping
2012-01-01
To analyze the decay heating power per kilogram of a certain radioactive isotope with the polynomial regression method, the paper first demonstrates the broad usefulness of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived, exploiting the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and the significance test are applied to the decay heating power per kilogram of the isotope, in accordance with the authors' practical work. (authors)
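The abstract does not reproduce the formulas. As a minimal sketch, assuming ordinary least squares on the powers 1, x, ..., x^k and the overall F statistic used for multiple linear regression, the fit and the significance test might look like this in Python; the helper names and the toy data are hypothetical, not the authors' decay-heat data.

```python
# Minimal sketch (hypothetical helper names): OLS polynomial fit plus the
# overall F test borrowed from multiple linear regression.


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit_ols(xs, ys, degree):
    """Coefficients b0..bk of y = b0 + b1*x + ... + bk*x**k via the normal equations."""
    p = degree + 1
    design = [[x ** j for j in range(p)] for x in xs]  # columns 1, x, ..., x**k
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve_linear(xtx, xty)


def overall_f(xs, ys, coeffs):
    """F = (SSR/k) / (SSE/(n-k-1)) for H0: b1 = ... = bk = 0."""
    n, k = len(xs), len(coeffs) - 1
    mean_y = sum(ys) / n
    fitted = [sum(c * x ** j for j, c in enumerate(coeffs)) for x in xs]
    ssr = sum((f - mean_y) ** 2 for f in fitted)         # explained by the polynomial
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual
    return (ssr / k) / (sse / (n - k - 1))


# Hypothetical toy data: a quadratic with small fixed perturbations.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, -0.01]
ys = [2 + 0.5 * x - 0.1 * x ** 2 + e for x, e in zip(xs, noise)]
coeffs = polyfit_ols(xs, ys, degree=2)
f_stat = overall_f(xs, ys, coeffs)
```

With k = 2 and n = 8, the statistic would be compared with the upper 5% point of F(2, 5), about 5.79; a larger value rejects the hypothesis that all non-constant coefficients are zero.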
Chen, Qingxia; Ibrahim, Joseph G
2014-07-01
Multiple Imputation, Maximum Likelihood and Fully Bayesian methods are the three most commonly used model-based approaches in missing data problems. Although it is easy to show that when the responses are missing at random (MAR), the complete case analysis is unbiased and efficient, the aforementioned methods are still commonly used in practice for this setting. To examine the performance of and relationships between these three methods in this setting, we derive and investigate small sample and asymptotic expressions of the estimates and standard errors, and fully examine how these estimates are related for the three approaches in the linear regression model when the responses are MAR. We show that when the responses are MAR in the linear model, the estimates of the regression coefficients using these three methods are asymptotically equivalent to the complete case estimates under general conditions. One simulation and a real data set from a liver cancer clinical trial are given to compare the properties of these methods when the responses are MAR.
Seber, George A F
2012-01-01
* Concise, mathematically clear, and comprehensive treatment of the subject.
* Expanded coverage of diagnostics and methods of model fitting.
* Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models.
* More than 200 problems throughout the book plus outline solutions for the exercises.
* This revision has been extensively class-tested.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Because the SVM-based approaches are much more robust than ANNs, they are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Farhadian, Maryam; Aliabadi, Mohsen; Darvishi, Ebrahim
2015-01-01
Prediction models are used in a variety of medical domains, and they are frequently built from experience, that is, from data acquired from actual cases. This study aimed to analyze the potential of artificial neural networks and logistic regression techniques for estimating hearing impairment among industrial workers. A total of 210 workers employed in a steel factory (in western Iran) were selected, and their occupational exposure histories were analyzed. The hearing loss thresholds of the studied workers were determined using a calibrated audiometer. Personal noise exposures were also measured at the workstations using a noise dosimeter. Data on five variables that can influence hearing loss were used as input features, and the hearing loss thresholds were used as the target feature of the prediction methods. Multilayer feedforward neural networks and logistic regression models were developed using MATLAB R2011a software. Based on the World Health Organization classification of hearing loss grades, 74.2% of the studied workers have normal hearing thresholds, 23.4% have slight hearing loss, and 2.4% have moderate hearing loss. The accuracy and kappa coefficient of the best neural network for predicting the grade of hearing loss were 88.6 and 66.30, respectively, compared with 84.28 and 51.30 for logistic regression. Neural networks could therefore provide more accurate predictions of hearing loss than logistic regression. The prediction method can provide reliable and comprehensible information for occupational health and medicine experts.
Donnelly, Aoife; Misstear, Bruce; Broderick, Brian
2011-02-15
Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling
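The abstract names a Gaussian kernel and circular statistics but gives no estimator. A minimal sketch, assuming a Nadaraya-Watson estimator with a product of Gaussian kernels (one on the shortest angular distance between wind directions, one on wind speed); the bandwidths `h_dir` and `h_speed` and all names are hypothetical.

```python
import math


def angular_diff_deg(a, b):
    """Smallest absolute difference between two compass directions, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nw_estimate(dirs, speeds, concs, q_dir, q_speed, h_dir=20.0, h_speed=1.0):
    """Nadaraya-Watson estimate of concentration at (q_dir, q_speed).

    Each observation is weighted by a Gaussian kernel on the circular
    direction difference times a Gaussian kernel on the speed difference.
    """
    num = den = 0.0
    for d, s, c in zip(dirs, speeds, concs):
        w = math.exp(-0.5 * (angular_diff_deg(d, q_dir) / h_dir) ** 2)
        w *= math.exp(-0.5 * ((s - q_speed) / h_speed) ** 2)
        num += w * c
        den += w
    return num / den


# Hypothetical observations: high NO2 only when the wind blows from due north.
dirs = [0.0, 90.0, 180.0, 270.0]
speeds = [3.0, 3.0, 3.0, 3.0]
concs = [40.0, 10.0, 10.0, 10.0]
near_north = nw_estimate(dirs, speeds, concs, 355.0, 3.0, h_dir=30.0)
due_south = nw_estimate(dirs, speeds, concs, 180.0, 3.0, h_dir=30.0)
```

Using the shortest angular distance is what lets observations at 355° inform an estimate at 5°; a plain linear distance would treat them as 350° apart.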
Linear regression in astronomy. I
Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh
1990-01-01
Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
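For reference, a minimal Python sketch of the slope formulas for the OLS bisector and for reduced major-axis regression, written as we understand them from Isobe et al.; the uncertainty (variance) formulas given in the paper are not reproduced, and the function names and data are hypothetical.

```python
import math


def ols_summary(xs, ys):
    """Return (beta1, beta2, mean_x, mean_y).

    beta1 is the OLS(Y|X) slope; beta2 is the OLS(X|Y) line expressed as dY/dX.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, syy / sxy, mx, my


def bisector_fit(xs, ys):
    """Slope and intercept of the line bisecting OLS(Y|X) and OLS(X|Y)."""
    b1, b2, mx, my = ols_summary(xs, ys)
    slope = (b1 * b2 - 1 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2))) / (b1 + b2)
    return slope, my - slope * mx


def reduced_major_axis_fit(xs, ys):
    """Slope sign(Sxy) * sqrt(beta1 * beta2), i.e. the geometric mean of the two OLS slopes."""
    b1, b2, mx, my = ols_summary(xs, ys)
    slope = math.copysign(math.sqrt(b1 * b2), b1)
    return slope, my - slope * mx


# Hypothetical bivariate sample with scatter in both variables.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.8, 4.3, 5.9, 8.4, 9.7, 12.2]
b1, b2, _, _ = ols_summary(xs, ys)
bis_slope, bis_intercept = bisector_fit(xs, ys)
rma_slope, rma_intercept = reduced_major_axis_fit(xs, ys)
```

For positively correlated data both symmetric slopes fall between the two OLS slopes, which is why they are preferred when neither variable is naturally the dependent one.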
Classification and regression trees
Breiman, Leo; Olshen, Richard A; Stone, Charles J
1984-01-01
The methodology used to construct tree structured rules is the focus of this monograph. Unlike many other statistical procedures, which moved from pencil and paper to calculators, this text's use of trees was unthinkable before computers. Both the practical and theoretical sides have been developed in the authors' study of tree methods. Classification and Regression Trees reflects these two sides, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Barnwell-Ménard, Jean-Louis; Li, Qing; Cohen, Alan A
2015-03-15
The loss of signal associated with categorizing a continuous variable is well known, and previous studies have demonstrated that this can lead to an inflation of Type-I error when the categorized variable is a confounder in a regression analysis estimating the effect of an exposure on an outcome. However, it is not known how the Type-I error may vary under different circumstances, including logistic versus linear regression, different distributions of the confounder, and different categorization methods. Here, we analytically quantified the effect of categorization and then performed a series of 9600 Monte Carlo simulations to estimate the Type-I error inflation associated with categorization of a confounder under different regression scenarios. We show that Type-I error is unacceptably high (>10% in most scenarios and often 100%). The only exception was when the variable categorized was a continuous mixture proxy for a genuinely dichotomous latent variable, where both the continuous proxy and the categorized variable are error-ridden proxies for the dichotomous latent variable. As expected, error inflation was also higher with larger sample size, fewer categories, and stronger associations between the confounder and the exposure or outcome. We provide online tools that can help researchers estimate the potential error inflation and understand how serious a problem this is. Copyright © 2014 John Wiley & Sons, Ltd.
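A much-reduced version of this kind of simulation can illustrate the effect. The sketch below is an illustration under stated assumptions, not the authors' 9600-scenario design: a standard-normal confounder `c` drives both exposure `x` and outcome `y`, `x` has no true effect, and `c` is either adjusted for as is or dichotomized at 0 (its population median). Only linear regression is covered, with |t| > 1.96 as an approximate 5% test; all names and parameter values are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def t_stat_exposure(x, z, y):
    """OLS of y on [1, x, z]; t statistic for the coefficient of x."""
    design = [[1.0, xi, zi] for xi, zi in zip(x, z)]
    p, n = 3, len(y)
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    inv_col = solve_linear(xtx, [0.0, 1.0, 0.0])  # column 1 of (X'X)^-1
    return beta[1] / math.sqrt(sigma2 * inv_col[1])


def type1_rate(n=200, reps=200, categorize=True, seed=1):
    """Share of null simulations in which |t| > 1.96 for the exposure effect."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        c = [rng.gauss(0, 1) for _ in range(n)]
        x = [ci + rng.gauss(0, 1) for ci in c]  # exposure depends on the confounder
        y = [ci + rng.gauss(0, 1) for ci in c]  # outcome depends only on the confounder
        z = [1.0 if ci > 0 else 0.0 for ci in c] if categorize else c
        if abs(t_stat_exposure(x, z, y)) > 1.96:
            rejections += 1
    return rejections / reps
```

With these settings the dichotomized adjustment should reject the true null far more often than the nominal 5%, because residual confounding remains within each category, while continuous adjustment should stay near 5%.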
Directory of Open Access Journals (Sweden)
Mok Tik
2014-06-01
This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (the dependent vector variable) is expressed as a function of a number of hypothesized phenomena, realized also as vector variables (independent vector variables) and/or scalar variables, that are likely to affect the dependent vector variable. The proposed representation has the unique property of solving for the coefficients of the independent vector variables (explanatory variables) also as vectors; hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is used to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
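One way to see the idea: encode each 2-D vector (east, north) as a complex number, so that a complex coefficient rotates and scales its regressor. The sketch below is an assumption rather than the paper's derivation: it solves the complex normal equations X^H X b = X^H y directly, which should give the same estimates as the real model obtained through the isomorphism, where each complex coefficient p + iq acts as the 2 × 2 matrix [[p, -q], [q, p]]. Names and data are hypothetical.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; works for complex entries too."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # abs = modulus
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def vector_regression(rows, ys):
    """Complex least squares: minimise sum |y - sum_j b_j * x_j|**2 over complex b.

    rows: one list of complex regressors per observation; ys: complex responses.
    """
    p = len(rows[0])
    xhx = [[sum(r[i].conjugate() * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xhy = [sum(r[i].conjugate() * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve_linear(xhx, xhy)


# Hypothetical example: two vector regressors, noise-free responses.
true_coeffs = [1 + 2j, 0.5 - 1j]
rows = [[1 + 0j, 0.5 + 1j], [1j, 1 - 1j], [2 - 1j, 0.3 + 0j], [1 + 1j, -1 + 2j]]
ys = [sum(b * x for b, x in zip(true_coeffs, r)) for r in rows]
estimates = vector_regression(rows, ys)
```

Because the responses are generated without noise, the estimated complex coefficients should reproduce the generating ones up to rounding error.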
Differentiating regressed melanoma from regressed lichenoid keratosis.
Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A
2017-04-01
Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% of regressed melanomas, as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Fridgeirsdottir, Gudrun A; Harris, Robert J; Dryden, Ian L; Fischer, Peter M; Roberts, Clive J
2018-03-29
Solid dispersions can be a successful way to enhance the bioavailability of poorly soluble drugs. Here, 60 solid dispersion formulations were produced using ten chemically diverse, neutral, poorly soluble drugs, three commonly used polymers, and two manufacturing techniques, spray-drying and melt extrusion. Each formulation underwent a six-month stability study under accelerated conditions, 40 °C and 75% relative humidity (RH). Significant differences in times to crystallization (onset of crystallization) were observed both between the different polymers and between the two processing methods. Stability ranged from zero days to over one year. The extensive experimental data set obtained from this stability study was used to build multiple linear regression models correlating physicochemical properties of the active pharmaceutical ingredients (API) with the stability data. The purpose of these models is to indicate which combination of processing method and polymer carrier is most likely to give a stable solid dispersion. Six quantitative multiple linear regression-based models were produced by selecting the most influential independent physical and chemical parameters from a set of 33 possible factors, one model for each combination of polymer and processing method, with good predictability of stability. Three general rules are proposed from these models for the formulation development of suitably stable solid dispersions. Namely, increased stability is correlated with increased glass transition temperature (Tg) of solid dispersions, as well as a decreased number of H-bond donors and increased molecular flexibility (such as rotatable bonds and ring count) of the drug molecule.
Rossi, M.; Apuani, T.; Felletti, F.
2009-04-01
The aim of this paper is to compare the results of two statistical methods for landslide susceptibility analysis: 1) a univariate probabilistic method based on a landslide susceptibility index, and 2) a multivariate method (logistic regression). The study area is the Febbraro valley, located in the central Italian Alps, where different types of metamorphic rocks crop out. In the eastern part of the studied basin, a Quaternary cover consisting mainly of colluvial and, secondarily, glacial deposits is dominant. In this study, 110 earth flows, mainly located in the NE portion of the catchment, were analyzed. They involve only the colluvial deposits, and their extent mainly ranges from 36 to 3173 m². Both statistical methods require establishing a spatial database, constructed using a Geographical Information System (GIS), in which each landslide is described by several parameters corresponding to the values at the central point of its main scarp. Based on a bibliographic review, a total of 15 predisposing factors were used. The widths of the intervals into which the predisposing-factor maps are reclassified were defined as constant for: elevation (100 m), slope (5°), solar radiation (0.1 MJ/cm²/year), profile curvature (1.2 1/m), tangential curvature (2.2 1/m), drainage density (0.5), and lineament density (0.00126). For the other parameters, the results of the probability-probability plot analysis and the statistical indexes of the landslide sites were used, in particular for slope length (0 ÷ 2, 2 ÷ 5, 5 ÷ 10, 10 ÷ 20, 20 ÷ 35, 35 ÷ 260), accumulation flow (0 ÷ 1, 1 ÷ 2, 2 ÷ 5, 5 ÷ 12, 12 ÷ 60, 60 ÷ 27265), Topographic Wetness Index (0 ÷ 0.74, 0.74 ÷ 1.94, 1.94 ÷ 2.62, 2.62 ÷ 3.48, 3.48 ÷ 6.00, 6.00 ÷ 9.44), Stream Power Index (0 ÷ 0.64, 0.64 ÷ 1.28, 1.28 ÷ 1.81, 1.81 ÷ 4.20, 4.20 ÷ 9
Sofowote, Uwayemi M; McCarry, Brian E; Marvin, Christopher H
2008-08-15
A total of 26 suspended sediment samples collected over a 5-year period in Hamilton Harbour, Ontario, Canada and surrounding creeks were analyzed for a suite of polycyclic aromatic hydrocarbons and sulfur heterocycles. Hamilton Harbour sediments contain relatively high levels of polycyclic aromatic compounds and heavy metals due to emissions from industrial and mobile sources. Two receptor modeling methods using factor analyses were compared to determine the profiles and relative contributions of pollution sources to the harbor; these methods are principal component analyses (PCA) with multiple linear regression analysis (MLR) and positive matrix factorization (PMF). Both methods identified four factors and gave excellent correlation coefficients between predicted and measured levels of 25 aromatic compounds; both methods predicted similar contributions from coal tar/coal combustion sources to the harbor (19 and 26%, respectively). One PCA factor was identified as contributions from vehicular emissions (61%); PMF was able to differentiate vehicular emissions into two factors, one attributed to gasoline emissions sources (28%) and the other to diesel emissions sources (24%). Overall, PMF afforded better source identification than PCA with MLR. This work constitutes one of the few examples of the application of PMF to the source apportionment of sediments; the addition of sulfur heterocycles to the analyte list greatly aided in the source identification process.
Ozdemir, Adnan
2011-07-01
The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of 440 springs were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated against the observed springs by calculating the relative operating characteristic curve; the area under this curve was 0.82. These results indicate that the model is a good estimator of spring potential in the study area. The spring potential map shows that the areas of the very low, low, moderate and high groundwater spring potential classes are 105.586 km² (28.99%), 74.271 km² (19.906%), 101.203 km² (27.14%), and 90.05 km² (24.671%), respectively. Interpretation of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach had not previously been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains. The evolved model
Pedrini, D. T.; Pedrini, Bonnie C.
Regression, another mechanism studied by Sigmund Freud, has been the subject of much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…
Ozdemir, Adnan; Altural, Tolga
2013-03-01
This study evaluated and compared landslide susceptibility maps produced with three different methods, frequency ratio, weights of evidence, and logistic regression, by using validation datasets. The field surveys performed as part of this investigation mapped the locations of 90 landslides that had been identified in the Sultan Mountains of south-western Turkey. The landslide influence parameters used for this study are geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transportation capacity index, distance to drainage, distance to fault, drainage density, fault density, and spring density maps. The relationships between landslide distributions and these parameters were analysed using the three methods, and the results of these methods were then used to calculate the landslide susceptibility of the entire study area. The accuracy of the final landslide susceptibility maps was evaluated based on the landslides observed during the fieldwork, and the accuracy of the models was evaluated by calculating each model's relative operating characteristic curve. The predictive capability of each model was determined from the area under its relative operating characteristic curve; the areas under the curves obtained using the frequency ratio, logistic regression, and weights of evidence methods are 0.976, 0.952, and 0.937, respectively. These results indicate that the frequency ratio and weights of evidence models are relatively good estimators of landslide susceptibility in the study area. Specifically, the results of the correlation analysis show a high correlation between the frequency ratio and weights of evidence results, and the frequency ratio and logistic regression methods exhibit correlation coefficients of 0.771 and 0.727, respectively. The frequency ratio model is simple, and its input, calculation and output processes are
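The abstract does not define the frequency ratio. Assuming the definition commonly used in susceptibility mapping (the share of landslide cells falling in a factor class divided by the share of all cells in that class, summed over factor layers to give a susceptibility index), a minimal Python sketch follows; the grid, classes and names are hypothetical.

```python
from collections import Counter


def frequency_ratios(class_of_cell, landslide_cells):
    """FR per class: (share of landslide cells in class) / (share of all cells in class)."""
    all_counts = Counter(class_of_cell.values())
    slide_counts = Counter(class_of_cell[cell] for cell in landslide_cells)
    n_all, n_slide = len(class_of_cell), len(landslide_cells)
    return {cls: (slide_counts[cls] / n_slide) / (all_counts[cls] / n_all)
            for cls in all_counts}


def susceptibility_index(cell, layers):
    """Sum, over factor layers, the FR of the class the cell falls in."""
    return sum(fr_table[class_of_cell[cell]] for class_of_cell, fr_table in layers)


# Hypothetical 10-cell study area with two factor layers.
slope_class = {i: "steep" if i < 4 else "gentle" for i in range(10)}
litho_class = {i: "colluvium" if i % 2 == 0 else "bedrock" for i in range(10)}
landslides = [0, 1, 2, 5]
fr_slope = frequency_ratios(slope_class, landslides)
fr_litho = frequency_ratios(litho_class, landslides)
layers = [(slope_class, fr_slope), (litho_class, fr_litho)]
lsi = {cell: susceptibility_index(cell, layers) for cell in range(10)}
```

An FR above 1 marks a class where landslides are over-represented relative to its area, which is part of why the method's inputs and outputs are easy to interpret.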
Subset selection in regression
Miller, Alan
2002-01-01
Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition:
* A separate chapter on Bayesian methods
* Complete revision of the chapter on estimation
* A major example from the field of near infrared spectroscopy
* More emphasis on cross-validation
* Greater focus on bootstrapping
* Stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible
* Software available on the Internet for implementing many of the algorithms presented
* More examples
Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P
2010-10-22
A multi-residue methodology based on solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate for the matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticide analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for the analytical data, a weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the higher concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.
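The abstract does not state the weighting factor. A minimal sketch of a weighted least squares calibration line, assuming weights w = 1/x² (a common choice when the standard deviation grows roughly in proportion to concentration); the data and function names are hypothetical.

```python
def weighted_line(xs, ys, weights):
    """Intercept and slope minimising sum(w * (y - a - b*x)**2)."""
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, xs)) / sw
    ybar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def back_calculate(response, intercept, slope):
    """Concentration implied by a measured response on the fitted line."""
    return (response - intercept) / slope


# Hypothetical calibration: true response 10*x, errors growing with concentration.
conc = [1, 2, 5, 10, 50, 100]
resp = [10.2, 19.6, 50.5, 99.0, 510.0, 970.0]
ols = weighted_line(conc, resp, [1.0] * len(conc))             # unweighted fit
wls = weighted_line(conc, resp, [1.0 / x ** 2 for x in conc])  # assumed 1/x^2 weights
low_ols = back_calculate(resp[0], *ols)
low_wls = back_calculate(resp[0], *wls)
```

Back-calculating the lowest standard should show the point made in the abstract: the weighted line recovers a value close to 1, whereas the unweighted line, dominated by the high standards, does not.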
Machado, Fabiana Andrade; Nakamura, Fábio Yuzo; Moraes, Solange Marta Franzói De
2012-01-01
This study examined the influence of the regression model and the initial intensity of an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and endurance performance. Sixteen non-competitive, recreational female runners performed a discontinuous incremental treadmill test. The initial speed was set at 7 km · h⁻¹ and increased by 1 km · h⁻¹ every 3 min, with a 30-s rest between stages for earlobe capillary blood sampling. Lactate-speed data were fitted by an exponential-plus-constant and a third-order polynomial equation. The lactate threshold was determined for both regression equations using all the coordinates, excluding the first point, and excluding the first and second points. Mean speed in a 10-km road race was the performance index (3.04 ± 0.22 m · s⁻¹). The exponentially derived lactate threshold had a higher correlation (0.98 ≤ r ≤ 0.99) and a smaller standard error of estimate (SEE) (0.04 ≤ SEE ≤ 0.05 m · s⁻¹) with performance than the polynomially derived equivalent (0.83 ≤ r ≤ 0.89; 0.10 ≤ SEE ≤ 0.13 m · s⁻¹). The exponential lactate threshold was greater than the polynomial equivalent (P […] performance index that is independent of the initial intensity of the incremental test and better than the polynomial equivalent.
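The maximal-deviation (Dmax) method takes the threshold to be the point on the fitted lactate-speed curve that lies farthest from the straight line joining the curve's first and last points. A minimal sketch follows. It assumes the curve has already been fitted; the parameters a, b and c of the exponential-plus-constant form are hypothetical, and the fitting step itself is omitted. It locates the threshold by a dense grid search.

```python
import math


def dmax_threshold(f, x_first, x_last, steps=100_000):
    """Return the x in [x_first, x_last] whose point (x, f(x)) lies farthest
    from the straight line joining (x_first, f(x_first)) and (x_last, f(x_last))."""
    y_first = f(x_first)
    dx, dy = x_last - x_first, f(x_last) - y_first
    norm = math.hypot(dx, dy)
    best_x, best_d = x_first, -1.0
    for i in range(steps + 1):
        x = x_first + dx * i / steps
        # Perpendicular distance from (x, f(x)) to the end-point chord.
        d = abs(dy * (x - x_first) - dx * (f(x) - y_first)) / norm
        if d > best_d:
            best_x, best_d = x, d
    return best_x


# Hypothetical fitted parameters for lactate = a + b * exp(c * speed).
a, b, c = 1.0, 0.05, 0.35
lactate = lambda v: a + b * math.exp(c * v)
print(round(dmax_threshold(lactate, 7.0, 14.0), 2))
```

The "excluding the first point" variants only change x_first. A fitted third-order polynomial can be passed as f in the same way. For the exponential form, the maximum is also where the curve's slope equals the chord's slope, which gives a closed-form check on the grid search.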
Yulia, M.; Suhandy, D.
2018-03-01
NIR spectra acquired with a spectral data acquisition system contain both chemical information about the samples and physical information, such as particle size and bulk density. Several methods have been established for developing calibration models that can compensate for variations in sample physical properties. One common approach is to include the physical variation in the calibration model, either explicitly or implicitly. The objective of this study was to evaluate the feasibility of using the explicit method to compensate for the influence of different coffee powder particle sizes on NIR calibration model performance. A total of 220 coffee powder samples, covering two types of coffee (civet and non-civet) and two particle sizes (212 and 500 µm), were prepared. Spectral data were acquired using an NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement. A discrimination method based on PLS-DA was applied, and the influence of particle size on PLS-DA performance was investigated. In the explicit method, the particle size is added directly as a predicted variable, resulting in an X block containing only the NIR spectra and a Y block containing the particle size and the type of coffee. The explicit inclusion of particle size in the calibration model is expected to improve the accuracy of coffee type determination. The results show that, with the explicit method, the calibration model for coffee type determination is slightly superior, with a coefficient of determination (R²) of 0.99 and a root mean square error of cross-validation (RMSECV) of 0.041. The performance of the PLS2 calibration model for coffee type determination with particle size compensation was quite good: it predicted the type of coffee at both particle sizes with relatively high R²pred values. The prediction also resulted in low bias and RMSEP values.
Steganalysis using logistic regression
Lubenko, Ivans; Ker, Andrew D.
2011-02-01
We advocate Logistic Regression (LR) as an alternative to the Support Vector Machine (SVM) classifiers commonly used in steganalysis. LR offers more information than traditional SVM methods (it estimates class probabilities as well as providing a simple classification) and can be adapted more easily and efficiently to multiclass problems. Like SVM, LR can be kernelised for nonlinear classification, and its classification accuracy is comparable to that of SVM methods. This work is a case study comparing the accuracy and speed of SVM and LR classifiers in detecting LSB Matching and other related spatial-domain image steganography, using the state-of-the-art 686-dimensional SPAM feature set, in three image sets.
Santos-Concejero, Jordan; Tucker, Ross; Granados, Cristina; Irazusta, Jon; Bidaurrazaga-Letona, Iraia; Zabala-Lili, Jon; Gil, Susana María
2014-01-01
This study investigated the influence of the regression model and the initial intensity of an incremental test on the relationship between the lactate threshold estimated by the maximal-deviation method and performance in elite-standard runners. Twenty-three well-trained runners completed a discontinuous incremental running test on a treadmill. Speed started at 9 km · h⁻¹ and increased by 1.5 km · h⁻¹ every 4 min until exhaustion, with 1 min of recovery for blood collection. Lactate-speed data were fitted by exponential and polynomial models. The lactate threshold was determined for both models using all the co-ordinates, excluding the first point, and excluding the first and second points. The exponential lactate threshold was greater than the polynomial equivalent in any co-ordinate condition (P […] performance and is independent of the initial intensity of the test.
Hwang, Jae Joon; Kim, Kee-Deog; Park, Hyok; Park, Chang Seo; Jeong, Ho-Gul
2014-01-01
Superimposition has been used in dentistry to evaluate the changes produced by orthodontic or orthopedic treatment. With the introduction of cone beam CT (CBCT), superimposition made it possible to evaluate three-dimensional changes after treatment. Four-point plane orientation is one of the simplest ways to superimpose three-dimensional images. The study had two aims: to find the factors influencing the superimposition error of cephalometric landmarks under the four-point plane orientation method, and to evaluate the reproducibility of cephalometric landmarks for analyzing superimposition error. To this end, 20 patients with normal skeletal and occlusal relationships who had undergone CBCT for the diagnosis of temporomandibular disorder were analyzed. The nasion, sella turcica, basion, and the midpoint between the left and right most posterior points of the lesser wing of the sphenoid bone were used to define a three-dimensional (3D) anatomical reference coordinate system. Another 15 reference cephalometric points were each located three times in the same image. The reorientation error of each landmark could be substantially explained (23%) by a linear regression model consisting of 3 factors describing the position of each landmark relative to the reference axes and the locating error. The four-point plane orientation system may produce reorientation error that varies with the perpendicular distance between the landmark and the x-axis. The reorientation error also increases as the locating error, and the shift of the reference axes viewed from each landmark, increase. Therefore, to reduce reorientation error, the accuracy of all landmarks, including the reference points, is important. The regression model must be rebuilt using more precise reference points before it can be applied clinically.
Huybrechts, Inge; Lioret, Sandrine; Mouratidou, Theodora; Gunter, Marc J; Manios, Yannis; Kersting, Mathilde; Gottrand, Frederic; Kafatos, Anthony; De Henauw, Stefaan; Cuenca-García, Magdalena; Widhalm, Kurt; Gonzales-Gross, Marcela; Molnar, Denes; Moreno, Luis A; McNaughton, Sarah A
2017-01-01
This study aims to examine the repeatability of reduced rank regression (RRR) methods in calculating dietary patterns (DP) and cross-sectional associations with overweight (OW)/obesity across European and Australian samples of adolescents. Data from two cross-sectional surveys were used: one in Europe (2006/2007 Healthy Lifestyle in Europe by Nutrition in Adolescence study, including 1954 adolescents, 12-17 years) and one in Australia (2007 National Children's Nutrition and Physical Activity Survey, including 1498 adolescents, 12-16 years). Dietary intake was measured using two non-consecutive 24-h recalls. RRR was used to identify DP using dietary energy density, fibre density and percentage of energy intake from fat as the intermediate variables. Associations between DP scores and body mass/fat were examined using multivariable linear and logistic regression as appropriate, stratified by sex. The first DP extracted (labelled 'energy dense, high fat, low fibre') explained 47 % and 31 % of the response variation in Australian and European adolescents, respectively. It was similar for European and Australian adolescents and was characterised by higher consumption of biscuits/cakes, chocolate/confectionery, crisps/savoury snacks and sugar-sweetened beverages, and lower consumption of yogurt, high-fibre bread, vegetables and fresh fruit. DP scores were inversely associated with BMI z-scores in Australian adolescent boys and borderline inversely associated in European adolescent boys (as was %BF). Similarly, a lower likelihood of OW in boys was observed with higher DP scores in both surveys. No such relationships were observed in adolescent girls. In conclusion, the DP identified in this cross-country study was comparable for European and Australian adolescents, demonstrating the robustness of the RRR method in calculating DP across populations. However, longitudinal designs are more relevant for studying diet-obesity associations, as they avoid reverse causality.
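In the abstract above, RRR extracts linear combinations of predictors (food-group intakes) that explain as much variation as possible in the intermediate response variables (energy density, fibre density, % energy from fat). A minimal sketch of the textbook, identity-weighted form follows. X and Y are assumed inputs: X holds one column per food group and Y one column per response variable. The steps are:

1. Fit ordinary least squares.
2. Take the leading right singular vectors of the fitted responses.
3. Project the coefficient matrix onto the subspace they span.

Nutritional-epidemiology software may standardise variables and scale loadings differently, so the scores here are illustrative rather than a reproduction of the study's method.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Return (rank-constrained coefficient matrix, pattern scores)."""
    Xc = X - X.mean(axis=0)  # centre predictors (e.g. food-group intakes)
    Yc = Y - Y.mean(axis=0)  # centre responses (intermediate variables)
    B_ols, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_hat = Xc @ B_ols
    # The leading right singular vectors of the fitted responses span the
    # best rank-r approximation of Y_hat (Eckart-Young).
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V = Vt[:rank].T
    B_rr = B_ols @ V @ V.T
    scores = Xc @ B_ols @ V  # one column of pattern scores per retained factor
    return B_rr, scores


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))  # hypothetical intakes: 50 people, 6 food groups
Y = rng.normal(size=(50, 3))  # hypothetical responses: 3 intermediate variables
B_rr, scores = reduced_rank_regression(X, Y, rank=1)
```

Keeping only the first factor (rank=1) corresponds to reporting the "first DP extracted". With the rank set to the number of response variables, the method reduces to ordinary multivariate least squares.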
Seyedmahmoud, Rasoul
2014-04-07
This two-article series presents an in-depth discussion of electrospun poly-l-lactide scaffolds for tissue engineering by means of statistical methodologies. These methodologies can be used, in general, to gain quantitative and systematic insight into the effects of, and interactions between, a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. Part 1 dealt with DOE methods to unveil the interactions between Xs in determining the morphomechanical properties (ref. Y1-4). This Part 2 article refocuses the discussion on the interdependence of scaffold properties, investigated by standard regression methods. The discussion first explores the connection between mechanical properties (Y4) and morphological descriptors of the scaffolds (Y1-3) in 32 types of scaffolds. It finds that the mean fiber diameter (Y1) plays a predominant role, which is nonetheless crucially modulated by the molecular weight (MW) of PLLA. The second part examines the biological performance (Y5) (i.e. the proliferation of seeded bone marrow-derived mesenchymal stromal cells) on a random subset of eight scaffolds vs. the mechanomorphological properties (Y1-4). In this case, the regression analysis on such an incomplete set was not conclusive. It nonetheless indirectly suggests, in quantitative terms, that cell proliferation could not be fully explained as a function of the considered mechanomorphological properties (Y1-4), except at the early seeding stage. It further suggests that a randomization effect occurs over time, such that differences in initial cell proliferation performance (at day 1) are smeared out over time. The findings may be the cornerstone of a novel route to accrue sufficient understanding and establish design rules for scaffold biofunctionality vs. architecture, mechanical properties, and process parameters.
Ali, M Sanni; Groenwold, Rolf H H; Belitser, Svetlana V; Souverein, Patrick C; Martín, Elisa; Gatto, Nicolle M; Huerta, Consuelo; Gardarsdottir, Helga; Roes, Kit C B; Hoes, Arno W; de Boer, Antonius; Klungel, Olaf H
2016-03-01
Observational studies including time-varying treatments are prone to confounding. We compared time-varying Cox regression analysis, propensity score (PS) methods, and marginal structural models (MSMs) in a study of antidepressant [selective serotonin reuptake inhibitors (SSRIs)] use and the risk of hip fracture. A cohort of patients with a first prescription for antidepressants (SSRI or tricyclic antidepressants) was extracted from the Dutch Mondriaan and Spanish Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria (BIFAP) general practice databases for the period 2001-2009. The net (total) effect of SSRI versus no SSRI on the risk of hip fracture was estimated using time-varying Cox regression, stratification and covariate adjustment using the PS, and MSM. In MSM, censoring was accounted for by inverse probability of censoring weights. The crude hazard ratio (HR) of SSRI use versus no SSRI use on hip fracture was 1.75 (95%CI: 1.12, 2.72) in Mondriaan and 2.09 (1.89, 2.32) in BIFAP. After confounding adjustment using time-varying Cox regression, stratification, and covariate adjustment using the PS, HRs increased in Mondriaan [2.59 (1.63, 4.12), 2.64 (1.63, 4.25), and 2.82 (1.63, 4.25), respectively] and decreased in BIFAP [1.56 (1.40, 1.73), 1.54 (1.39, 1.71), and 1.61 (1.45, 1.78), respectively]. MSMs with stabilized weights yielded HR 2.15 (1.30, 3.55) in Mondriaan and 1.63 (1.28, 2.07) in BIFAP when accounting for censoring and 2.13 (1.32, 3.45) in Mondriaan and 1.66 (1.30, 2.12) in BIFAP without accounting for censoring. In this empirical study, differences between the different methods to control for time-dependent confounding were small. The observed differences in treatment effect estimates between the databases are likely attributable to different confounding information in the datasets, illustrating that adequate information on (time-varying) confounding is crucial to prevent bias. Copyright © 2016 John Wiley & Sons, Ltd.
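The marginal structural models above use stabilized weights of the form sw_i = P(A = a_i) / P(A = a_i | L = l_i), where A is treatment and L the confounders. The sketch below makes several simplifying assumptions. It covers a single time point, a single categorical confounder and probabilities estimated by empirical frequencies. It does not reproduce the study's time-varying weights or its inverse probability of censoring weights; the toy data are hypothetical.

```python
from collections import Counter


def stabilized_weights(treatment, confounder):
    """Stabilized inverse-probability-of-treatment weights
    sw_i = P(A = a_i) / P(A = a_i | L = l_i), using empirical frequencies."""
    n = len(treatment)
    count_a = Counter(treatment)
    count_l = Counter(confounder)
    count_al = Counter(zip(treatment, confounder))
    weights = []
    for a, l in zip(treatment, confounder):
        p_a = count_a[a] / n                         # numerator: marginal model
        p_a_given_l = count_al[(a, l)] / count_l[l]  # denominator: conditional model
        weights.append(p_a / p_a_given_l)
    return weights


# Hypothetical toy cohort: 1 = SSRI use, 0 = no SSRI; one categorical confounder.
treated = [1, 1, 0, 0, 0, 1, 0, 0]
age_group = ["old", "old", "old", "young", "young", "young", "young", "young"]
w = stabilized_weights(treated, age_group)
```

With weights estimated this way, their mean is exactly 1. That property is what makes stabilized weights less variable than unstabilized 1 / P(A | L) weights.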
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
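The least-squares line and the leverage concept mentioned above fit in a few lines. The slope is S_xy / S_xx and the intercept is ȳ minus the slope times x̄. The leverage of point i is h_i = 1/n + (x_i − x̄)² / S_xx, so points far from x̄ have more influence on the fitted line. This is a generic sketch on hypothetical data, not code from the article.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus leverages."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Leverage h_i = 1/n + (x_i - x_bar)^2 / S_xx; the leverages sum to 2
    # (the number of fitted parameters).
    leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    return intercept, slope, leverage


# Hypothetical data with one predictor value (x = 10) far from the others.
x = [1, 2, 3, 4, 10]
y = [2, 4, 6, 8, 20]
intercept, slope, leverage = simple_linear_regression(x, y)
```

Here the point at x = 10 has leverage 0.92, far above the others. That is exactly the kind of observation a leverage check is meant to flag.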
International Nuclear Information System (INIS)
Balabin, Roman M.; Smirnov, Sergey V.
2011-01-01
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, from the petroleum to the biomedical sector. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify the variables (wavelengths) that contribute useful information. Variable selection (VS), or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a nonlinear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic
International Nuclear Information System (INIS)
Yuan, Haibo; Liu, Xiaowei; Xiang, Maosheng; Huang, Yang; Zhang, Huihua; Chen, Bingqiu
2015-01-01
In this paper we propose a spectroscopy-based stellar color regression (SCR) method to perform accurate color calibration for modern imaging surveys, taking advantage of the millions of stellar spectra now available. The method is straightforward, insensitive to systematic errors in the spectroscopically determined stellar atmospheric parameters, applicable to regions that are effectively covered by spectroscopic surveys, and capable of delivering an accuracy of a few millimagnitudes for color calibration. As an illustration, we have applied the method to the Sloan Digital Sky Survey (SDSS) Stripe 82 data. With a total of 23,759 spectroscopically targeted stars, we have mapped out the small but strongly correlated color zero-point errors present in the photometric catalog of Stripe 82, and we improve the color calibration by a factor of two to three. Our study also reveals some small but significant magnitude-dependent errors in the z band for some charge-coupled devices (CCDs). Such errors are likely to be present in all the SDSS photometric data. Our results are compared with those from a completely independent test based on the intrinsic colors of red galaxies presented by Ivezić et al. The comparison, as well as other tests, shows that the SCR method has achieved a color calibration internally consistent at a level of about 5 mmag in u – g, 3 mmag in g – r, and 2 mmag in r – i and i – z. Given the power of the SCR method, we briefly discuss the potential benefits of applying the method to existing, ongoing, and upcoming imaging surveys.
Directory of Open Access Journals (Sweden)
JEMMAH A I
2018-01-01
Taounate region is known for a high density of mass movements, which cause several human and economic losses. The goal of this paper is to assess the landslide susceptibility of Taounate using the Weight of Evidence (WofE) method and the Logistic Regression (LR) method. Seven conditioning factors were used in this study: lithology, fault, drainage, slope, elevation, exposure and land use. Over the years, this site and its surroundings have experienced repeated landslides. For this reason, landslide susceptibility mapping is necessary for risk prevention and land-use management. In this study, we focused on recent large-scale mass movements. Finally, ROC curves were constructed to evaluate the degree of fit of the model and to choose the best landslide susceptibility zonation. A total of […] mass-movement locations were detected. Of these, 50% were randomly selected as input data for the entire process using the Spatial Data Model (SDM), and the remaining locations were used for validation. The WofE landslide susceptibility map shows that the high to very high susceptibility zones contain 62% of all inventoried landslides, whereas the same zones contain only 47% of landslides in the map obtained by the LR method. The resulting landslide susceptibility map is a major contribution to various urban and regional development plans under the Taounate Region National Development Program.
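For one binary evidence class B (for example, one slope interval) and landslide presence D, the Weight of Evidence method computes two weights:

- a positive weight, W+ = ln[P(B|D) / P(B|¬D)];
- a negative weight, W− = ln[P(¬B|D) / P(¬B|¬D)].

Their difference C = W+ − W− is the contrast. The sketch below uses these standard formulas with cell counts as inputs; the counts are hypothetical. It does not reproduce the paper's SDM workflow, and it assumes every count is non-zero (a zero count makes a logarithm undefined).

```python
import math


def weights_of_evidence(n_class_slide, n_class, n_slide, n_total):
    """Return (W+, W-, contrast) for one binary evidence class.

    n_class_slide: landslide cells inside the class
    n_class:       all cells inside the class
    n_slide:       all landslide cells in the study area
    n_total:       all cells in the study area
    """
    n_stable = n_total - n_slide
    n_class_stable = n_class - n_class_slide          # class cells without landslides
    n_out_slide = n_slide - n_class_slide             # landslide cells outside the class
    n_out_stable = (n_total - n_class) - n_out_slide  # stable cells outside the class
    w_plus = math.log((n_class_slide / n_slide) / (n_class_stable / n_stable))
    w_minus = math.log((n_out_slide / n_slide) / (n_out_stable / n_stable))
    return w_plus, w_minus, w_plus - w_minus


# Hypothetical map: 10,000 cells, 100 landslide cells, a class of 2,000 cells
# that contains 60 of the landslides.
w_plus, w_minus, contrast = weights_of_evidence(60, 2000, 100, 10_000)
```

A positive contrast means the class is associated with landslides more often than chance would predict. Summing the weights of the classes a cell falls in gives the susceptibility score used for zonation.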
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Testing discontinuities in nonparametric regression
Dai, Wenlin
2017-01-19
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)
Testing discontinuities in nonparametric regression
Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun
2017-01-01
In nonparametric regression, it is often necessary to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13] (H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337, doi: 10.1214/aos/1018031100)
Quantile regression theory and applications
Davino, Cristina; Vistocco, Domenico
2013-01-01
A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression. These include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as model validity and diagnostic tools. Each methodological aspect is explored and
Hilbe, Joseph M
2009-01-01
This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted, so the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … (Annette J. Dobson, Biometric...)
Seasonal prediction of winter extreme precipitation over Canada by support vector regression
Directory of Open Access Journals (Sweden)
Z. Zeng
2011-01-01
To forecast the maximum 5-day accumulated precipitation over the winter season over Canada from 1950 to 2007, at lead times of 3, 6, 9 and 12 months, two nonlinear and two linear regression models were used. These were support vector regression (SVR; nonlinear and linear versions), a nonlinear Bayesian neural network (BNN), and multiple linear regression (MLR). The 118 stations were grouped into six geographic regions by K-means clustering. For each region, the leading principal components of the winter maximum 5-d accumulated precipitation anomalies were the predictands. Potential predictors included quasi-global sea surface temperature anomalies and 500 hPa geopotential height anomalies over the Northern Hemisphere. They also included six climate indices: the Niño-3.4 region sea surface temperature, the North Atlantic Oscillation, the Pacific-North American teleconnection, the Pacific Decadal Oscillation, the Scandinavia pattern, and the East Atlantic pattern. The results showed that, in general, the two robust SVR models tended to have better forecast skill than the two non-robust models (MLR and BNN), and the nonlinear SVR model tended to forecast slightly better than the linear SVR model. Among the six regions, the Prairies region displayed the highest forecast skill, and the Arctic region the second highest. The strongest nonlinearity was manifested over the Prairies and the weakest over the Arctic.
Linear regression in astronomy. II
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Time-adaptive quantile regression
DEFF Research Database (Denmark)
Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik
2008-01-01
An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method […] and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
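The linear optimization that the simplex method solves here minimises the check (pinball) loss, ρ_τ(u) = u(τ − 1[u < 0]), summed over residuals u. The sketch below shows that loss for the simplest case, an intercept-only model. Because the objective is piecewise linear and convex with kinks at the observations, an optimum always lies at one of the observations, so trying each one is enough. This is a brute-force illustration on hypothetical data. It is not the simplex-based or time-adaptive algorithm of the paper.

```python
def pinball_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Intercept-only quantile regression by exhaustive search over the
    observations (an optimum of the piecewise-linear objective lies at one)."""
    return min(y, key=lambda q: sum(pinball_loss(yi - q, tau) for yi in y))


y = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical observations
median_fit = constant_quantile_fit(y, 0.5)
upper_fit = constant_quantile_fit(y, 0.9)
```

With τ = 0.5 the fit is a sample median; with τ = 0.9 it moves to the upper tail. Adding covariates such as spline terms turns the same objective into a linear programme, which is what the simplex formulation above handles.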
SEPARATION PHENOMENA LOGISTIC REGRESSION
Directory of Open Access Journals (Sweden)
Ikaro Daniel de Carvalho Barreto
2014-03-01
This paper applies concepts from maximum likelihood estimation of the binomial logistic regression model to the phenomenon of separation. Separation biases the estimation and leads to different interpretations of the estimates under the different statistical tests (Wald, likelihood ratio and score). It also yields different estimates under the different iterative methods (Newton-Raphson and Fisher scoring). The paper also presents an example that demonstrates the direct implications for the validation of the model and of the variables, and for the estimates of odds ratios and confidence intervals generated from the Wald statistics. Finally, we briefly present the Firth correction as a way to circumvent separation.
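Separation can be seen directly by running Newton-Raphson on perfectly separated data. When every negative x has y = 0 and every positive x has y = 1, the score never reaches zero. The coefficient then grows at each iteration while the log-likelihood creeps towards 0, so no finite maximum likelihood estimate exists. The sketch below assumes a one-coefficient model without intercept on hypothetical toy data. It does not implement the Firth correction.

```python
import math


def newton_logistic_slope(x, y, iterations=20):
    """Newton-Raphson for p_i = 1 / (1 + exp(-beta * x_i)) (no intercept).
    Returns the value of beta after every iteration."""
    beta, history = 0.0, []
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
        score = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        information = sum(xi * xi * pi * (1.0 - pi) for xi, pi in zip(x, p))
        if information == 0.0:  # fitted probabilities have hit exactly 0 or 1
            break
        beta += score / information
        history.append(beta)
    return history


def log_likelihood(beta, x, y):
    # log p = -log(1 + e^{-beta x});  log(1 - p) = -log(1 + e^{beta x})
    return -sum(
        math.log1p(math.exp(-beta * xi)) if yi == 1 else math.log1p(math.exp(beta * xi))
        for xi, yi in zip(x, y)
    )


# Perfectly separated toy data.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
betas = newton_logistic_slope(x, y)
```

The sequence of betas keeps increasing by roughly 1 per step instead of converging. In software this typically shows up as huge coefficients with huge Wald standard errors. The Firth penalty is one standard remedy, because it keeps the estimate finite.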
Directory of Open Access Journals (Sweden)
Yi Liang
2016-11-01
The power industry is the main arena for CO2 emission reduction and plays an important role in the implementation and development of the low-carbon economy. Electricity demand forecasting can provide a scientific basis for the country to formulate a power industry development strategy. It can also help to further promote the sustained, healthy and rapid development of the national economy. Under the goal of a low-carbon economy, medium- and long-term electricity demand forecasting has significant practical importance. In this paper, a new hybrid electricity demand model framework is built in three steps:

1. Grey relation degree (GRD) is integrated with the induced ordered weighted harmonic averaging (IOWHA) operator. This yields a new weight determination method for hybrid forecasting models, with forecasting accuracy as the induced variable.
2. The proposed weight determination method is used to construct an optimal hybrid forecasting model from an extreme learning machine (ELM) forecasting model and a multiple regression (MR) model.
3. Three scenarios reflecting different levels of achievement of the carbon emission targets are discussed, and the effect of the low-carbon economy on future electricity demand is dynamically simulated.

The findings show that the proposed model outperformed several single (monomial) forecasting models, notably by dramatically reducing overall instability. In addition, the development of a low-carbon economy will increase electricity demand and affect the adjustment of the electricity demand structure.
International Nuclear Information System (INIS)
Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei
2007-01-01
Regression analysis, especially the ordinary least squares method, which assumes that errors are confined to the dependent variable, has seen a fair share of applications in aerosol science. The ordinary least squares approach, however, can be problematic because atmospheric data often do not lend themselves to calling one variable independent and the other dependent; errors often exist in both measurements. In this work, we examine two regression approaches available to accommodate this situation: orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically, through an aerosol study examining whether the ratio of organic aerosol to CO would change with age.
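The three estimators compared above have closed-form slopes in terms of the centred sums of squares S_xx, S_yy and cross-products S_xy:

| Estimator | Slope |
| --- | --- |
| Ordinary least squares | S_xy / S_xx |
| Geometric mean regression | sign(S_xy) · sqrt(S_yy / S_xx) |
| Orthogonal regression (equal error variances) | [S_yy − S_xx + sqrt((S_yy − S_xx)² + 4 S_xy²)] / (2 S_xy) |

The sketch below applies these textbook formulas to hypothetical data. It does not reproduce the aerosol study's calculations. In each case the intercept follows as ȳ minus the slope times x̄.

```python
import math


def regression_slopes(x, y):
    """Return (OLS, orthogonal, geometric-mean) slopes of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ols = s_xy / s_xx  # errors assumed in y only
    # Orthogonal (total least squares) slope, assuming equal error
    # variances in x and y; requires s_xy != 0.
    orth = (s_yy - s_xx + math.sqrt((s_yy - s_xx) ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    gmr = math.copysign(math.sqrt(s_yy / s_xx), s_xy)  # geometric mean regression
    return ols, orth, gmr


# Hypothetical noisy pairs (e.g. two co-measured atmospheric quantities).
ols, orth, gmr = regression_slopes([1, 2, 3, 4], [2, 1, 4, 3])
```

On these data OLS gives 0.6 while the other two give 1.0. The gap illustrates how OLS attenuates the slope when the x variable is also noisy. Orthogonal regression depends on the units of x and y, whereas geometric mean regression does not.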
Zounemat-Kermani, Mohammad
2012-08-01
In this study, the ability of two models, multiple linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, to estimate the hourly dew point temperature was examined. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration, and in investigating agronomical issues such as stomatal closure in plants. The modeling was motivated by the availability of hourly records of climatic data (air temperature, relative humidity and pressure) that could be used to predict dew point temperature. Additionally, the wind vector (wind speed magnitude and direction) and a conceptual input of weather condition were employed as further input variables. Three standard quantitative statistical performance measures were used to evaluate the developed models: the root mean squared error, the mean absolute error, and the absolute logarithmic Nash-Sutcliffe efficiency coefficient (|Log(NS)|). The results showed that using wind vector and weather condition as inputs, along with meteorological variables, could slightly increase the predictive accuracy of both the ANN and MLR models. The results also revealed that LM-NN was superior to the MLR model. The best performance, across the different evaluation criteria, was obtained by considering all potential input variables.
International Nuclear Information System (INIS)
Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina
2009-01-01
Predicting the amount of hospital waste produced will help in the storage, transportation and disposal stages of hospital waste management. On this basis, two predictive models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation, both in total and by type (sharp, infectious and general). A 5-fold cross-validation procedure on a database of 50 hospitals in Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R², were used to evaluate the models. The MLR, as a conventional model, obtained poor performance values; however, it identified hospital capacity and bed occupancy as the more significant parameters. ANNs, a more powerful model that had not previously been applied to predicting the rate of medical waste generation, showed high performance values, notably an R² of 0.99, confirming a good fit to the data. These satisfactory results could be attributed to the non-linear nature of ANNs, which allows independent variables to be related to dependent ones non-linearly. In conclusion, the results showed that our ANN-based approach is very promising and may play a useful role in developing a more cost-effective waste management strategy in future.
DEFF Research Database (Denmark)
Johansen, Søren
2008-01-01
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...
Puzo, Quirino; Qin, Ping; Mehlum, Lars
2016-03-11
Suicide mortality and the rates by specific methods in a population may change over time in response to concurrent changes in relevant factors in society. This study aimed to identify significant changing points in method-specific suicide mortality from 1969 to 2012 in Norway. Data on suicide mortality by specific methods and by sex and age were retrieved from the Norwegian Cause-of-Death Register. Long-term trends in age-standardized rates of suicide mortality were analyzed by using joinpoint regression analysis. The most frequently used suicide method in the total population was hanging, followed by poisoning and firearms. Men chose suicide by firearms more often than women, whereas poisoning and drowning were more frequently used by women. The joinpoint analysis revealed that the overall trend of suicide mortality significantly changed twice along the period of 1969 to 2012 for both sexes. The male age-standardized suicide rate increased by 3.1% per year until 1989, and decreased by 1.2% per year between 1994 and 2012. Among females the long-term suicide rate increased by 4.0% per year until 1988, decreased by 5.5% through 1995, and then stabilized. Both sexes experienced an upward trend for suicide by hanging during the 44-year observation period, with a particularly significant increase in 15-24 year old males. The most distinct change among men was seen for firearms after 1988 with a significant decrease through 2012 of around 5% per year. For women, significant reductions since 1985-88 were observed for suicide by drowning and poisoning. The present study demonstrates different time trends for different suicide methods with significant reductions in suicide by firearms, drowning and poisoning after the peak in the suicide rate in the late 1980s. Suicide by means of hanging continuously increased, but did not fully compensate for the reduced use of other methods. This lends some support for the effectiveness of method-specific suicide preventive measures
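The per-year changes quoted above (e.g. +3.1% and -1.2% per year) are annual percent changes (APCs). Within one joinpoint segment, the usual definition fits a log-linear trend, ln(rate) = a + b·year, and reports APC = 100·(e^b - 1). The Python sketch below computes only that per-segment quantity; the search for the joinpoints themselves (model selection over the number and location of change points) is not shown, and the rates are hypothetical.

```python
import math

def annual_percent_change(years, rates):
    """APC for one segment: OLS fit of ln(rate) = a + b*year, then 100*(exp(b) - 1)."""
    logs = [math.log(r) for r in rates]
    n = len(years)
    xm, ym = sum(years) / n, sum(logs) / n
    b = (sum((x - xm) * (y - ym) for x, y in zip(years, logs))
         / sum((x - xm) ** 2 for x in years))
    return 100.0 * (math.exp(b) - 1.0)

# Hypothetical segment: a rate of 10 per 100,000 growing by exactly 3.1% per year
years = list(range(1969, 1990))
rates = [10.0 * 1.031 ** (yr - 1969) for yr in years]
print(round(annual_percent_change(years, rates), 2))  # 3.1
```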
Quantile Regression With Measurement Error
Wei, Ying; Carroll, Raymond J.
2009-01-01
. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a
A Case Study on a Combination NDVI Forecasting Model Based on the Entropy Weight Method
Energy Technology Data Exchange (ETDEWEB)
Huang, Shengzhi; Ming, Bo; Huang, Qiang; Leng, Guoyong; Hou, Beibei
2017-05-05
Accurate prediction of the NDVI (Normalized Difference Vegetation Index) is important because it helps guide regional ecological remediation and environmental management. In this study, a combination forecasting model (CFM) was proposed to improve NDVI predictions in the Yellow River Basin (YRB), based on three individual forecasting models: the Multiple Linear Regression (MLR), Artificial Neural Network (ANN), and Support Vector Machine (SVM) models. The entropy weight method was employed to determine the weight coefficient of each individual model according to its predictive performance. Results showed that: (1) ANN exhibits the highest fitting capability among the four forecasting models in the calibration period, whilst its generalization ability becomes weak in the validation period; MLR performs poorly in both the calibration and validation periods; and the CFM predictions in the calibration period have the highest stability; (2) CFM generally outperforms all individual models in the validation period, and can improve the reliability and stability of predicted results by combining the strengths and reducing the weaknesses of the individual models; (3) all forecasting models perform better in dense vegetation areas than in sparse vegetation areas.
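The abstract does not give the exact entropy weight formula, so the Python sketch below assumes one formulation found in the combination-forecasting literature. Each model's relative errors are turned into shares p_t, the normalised entropy E = -Σ p_t ln p_t / ln n is computed, the variation degree is d = 1 - E, and the weight is w = (1 - d / Σd) / (m - 1) for m models. Under this variant a model whose errors are spread evenly over time (a stable model) receives a larger weight. All names and data are hypothetical.

```python
import math

def entropy_weights(y_obs, predictions):
    """Combination weights from each model's relative-error sequence (assumed variant)."""
    n, m = len(y_obs), len(predictions)
    variation = []
    for pred in predictions:
        errors = [abs((o - p) / o) for o, p in zip(y_obs, pred)]
        total = sum(errors)
        if total == 0:  # a perfect fit is not handled by this formula
            raise ValueError("model with zero error")
        shares = [e / total for e in errors]
        entropy = -sum(s * math.log(s) for s in shares if s > 0) / math.log(n)
        variation.append(1.0 - entropy)  # degree of variation of the error sequence
    total_var = sum(variation)
    if total_var == 0:  # all models equally stable: equal weights
        return [1.0 / m] * m
    return [(1.0 - v / total_var) / (m - 1) for v in variation]  # needs m >= 2

def combine(predictions, weights):
    """Weighted sum of the individual model predictions at each time step."""
    return [sum(w * p[t] for w, p in zip(weights, predictions))
            for t in range(len(predictions[0]))]

ndvi_obs = [0.31, 0.35, 0.42, 0.48, 0.45, 0.38]  # hypothetical observations
mlr = [0.34, 0.33, 0.39, 0.44, 0.49, 0.35]       # hypothetical model outputs
ann = [0.30, 0.36, 0.43, 0.47, 0.44, 0.42]
svm = [0.32, 0.34, 0.44, 0.46, 0.47, 0.37]
w = entropy_weights(ndvi_obs, [mlr, ann, svm])
print([round(wi, 3) for wi in w])
print([round(v, 3) for v in combine([mlr, ann, svm], w)])
```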
DeForest, David K; Brix, Kevin V; Tear, Lucinda M; Adams, William J
2018-01-01
The bioavailability of aluminum (Al) to freshwater aquatic organisms varies as a function of several water chemistry parameters, including pH, dissolved organic carbon (DOC), and water hardness. We evaluated the ability of multiple linear regression (MLR) models to predict chronic Al toxicity to a green alga (Pseudokirchneriella subcapitata), a cladoceran (Ceriodaphnia dubia), and a fish (Pimephales promelas) as a function of varying DOC, pH, and hardness conditions. The MLR models predicted toxicity values that were within a factor of 2 of observed values in 100% of the cases for P. subcapitata (10 and 20% effective concentrations [EC10s and EC20s]), 91% of the cases for C. dubia (EC10s and EC20s), and 95% (EC10s) and 91% (EC20s) of the cases for P. promelas. The MLR models were then applied to all species with Al toxicity data to derive species and genus sensitivity distributions that could be adjusted as a function of varying DOC, pH, and hardness conditions (the P. subcapitata model was applied to algae and macrophytes, the C. dubia model was applied to invertebrates, and the P. promelas model was applied to fish). Hazardous concentrations to 5% of the species or genera were then derived in 2 ways: 1) fitting a log-normal distribution to species-mean EC10s for all species (following the European Union methodology), and 2) fitting a triangular distribution to genus-mean EC20s for animals only (following the US Environmental Protection Agency methodology). Overall, MLR-based models provide a viable approach for deriving Al water quality guidelines that vary as a function of DOC, pH, and hardness conditions and are a significant improvement over bioavailability corrections based on single parameters. Environ Toxicol Chem 2018;37:80-90. © 2017 SETAC.
Ochoa Gutierrez, L. H.; Vargas Jimenez, C. A.; Niño Vasquez, L. F.
2011-12-01
The "Sabana de Bogota" (Bogota Savannah) is the most important social and economic center of Colombia. Almost a third of Colombia's population is concentrated in this region, which generates about 40% of the country's gross domestic product (GDP). The zone is therefore highly vulnerable should a highly destructive seismic event occur. Historical evidence shows that high-magnitude events took place in the past, causing huge damage to the city, and indicates that such events are likely to occur in coming years. For this reason we are working on an early warning system that uses the first few seconds of a seismic signal recorded by three-component broadband seismometers. Such a system can be implemented using Computational Intelligence tools designed and calibrated for the particular geological, structural and environmental conditions of the region. The methods developed are expected to work in real time, so suitable software and electronic tools need to be developed. We used Support Vector Machine Regression (SVMR) methods, trained and tested with historical seismic events recorded by the "EL ROSAL" station near Bogotá, computing descriptors (attributes) from the first 6 seconds of signal as the model input. With this algorithm, we obtained a mean absolute error below 10% and correlation coefficients greater than 85% for hypocentral distance and magnitude estimation. Given these results, we believe the method can be improved to achieve better accuracy with less signal time, and that this model could be very useful if implemented directly in seismological stations to characterize events quickly, broadcasting not only the raw signal but also pre-processed information that is valuable for accurate early warning.
Directory of Open Access Journals (Sweden)
Larijani Kambiz
2011-01-01
The chemical composition of the volatile fraction obtained by head-space solid phase microextraction (HS-SPME) and single drop microextraction (SDME), and of the essential oil obtained by cold-pressing the peels of C. sinensis cv. Valencia, was analyzed by gas chromatography-flame ionization detection (GC-FID) and gas chromatography-mass spectrometry (GC-MS). The main components were limonene (61.34%, 68.27%, 90.50%), myrcene (17.55%, 12.35%, 2.50%), sabinene (6.50%, 7.62%, 0.5%) and α-pinene (0%, 6.65%, 1.4%), obtained by HS-SPME, SDME and cold-pressing, respectively. A quantitative structure-retention relationship (QSRR) study for predicting the retention indices (RI) of the compounds was then developed by applying structural descriptors and the multiple linear regression (MLR) method. Principal component analysis was used to select the training set. A simple model with low standard errors and high correlation coefficients was obtained. The results illustrate that linear techniques such as MLR, combined with a successful variable selection procedure, can generate an efficient QSRR model for predicting the retention indices of different compounds. This model, with high statistical significance (R²train = 0.983, R²test = 0.970, Q²LOO = 0.962, Q²LGO = 0.936, REP(%) = 3.00), could be used adequately for the prediction and description of the retention indices of the volatile compounds.
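The gap between R² and the leave-one-out Q² reported above can be illustrated with a small Python sketch: Q²LOO = 1 - PRESS / SS_tot, where PRESS sums the squared errors made when each compound is predicted by a model refitted without it. For brevity the sketch uses a single hypothetical descriptor rather than the multi-descriptor MLR model of the study, and the data are made up.

```python
def fit(x, y):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    b = (sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
         / sum((xi - xm) ** 2 for xi in x))
    return ym - b * xm, b

def r2_and_q2_loo(x, y):
    """R^2 of the full fit and leave-one-out Q^2 = 1 - PRESS / SS_tot."""
    n = len(y)
    ym = sum(y) / n
    ss_tot = sum((v - ym) ** 2 for v in y)
    a, b = fit(x, y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(n):
        # Refit without observation i, then predict the left-out value
        ai, bi = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (ai + bi * x[i])) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot

descriptor = [1.2, 2.0, 2.9, 4.1, 5.0, 6.2, 7.1]                    # hypothetical
retention = [850.0, 905.0, 990.0, 1055.0, 1125.0, 1195.0, 1270.0]  # hypothetical RI
r2, q2 = r2_and_q2_loo(descriptor, retention)
print(f"R2 = {r2:.3f}, Q2_LOO = {q2:.3f}")
```

Because every left-out residual is at least as large in magnitude as the corresponding in-sample residual, PRESS is never smaller than the residual sum of squares, so Q²LOO cannot exceed R² for a least-squares fit.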
Aszyk, Justyna; Kot, Jacek; Tkachenko, Yurii; Woźniak, Michał; Bogucka-Kocka, Anna; Kot-Wasik, Agata
2017-04-15
A simple, fast, sensitive and accurate methodology based on liquid-liquid extraction (LLE) followed by liquid chromatography-tandem mass spectrometry was developed for the simultaneous determination of four regioisomers (8-iso-prostaglandin F2α, 8-iso-15(R)-prostaglandin F2α, 11β-prostaglandin F2α, 15(R)-prostaglandin F2α) in routine analysis of human plasma samples. Isoprostanes are stable products of arachidonic acid peroxidation and are regarded as the most reliable markers of oxidative stress in vivo. The method was validated by evaluating the key analytical parameters: matrix effect, analytical curve, trueness, precision, limits of detection and limits of quantification. As the data were not homoscedastic, weighted linear regression was applied to improve accuracy at the lower end of the calibration curve. The limits of detection (LODs) ranged from 1.0 to 2.1 pg/mL. For plasma samples spiked with the isoprostanes at 50 pg/mL, intra- and interday repeatability ranged from 2.1 to 3.5% and 0.1 to 5.1%, respectively. The applicability of the approach was verified by monitoring isoprostane isomer levels in plasma samples collected from young patients (n=8) subjected to hyperbaric hyperoxia (100% oxygen at 280 kPa(a) for 30 min) in a multiplace hyperbaric chamber. Copyright © 2017 Elsevier B.V. All rights reserved.
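The abstract does not state which weighting scheme was used, so the Python sketch below assumes the common 1/x² weighting for heteroscedastic calibration data. It shows why weighting helps at the low end: an unweighted fit is dominated by the high-concentration standards, whereas 1/x² weights let the low standards drive the fit. Concentrations and responses are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b

def back_calculate(response, a, b):
    """Concentration implied by a response on the fitted calibration line."""
    return (response - a) / b

conc = [2.0, 5.0, 10.0, 50.0, 100.0, 500.0]        # pg/mL, hypothetical standards
resp = [0.021, 0.049, 0.103, 0.497, 1.020, 4.900]  # hypothetical response ratios

for label, w in [("unweighted", [1.0] * len(conc)),
                 ("1/x^2", [1.0 / c ** 2 for c in conc])]:  # assumed weighting
    a, b = weighted_line(conc, resp, w)
    est = back_calculate(resp[0], a, b)
    print(f"{label:>10}: lowest standard back-calculates to {est:.2f} pg/mL")
```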
Regression analysis with categorized regression calibrated exposure: some interesting findings
Directory of Open Access Journals (Sweden)
Hjartåker Anette
2006-07-01
Background: Regression calibration as a method for handling measurement error is becoming increasingly well known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through regression calibration as an approximation to the true exposure, that is, to include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared with the naive approach of not correcting for measurement error, both when analyses are performed on the quintile scale and when the original scale is incorporated into the categorical variables. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements rather than validation data, regression calibration does not maintain important qualities of the true exposure distribution, so estimates of variance and percentiles can be severely biased. We show that the outlined approach retains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is, however, vastly superior to the naive method when the medians of each category are used in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a
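Why categorizing the calibrated exposure retains the misclassification can be seen in a short simulation. With replicate measurements and a linear calibration model, the calibrated value is μ + λ(W̄ - μ) with λ > 0, an increasing function of the observed mean W̄, so every subject lands in exactly the same quintile as before calibration. The Python sketch below assumes normally distributed true exposure and independent, equal-variance replicate errors; all data are simulated.

```python
import random

def quintiles(values):
    """Quintile (0-4) of each value, assigned by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    q = [0] * len(values)
    for rank, i in enumerate(order):
        q[i] = rank * 5 // len(values)
    return q

def mean(v):
    return sum(v) / len(v)

random.seed(1)
n = 1000
true_x = [random.gauss(0.0, 1.0) for _ in range(n)]
w1 = [x + random.gauss(0.0, 1.0) for x in true_x]  # replicate 1: true exposure + error
w2 = [x + random.gauss(0.0, 1.0) for x in true_x]  # replicate 2: independent error
w_bar = [(a + b) / 2 for a, b in zip(w1, w2)]

m1, m2, mw = mean(w1), mean(w2), mean(w_bar)
# With independent errors, cov(W1, W2) estimates var(X)
var_x = sum((a - m1) * (b - m2) for a, b in zip(w1, w2)) / (n - 1)
var_wbar = sum((w - mw) ** 2 for w in w_bar) / (n - 1)
lam = var_x / var_wbar  # attenuation (reliability) factor
calibrated = [mw + lam * (w - mw) for w in w_bar]  # linear approximation of E[X | W-bar]

q_obs, q_cal, q_true = quintiles(w_bar), quintiles(calibrated), quintiles(true_x)
agree = sum(a == b for a, b in zip(q_cal, q_true)) / n
print(f"lambda = {lam:.2f}")
print("calibrated and observed quintiles identical:", q_cal == q_obs)
print(f"share of subjects in their true quintile: {agree:.2f}")
```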
Directory of Open Access Journals (Sweden)
Guanghao Sun
2016-11-01
Background and Objectives: Heart rate variability (HRV) has been intensively studied as a promising biological marker of major depressive disorder (MDD). Our previous study confirmed that autonomic activity and reactivity in depression, as revealed by HRV during rest and mental task (MT) conditions, can be used as diagnostic measures and in clinical evaluation. In this study, logistic regression analysis (LRA) was used for the classification and prediction of MDD based on HRV data obtained in an MT paradigm. Methods: Power spectral analysis of HRV on R-R intervals before, during, and after an MT (random number generation) was performed in 44 drug-naïve patients with MDD and 47 healthy control subjects at the Department of Psychiatry in Shizuoka Saiseikai General Hospital. Logit scores of the LRA, determined by HRV indices and heart rates, discriminated patients with MDD from healthy subjects. The high frequency (HF) component of HRV and the ratio of the low frequency (LF) component to the HF component (LF/HF) correspond to parasympathetic activity and sympathovagal balance, respectively. Results: The LRA achieved a sensitivity and specificity of 80.0% and 79.0%, respectively, at an optimum cutoff logit score (0.28). Misclassifications occurred only when the logit score was close to the cutoff score. Logit scores also correlated significantly with subjective self-rating depression scale scores (p < 0.05). Conclusion: HRV indices recorded during a mental task may be an objective tool for screening patients with MDD in psychiatric practice. The proposed method appears promising not only for objective and rapid MDD screening but also for evaluating its severity.
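A logit score is the linear predictor of the fitted logistic model (intercept plus coefficient-weighted HRV indices), and classification compares it with a cutoff. The Python sketch below shows how sensitivity and specificity follow from a cutoff such as 0.28; it assumes a score strictly above the cutoff is called MDD, and the scores and labels are hypothetical.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Call 'MDD' when the logit score exceeds the cutoff; labels: 1 = MDD, 0 = control."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

scores = [1.2, 0.9, 0.4, 0.1, 0.35, -0.2, -0.8, -1.1]  # hypothetical logit scores
labels = [1, 1, 1, 1, 0, 0, 0, 0]                      # hypothetical diagnoses
print(sensitivity_specificity(scores, labels, cutoff=0.28))  # (0.75, 0.75)
```

Only the two cases nearest the cutoff (0.1 and 0.35) are misclassified here, mirroring the observation that errors cluster around the cutoff score.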
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-03-26
This study outlines two robust regression approaches, least median of squares (LMS) and iteratively re-weighted least squares (IRLS), and investigates their application in the instrumental analysis of nutraceuticals (namely, the fluorescence quenching of merbromin reagent upon addition of lipoic acid). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal and non-ideal linearity conditions. For each condition, data were treated using three regression fittings: ordinary least squares (OLS), LMS and IRLS. Linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. Under the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed under the non-ideal condition, particularly in the intercept. Under both linearity conditions, LOD and LOQ values after robust regression line fitting were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be extended to lower limits of quantitation by improving the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS, were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
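To show how IRLS down-weights aberrant calibration points, here is a minimal Python sketch of a straight-line fit with Huber weights. The abstract does not specify the weight function, so Huber weights (tuning constant 1.345) and a median-based residual scale are assumptions; LMS is not shown. The data are hypothetical points on y = 2x + 1 with one gross outlier.

```python
import statistics

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b

def irls_huber(x, y, c=1.345, n_iter=50, tol=1e-10):
    """Straight-line fit by IRLS with Huber weights, starting from OLS."""
    w = [1.0] * len(x)
    a, b = wls_line(x, y, w)
    for _ in range(n_iter):
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled to a normal-error SD
        s = max(statistics.median([abs(ri) for ri in r]) / 0.6745, 1e-12)
        # Huber weights: 1 for small residuals, c*s/|r| for large ones
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
        a_new, b_new = wls_line(x, y, w)
        done = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b

x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [2 * xi + 1 + e for xi, e in zip(x, noise)]  # hypothetical points near y = 2x + 1
y[9] += 40.0                                     # one gross outlier
print("OLS  (a, b):", wls_line(x, y, [1.0] * len(x)))
print("IRLS (a, b):", irls_huber(x, y))
```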
Augmented chaos-multiple linear regression approach for prediction of wave parameters
Directory of Open Access Journals (Sweden)
M.A. Ghorbani
2017-06-01
The inter-comparisons demonstrated that the Chaos-MLR and pure MLR models yield almost the same accuracy in predicting significant wave heights and zero-up-crossing wave periods. The augmented Chaos-MLR model, however, gave better prediction accuracy than previous prediction applications to the same case study.
Wang, J; Wang, F; Liu, Y; Xu, J; Lin, H; Jia, B; Zuo, W; Jiang, Y; Hu, L; Lin, F
2016-01-01
Overweight individuals are at higher risk of developing type II diabetes than the general population. We conducted this study to analyze the correlation between blood glucose and biochemical parameters, and developed a blood glucose prediction model tailored to overweight patients. A total of 346 overweight Chinese patients aged 18-81 years were involved in this study. Their levels of fasting glucose (fs-GLU), blood lipids, and hepatic and renal functions were measured and analyzed by multiple linear regression (MLR). Based on the MLR results, we developed a back propagation artificial neural network (BP-ANN) model, selecting tansig as the transfer function of the hidden-layer nodes and purelin for the output-layer nodes, with a training goal of 0.5×10⁻⁵. There was a significant correlation of fs-GLU with age, BMI, and blood biochemical indexes (P<0.05). The results of the MLR analysis indicated that age, fasting alanine transaminase (fs-ALT), blood urea nitrogen (fs-BUN), total protein (fs-TP), uric acid (fs-UA), and BMI are 6 independent variables related to fs-GLU. Based on these parameters, the BP-ANN model performed well and reached high prediction accuracy when trained for 1000 epochs (R=0.9987). The level of fs-GLU was predictable using the proposed BP-ANN model based on 6 related parameters (age, fs-ALT, fs-BUN, fs-TP, fs-UA and BMI) in overweight patients. © Georg Thieme Verlag KG Stuttgart · New York.
DEFF Research Database (Denmark)
Christensen, Steen; Moore, C.; Doherty, J.
2006-01-01
For a synthetic case we computed three types of individual prediction intervals for the location of the aquifer entry point of a particle that moves through a heterogeneous aquifer and ends up in a pumping well. (a) The nonlinear regression-based interval (Cooley, 2004) was found to be nearly accurate and required a few hundred model calls to be computed. (b) The linearized regression-based interval (Cooley, 2004) required just over a hundred model calls and also appeared to be nearly correct. (c) The calibration-constrained Monte-Carlo interval (Doherty, 2003) was found to be narrower than the regression-based intervals but required about half a million model calls. It is unclear whether or not this type of prediction interval is accurate.
Gaussian process regression analysis for functional data
Shi, Jian Qing
2011-01-01
Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
Hoogerheide, L.F.; Kaashoek, J.F.; van Dijk, H.K.
2007-01-01
Likelihoods and posteriors of instrumental variable (IV) regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating posterior
L.F. Hoogerheide (Lennart); J.F. Kaashoek (Johan); H.K. van Dijk (Herman)
2005-01-01
Likelihoods and posteriors of instrumental variable regression models with strong endogeneity and/or weak instruments may exhibit rather non-elliptical contours in the parameter space. This may seriously affect inference based on Bayesian credible sets. When approximating such contours
Understanding logistic regression analysis
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
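The link between a logistic regression coefficient and the odds ratio can be checked on a 2×2 table: for a single binary explanatory variable, the maximum-likelihood estimate of exp(β₁) equals the cross-product ratio ad/(bc). The Python sketch below fits the model by Newton-Raphson on hypothetical counts; with several explanatory variables, each exp(β_k) would instead be an odds ratio adjusted for the others.

```python
import math

def fit_logistic_1d(x, y, n_iter=25):
    """Maximum-likelihood logistic regression logit(p) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), inverted in closed form (2x2)
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical 2x2 table: exposed 30 events / 10 non-events, unexposed 15 / 25
x = [1] * 40 + [0] * 40
y = [1] * 30 + [0] * 10 + [1] * 15 + [0] * 25
b0, b1 = fit_logistic_1d(x, y)
print(f"odds ratio exp(b1) = {math.exp(b1):.3f}")  # cross-product ratio (30*25)/(10*15) = 5
```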
Introduction to regression graphics
Cook, R Dennis
2009-01-01
Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Al-Mudhafar, W. J.
2013-12-01
Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate properties in non-cored intervals. It also helps to identify the spatial facies distribution accurately, so that an accurate reservoir model can be built for optimal future reservoir performance. In this paper, facies estimation was performed through multinomial logistic regression (MLR) with respect to the well logs and core data in a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. First, a Robust Sequential Imputation Algorithm was used to impute the missing data. This algorithm starts from a complete subset of the dataset and sequentially estimates the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. The observation is then added to the complete data matrix, and the algorithm continues with the next observation with missing values. MLR was chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies and the core and log data. MLR predicts the probabilities of the different possible facies given each independent variable by constructing a linear predictor function with a set of weights that are linearly combined with the independent variables through a dot product. A beta distribution of facies was taken as prior knowledge, and the predicted probability (posterior) was estimated from MLR based on Bayes' theorem, which relates the posterior to the conditional probability and the prior knowledge. To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
Heddam, Salim
2014-11-01
The prediction of colored dissolved organic matter (CDOM) using artificial neural network approaches has received little attention in the past few decades. In this study, CDOM was modeled using generalized regression neural network (GRNN) and multiple linear regression (MLR) models as a function of water temperature (TE), pH, specific conductance (SC), and turbidity (TU). The prediction accuracy of the models was evaluated using the root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (CC), and Willmott's index of agreement (d). The results indicated that GRNN can be applied successfully to predict CDOM.
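The four evaluation criteria named above can be computed as in the following Python sketch, using the standard definitions of RMSE, MAE and Pearson's correlation coefficient, and Willmott's index of agreement d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)². The observed and predicted CDOM values are hypothetical.

```python
import math

def evaluate(obs, pred):
    """RMSE, MAE, Pearson correlation (CC) and Willmott's index of agreement (d)."""
    n = len(obs)
    om, pm = sum(obs) / n, sum(pred) / n
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sse / n)
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    cov = sum((o - om) * (p - pm) for o, p in zip(obs, pred))
    cc = cov / math.sqrt(sum((o - om) ** 2 for o in obs) * sum((p - pm) ** 2 for p in pred))
    # Willmott's d: 1 = perfect agreement, 0 = no agreement
    d = 1 - sse / sum((abs(p - om) + abs(o - om)) ** 2 for o, p in zip(obs, pred))
    return {"RMSE": rmse, "MAE": mae, "CC": cc, "d": d}

cdom_obs = [21.5, 25.0, 30.2, 28.7, 35.1, 40.3]   # hypothetical observations
cdom_pred = [22.0, 24.1, 31.0, 29.5, 33.8, 41.2]  # hypothetical model output
print({k: round(v, 3) for k, v in evaluate(cdom_obs, cdom_pred).items()})
```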
Development and Validation of a Rule-Based Strength Scaling Method for Musculoskeletal Modelling
DEFF Research Database (Denmark)
Oomen, Pieter; Annegarn, Janneke; Rasmussen, John
2015-01-01
performed maximal isometric knee extensions. A multiple linear regression analysis (MLR) resulted in an empirical strength scaling equation, accounting for age, mass, height, gender, segment masses and segment lengths. For validation purpose, 20 newly included healthy subjects performed a maximal isometric...
Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun
2018-03-01
A multivariate regression statistical strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponin extract and to predict the pharmacological effect from component content. In example 1, we first compared the pharmacological effects of Panax ginseng saponin extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect of seven different saponin combinations of ginsenosides Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten different batches of Panax ginseng saponin extracts from different origins. We then investigated the anti-myocardial ischemia reperfusion injury effects of the ten extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.
Directory of Open Access Journals (Sweden)
Ying Peng
2018-03-01
A multivariate regression statistical strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponin extract and to predict the pharmacological effect from component content. In example 1, we first compared the pharmacological effects of Panax ginseng saponin extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect of seven different saponin combinations of ginsenosides Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten different batches of Panax ginseng saponin extracts from different origins. We then investigated the anti-myocardial ischemia reperfusion injury effects of the ten extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.
Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan
2016-08-01
Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by altering the structure of the substrate proteins. Establishing a relationship between the structure of WG and the ACE inhibitory activity of the hydrolysates, in order to judge the end point of the ultrasonic pretreatment, is vital. Stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated with the ACE inhibitory activity of the hydrolysate, with standard partial regression coefficients of 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R² of this model was 0.970. External validation showed that the stepwise MLR model could predict the ACE inhibitory activity of the hydrolysate well from the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE inhibitory activity of the hydrolysates was established. This model can be used to predict the end point of the ultrasonic pretreatment. © 2015 Society of Chemical Industry.
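Standard partial regression coefficients, like those reported above, rescale each slope as b_j·s(x_j)/s(y) so that predictors measured in different units can be compared; they equal the slopes obtained by regressing z-scored y on z-scored predictors. The Python sketch below uses two hypothetical predictors instead of the study's five, solving the centred normal equations in closed form; variable names and data are illustrative only.

```python
import statistics

def ols_two(x1, x2, y):
    """Unstandardised slopes of y on two predictors (centred normal equations)."""
    m1, m2, my = sum(x1) / len(x1), sum(x2) / len(x2), sum(y) / len(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def standardised(x1, x2, y):
    """Standard partial regression coefficients: b_j * sd(x_j) / sd(y)."""
    b1, b2 = ols_two(x1, x2, y)
    sy = statistics.stdev(y)
    return b1 * statistics.stdev(x1) / sy, b2 * statistics.stdev(x2) / sy

free_sh = [2.1, 2.8, 3.3, 3.9, 4.6, 5.2]           # hypothetical predictor 1
alpha_helix = [30.5, 28.9, 29.8, 27.1, 26.4, 25.0]  # hypothetical predictor 2
ace_inhib = [55.0, 61.2, 63.8, 70.1, 74.9, 80.3]    # hypothetical response
print(standardised(free_sh, alpha_helix, ace_inhib))
```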
Snedden, Gregg A.; Steyer, Gregory D.
2013-01-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
Snedden, Gregg A.; Steyer, Gregory D.
2013-02-01
Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
Principal component regression analysis with SPSS.
Liu, R X; Kuang, J; Gong, Q; Hou, X L
2003-06-01
The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. An example shows how to carry out principal component regression analysis with SPSS 10.0, including every calculation step of the principal component regression and the operation of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. Carrying out principal component regression with SPSS makes the statistical analysis simpler, faster and accurate.
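The principal component regression workflow described above (standardize, extract components, regress on the leading components, map back) can be sketched outside SPSS as follows. This is a minimal numpy illustration, not the paper's SPSS procedure; the function name is hypothetical and the number of retained components is left to the user:

```python
import numpy as np

def principal_component_regression(X, y, n_components):
    """Regress y on the leading principal components of standardized X,
    then map the coefficients back to the original predictor scale."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Eigen-decomposition of the correlation matrix (covariance of Z)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]            # loadings of the retained components
    T = Z @ V                        # component scores
    gamma, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), T]), y, rcond=None)
    beta_z = V @ gamma[1:]           # coefficients on standardized predictors
    beta = beta_z / sd               # back-transform to original units
    intercept = gamma[0] - mu @ beta
    return intercept, beta
```

Retaining all components reproduces ordinary least squares; dropping the smallest-eigenvalue components is what removes the unstable directions created by multicollinearity.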
Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo
2011-03-04
Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview of these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.
Regression Analysis by Example. 5th Edition
Chatterjee, Samprit; Hadi, Ali S.
2012-01-01
Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratios in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, except that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is that confounding effects are avoided by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
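To make the link between fitted coefficients and odds ratios concrete, here is a minimal sketch of maximum-likelihood logistic regression by Newton-Raphson (equivalently, iteratively reweighted least squares), assuming numpy only; the function name is hypothetical. Exponentiating each slope gives the odds ratio for a one-unit increase in that variable with the other variables held fixed:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    Returns [intercept, b1, ..., bp] on the log-odds scale;
    np.exp(beta[1:]) gives the adjusted odds ratios.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # fitted probabilities
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])      # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

For a single binary predictor the fitted odds ratio equals the cross-product ratio of the 2×2 table; with several predictors each exp(b_j) is adjusted for the others, which is the confounding-control advantage mentioned above.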
Weisberg, Sanford
2013-01-01
Praise for the Third Edition "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute). The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Hosmer, David W; Sturdivant, Rodney X
2013-01-01
A new edition of the definitive guide to logistic regression modeling for health science and other applications. This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-
Survival analysis II: Cox regression
Stel, Vianda S.; Dekker, Friedo W.; Tripepi, Giovanni; Zoccali, Carmine; Jager, Kitty J.
2011-01-01
In contrast to the Kaplan-Meier method, Cox proportional hazards regression can provide an effect estimate by quantifying the difference in survival between patient groups and can adjust for confounding effects of other variables. The purpose of this article is to explain the basic concepts of the
Schiewe, M C; Fitz, T A; Brown, J L; Stuart, L D; Wildt, D E
1991-09-01
Ewes were treated with exogenous follicle-stimulating hormone (FSH), and oestrus was synchronized using either a dual prostaglandin F-2 alpha (PGF-2 alpha) injection regimen or pessaries impregnated with medroxyprogesterone acetate (MAP). Naturally cycling ewes served as controls. After oestrus or AI (Day 0), corpora lutea (CL) were enucleated surgically from the left and right ovaries on Days 3 and 6, respectively. The incidence of premature luteolysis was related (P < 0.05) to PGF-2 alpha treatment and occurred in 7 of 8 ewes, compared with 0 of 4 controls and 1 of 8 MAP-exposed females. Sheep with regressing CL had lower circulating and intraluteal progesterone concentrations and fewer total and small dissociated luteal cells on Day 3 than gonadotrophin-treated counterparts with normal CL. Progesterone concentration in the serum and luteal tissue was higher (P < 0.05) in gonadotrophin-treated ewes with normal CL than in the controls, but luteinizing hormone (LH) receptors/cell did not differ on Days 3 and 6. There were no apparent differences in the temporal patterns of circulating oestradiol-17 beta, FSH and LH. High progesterone in gonadotrophin-treated ewes with normal CL coincided with an increase in total luteal mass and numbers of cells, which primarily reflected more small luteal cells than in control ewes. Gonadotrophin-treated ewes with regressing CL on Day 3 tended (P < 0.10) to have fewer small luteal cells and had fewer (P < 0.05) low-affinity PGF-2 alpha binding sites than sheep with normal CL. By Day 6, luteal integrity and cell viability were absent in ewes with prematurely regressed CL. These data demonstrate that (i) the incidence of premature luteal regression is highly correlated with the use of PGF-2 alpha; (ii) this abnormal luteal tissue is functionally competent for 2-3 days after ovulation but deteriorates rapidly thereafter; and (iii) luteal-dysfunctioning ewes experience a reduction in numbers of
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
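The three quantities compared in the tutorial can be computed in a few lines. A minimal numpy sketch (function names are hypothetical): Pearson's r measures linear association, Spearman's rho is Pearson's r applied to ranks and therefore captures any monotonic relationship, and simple linear regression gives the least-squares line:

```python
import numpy as np

def pearson_r(x, y):
    return np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]

def ranks(a):
    """1-based ranks; tied values share their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    r = np.empty(len(a))
    r[order] = np.arange(1, len(a) + 1)
    for v in np.unique(a):          # average the ranks within each tie group
        tie = a == v
        r[tie] = r[tie].mean()
    return r

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))

def simple_linear_regression(x, y):
    """Least-squares intercept and slope of y on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - slope * x.mean(), slope
```

For x = 1..5 and y = x³, Spearman's rho is exactly 1 (the relationship is perfectly monotonic) while Pearson's r is about 0.94 (it is not linear).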
Regression filter for signal resolution
International Nuclear Information System (INIS)
Matthes, W.
1975-01-01
The problem considered is that of resolving a measured pulse height spectrum of a material mixture (e.g. a gamma ray spectrum or a Raman spectrum) into a weighted sum of the spectra of the individual constituents. The model on which the analytical formulation is based is described. The problem reduces to that of multiple linear regression, and a stepwise linear regression procedure was constructed. The efficiency of this method was then tested by implementing the procedure in a computer programme, which was used to unfold test spectra obtained by mixing spectra from a library of arbitrarily chosen spectra and adding a noise component. (U.K.)
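The reduction to multiple linear regression can be seen directly: each library spectrum is one regressor column, and the mixture weights are the regression coefficients. A minimal numpy sketch, assuming all library spectra are fitted at once by ordinary least squares (the report's stepwise entry and removal of spectra, and any non-negativity constraint, are not reproduced); the Gaussian test peaks and the function name are hypothetical:

```python
import numpy as np

def unmix_spectrum(library, measured):
    """Weights w minimizing ||library @ w - measured||^2.

    library : (n_channels, n_components), one reference spectrum per column
    measured: (n_channels,) spectrum of the mixture
    """
    w, *_ = np.linalg.lstsq(library, measured, rcond=None)
    return w

# Synthetic test: three Gaussian "reference spectra" mixed with known weights
channels = np.arange(200)
library = np.column_stack(
    [np.exp(-0.5 * ((channels - c) / 8.0) ** 2) for c in (40, 90, 150)]
)
true_w = np.array([2.0, 0.5, 1.0])
rng = np.random.default_rng(2)
measured = library @ true_w + rng.normal(scale=0.01, size=channels.size)
w = unmix_spectrum(library, measured)
```

A stepwise variant would add library columns one at a time and keep only those whose contribution is statistically significant, which matters when the library contains many spectra that are absent from the mixture.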
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Saputro, Dewi Retno Sari; Widyaningsih, Purnami
2017-08-01
In general, the parameters of the GWOLR model are estimated by maximum likelihood, but this produces a system of nonlinear equations whose solution is difficult to find, so an approximate solution is needed. Two popular numerical methods are Newton's method and the Quasi-Newton (QN) method. Newton's method takes a long time to execute because it requires the Jacobian matrix (derivatives). The QN method overcomes this drawback by replacing the derivative computation with a directly computed approximation. QN methods approximate the Hessian matrix, for example with the Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a QN method that, like the DFP formula, keeps the Hessian approximation positive definite. The BFGS method requires a large amount of memory, so an algorithm that reduces memory usage, Low Memory BFGS (LBFGS), is needed. The purpose of this research is to compute the efficiency of the LBFGS method in the iterative and recursive computation of the Hessian matrix and its inverse for GWOLR parameter estimation. We found that the BFGS and LBFGS methods require O(n²) and O(nm) arithmetic operations, respectively.
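The O(nm) cost of LBFGS (widely known as limited-memory BFGS, L-BFGS) comes from the two-loop recursion, which applies the inverse-Hessian approximation using only the m most recent step/gradient-change pairs instead of a dense n×n matrix. A minimal numpy sketch on a generic smooth objective, not the GWOLR likelihood; the Armijo backtracking line search, the memory size m = 5 and the function names are assumptions for illustration:

```python
import numpy as np

def two_loop_direction(grad, s_hist, y_hist):
    """Return -H @ grad via the L-BFGS two-loop recursion (O(n*m) work)."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):   # newest to oldest
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_hist:                       # initial scaling gamma = s'y / y'y
        s, y = s_hist[-1], y_hist[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):  # oldest to newest
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return -q

def lbfgs_minimize(f, grad_f, x0, m=5, max_iter=200, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = two_loop_direction(g, s_hist, y_hist)
        t, fx = 1.0, f(x)
        # Backtracking line search (Armijo sufficient-decrease condition)
        while t > 1e-12 and f(x + t * d) > fx + 1e-4 * t * (g @ d):
            t *= 0.5
        x_new = x + t * d
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:            # curvature condition keeps H positive definite
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:      # discard the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x
```

Storing m pairs costs O(nm) memory, versus O(n²) for the dense BFGS inverse-Hessian update, which is the trade-off the abstract quantifies.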
Quantile Regression With Measurement Error
Wei, Ying
2009-08-27
Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
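For reference, the standard linear regression quantile (the estimator whose bias the paper corrects) minimizes the check loss ρ_τ(r) = r(τ − 1{r < 0}) and can be written as a linear program. A minimal sketch using SciPy's `linprog` with the HiGHS solver; it does not implement the paper's measurement-error correction, and the function name is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau=0.5):
    """Linear quantile regression as an LP.

    Variables: [beta (free), u >= 0, v >= 0] with X beta + u - v = y,
    so u and v are the positive and negative parts of the residuals;
    the objective tau*sum(u) + (1 - tau)*sum(v) is the total check loss.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]
```

With τ = 0.5 this is least-absolute-deviations regression, which is why a few large outliers leave the fitted median line unchanged.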
Multicollinearity and Regression Analysis
Daoud, Jamal I.
2017-12-01
In regression analysis, correlation between the response and the predictor(s) is expected, but correlation among predictors is undesirable. The number of predictors included in a regression model depends on many factors, including historical data and experience. In the end, the selection of the most important predictors rests with the researcher. Multicollinearity is the phenomenon in which two or more predictors are correlated; when it occurs, the standard errors of the coefficients increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from 0. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes, and its consequences for the reliability of the regression model.
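A common diagnostic for the standard-error inflation described above is the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors; the standard error of coefficient j is inflated by a factor of √VIF_j relative to uncorrelated predictors. The VIF is not mentioned in the abstract; this is a minimal numpy sketch with a hypothetical function name:

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

Values near 1 indicate little collinearity; two nearly duplicate predictors both receive very large VIFs, while an unrelated predictor stays near 1.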
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs when different groups of people endorse items in a scale to different extents after being matched on the attribute the scale is intended to measure. We investigate whether common methods of assessing DIF are equivalent. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and the Mantel χ² procedure) were applied to Hospital Anxiety and Depression Scale (HADS) data from a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When strict criteria for significant DIF were applied, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of the methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
DEFF Research Database (Denmark)
Bache, Stefan Holst
A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is......, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
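Under a multinomial logistic model with a common reference category, log(P_j / P_ref) = a_j + b_j·x, so P_j = P_{j+1} exactly where a_j + b_j·x = a_{j+1} + b_{j+1}·x, i.e. x = (a_{j+1} − a_j) / (b_j − b_{j+1}). This follows directly from the MLR model and may not match the authors' notation; the sketch below assumes the reference category is coded with intercept and slope 0, and the coefficients in the test are hypothetical, not the EAT-26 estimates:

```python
def mlr_cutoffs(intercepts, slopes):
    """Test scores x_j at which category j and category j+1 are equally likely.

    intercepts, slopes: log-odds coefficients of each ordered category versus
    a common reference category (use 0.0 and 0.0 for the reference itself).
    """
    cutoffs = []
    for j in range(len(intercepts) - 1):
        db = slopes[j] - slopes[j + 1]
        if db == 0:
            raise ValueError(f"categories {j} and {j + 1} have equal slopes; no unique cut-off")
        cutoffs.append((intercepts[j + 1] - intercepts[j]) / db)
    return cutoffs
```

With three ordered categories (for example asymptomatic as the reference, then symptomatic, then eating disorder) the function returns two cut-off scores, matching the structure of the example in the abstract.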
Regression with Sparse Approximations of Data
DEFF Research Database (Denmark)
Noorzad, Pardis; Sturm, Bob L.
2012-01-01
We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected...... by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR), and more generally, local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...
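For comparison, the k-NNR baseline that SPARROW generalizes averages the responses of the k training points closest to the query. A minimal numpy sketch with a fixed k and Euclidean distance (both assumptions; SPARROW instead selects and weights regressands adaptively through a sparse approximation); the function name is hypothetical:

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict y at x_query as the mean response of its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    d = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]          # indices of the k smallest distances
    return y_train[nearest].mean()
```

Every query uses exactly k neighbors with equal weights; adapting both the number of neighbors and their weights to each query point is the flexibility the SPARROW abstract emphasizes.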
Schwantes-An, Tae-Hwi; Sung, Heejong; Sabourin, Jeremy A; Justice, Cristina M; Sorant, Alexa J M; Wilson, Alexander F
2016-01-01
In this study, the effects of (a) the minor allele frequency of the single nucleotide variant (SNV), (b) the degree of departure from normality of the trait, and (c) the position of the SNVs on type I error rates were investigated in the Genetic Analysis Workshop (GAW) 19 whole exome sequence data. To test the distribution of the type I error rate, 5 simulated traits were considered: standard normal and gamma distributed traits; 2 transformed versions of the gamma trait (log10 and rank-based inverse normal transformations); and trait Q1 provided by GAW 19. Each trait was tested with 313,340 SNVs. Tests of association were performed with simple linear regression and average type I error rates were determined for minor allele frequency classes. Rare SNVs (minor allele frequency < 0.05) showed inflated type I error rates for non-normally distributed traits that increased as the minor allele frequency decreased. The inflation of average type I error rates increased as the significance threshold decreased. Normally distributed traits did not show inflated type I error rates with respect to the minor allele frequency for rare SNVs. There was no consistent effect of transformation on the uniformity of the distribution of the location of SNVs with a type I error.
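The rank-based inverse normal transformation mentioned above replaces each trait value by a normal quantile of its rank, Φ⁻¹((r − c) / (n − 2c + 1)). The abstract does not give the offset c; this sketch assumes Blom's c = 3/8 and assigns tied values their average rank. It uses only the Python standard library, and the function name is hypothetical:

```python
from statistics import NormalDist

def rank_inverse_normal(values, c=3.0 / 8):
    """Rank-based inverse normal transform with Blom's offset (assumed)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # average 1-based rank of the tie block
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```

However skewed the input (for example a gamma-distributed trait), the output depends only on ranks and is symmetric around 0, which is why this transformation is used to remove departures from normality before regression.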
Directory of Open Access Journals (Sweden)
Guillaume Wattelez
2017-09-01
Particle transport by erosion from ultramafic lands in pristine tropical lagoons is a crucial problem, especially for the benthic and pelagic biodiversity associated with coral reefs. Satellite imagery is useful for assessing particle transport from land to sea. However, in the oligotrophic and shallow waters of tropical lagoons, the bottom reflection of downwelling light usually hampers the use of classical optical algorithms. In order to address this issue, a Support Vector Regression (SVR) model was developed and tested. The proposed application concerns the lagoon of New Caledonia (the second longest continuous coral reef in the world), which is frequently exposed to river plumes from ultramafic watersheds. The SVR model is based on a large training sample of in-situ turbidity values representative of the annual variability in the Voh-Koné-Pouembout lagoon (Western Coast of New Caledonia) during the 2014–2015 period and on coincident satellite reflectance values from the MODerate Resolution Imaging Spectroradiometer (MODIS). It was trained with reflectance and two other explanatory parameters: bathymetry and bottom colour. This approach significantly improved the model’s capacity for retrieving the in-situ turbidity range from MODIS images, as compared with algorithms dedicated to deep oligotrophic or turbid waters, which were shown to be inadequate. This SVR model is applicable to all shallow lagoon waters of the Western Coast of New Caledonia, and it is now ready to be tested over other oligotrophic shallow lagoon waters worldwide.
Directory of Open Access Journals (Sweden)
Seyed Ali Akbar Afjeh
2014-05-01
Market segmentation plays an essential role in understanding people’s interests in purchasing various products and services through various channels. This paper presents an empirical investigation of consumers’ purchasing attitudes and information gathering in a multi-channel environment. The study designed a questionnaire and distributed it among 800 people who were at least 18 years of age and had some experience purchasing goods and services on the internet, by catalog or in regular shopping centers. A self-organizing map (SOM) clustering technique was applied to consumers’ interest in gathering information and purchasing products through the internet, catalogs and shopping centers, and identified four segments. The questionnaire contained two types of questions. The first group covered participants’ personal characteristics such as age, gender and income. The second group covered participants’ psychographic characteristics, including price consciousness, quality consciousness and time pressure. Using multinomial logistic regression, the study determines consumers’ behaviors in each of the four segments.
Chen, Xi; Lu, Fang; Jiang, Lu-di; Cai, Yi-Lian; Li, Gong-Yu; Zhang, Yan-Ling
2016-07-01
Inhibition of cytochrome P450 (CYP450) enzymes is the most common cause of drug interactions, so early prediction of CYP inhibitors can help to decrease the incidence of adverse reactions caused by drug interactions. CYP450 2E1 (CYP2E1) plays a key role in drug metabolism and has a broad spectrum of substrates. In this study, 32 CYP2E1 inhibitors were collected to construct a support vector regression (SVR) model. The test set data were used to verify the CYP2E1 quantitative models and obtain the optimal prediction model of CYP2E1 inhibitors. Meanwhile, the molecular docking program CDOCKER was used to analyze the interaction pattern between positive compounds and the active pocket, in order to establish the optimal screening model of CYP2E1 inhibitors. The SVR model and the molecular docking prediction model were combined to screen the traditional Chinese medicine database (TCMD), which could improve calculation efficiency and prediction accuracy. The SVR model predicted 6,376 traditional Chinese medicine (TCM) compounds, and after further verification with the molecular docking model, 247 TCM compounds with potential inhibitory activity against CYP2E1 were retained. Some of them have been verified by experiments. The results demonstrate that this study could provide guidance for the virtual screening of CYP450 inhibitors and the prediction of CYP-mediated drug-drug interactions (DDIs), and also provide references for clinical rational drug use. Copyright© by the Chinese Pharmaceutical Association.
On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.
Pek, Jolynn; Chalmers, R Philip; Monette, Georges
2016-01-01
When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation, whereas the set of EWs quantifies uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.
Multiple linear regression analysis
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
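A stepwise procedure of the kind described can be sketched as greedy forward selection with a partial F test. This is a simplified illustration, not the FORTRAN IV program: it only adds variables (no removal step), and it uses a fixed F-to-enter threshold in place of the user-specified confidence level. Function names are hypothetical:

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of an OLS fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r = y - design @ coef
    return r @ r

def forward_stepwise(X, y, f_to_enter=4.0):
    """Add, one at a time, the predictor with the largest partial F statistic;
    stop when no remaining predictor exceeds f_to_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))                 # start from the intercept-only model
    while len(selected) < p:
        rss_old = rss(design, y)
        best = None
        for j in range(p):
            if j in selected:
                continue
            cand = np.column_stack([design, X[:, j]])
            rss_new = rss(cand, y)
            df_resid = n - cand.shape[1]
            f = (rss_old - rss_new) / (rss_new / df_resid)
            if best is None or f > best[0]:
                best = (f, j)
        if best[0] < f_to_enter:
            break
        selected.append(best[1])
        design = np.column_stack([design, X[:, best[1]]])
    return selected
```

The returned list gives the entry order, so the most influential predictor appears first; a full stepwise routine would also re-test earlier entries for removal after each addition.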
Bayesian logistic regression analysis
Van Erp, H.R.N.; Van Gelder, P.H.A.J.M.
2012-01-01
In this paper we present a Bayesian logistic regression analysis. It is found that if one wishes to derive the posterior distribution of the probability of some event, then, together with the traditional Bayes Theorem and the integrating out of nuisance parameters, the Jacobian transformation is an
Ritz, Christian; Parmigiani, Giovanni
2009-01-01
R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.
Bayesian ARTMAP for regression.
Sasu, L M; Andonie, R
2013-10-01
Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA had been used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.
Bounded Gaussian process regression
DEFF Research Database (Denmark)
Jensen, Bjørn Sand; Nielsen, Jens Brehm; Larsen, Jan
2013-01-01
We extend the Gaussian process (GP) framework for bounded regression by introducing two bounded likelihood functions that model the noise on the dependent variable explicitly. This is fundamentally different from the implicit noise assumption in the previously suggested warped GP framework. We...... with the proposed explicit noise-model extension....
Mechanisms of neuroblastoma regression
Brodeur, Garrett M.; Bagatell, Rochelle
2014-01-01
Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179
Crown, William H
2014-02-01
This paper examines the use of propensity score matching in economic analyses of observational data. Several excellent papers have previously reviewed practical aspects of propensity score estimation and other aspects of the propensity score literature. The purpose of this paper is to compare the conceptual foundation of propensity score models with alternative estimators of treatment effects. References are provided to empirical comparisons among methods that have appeared in the literature. These comparisons are available for a subset of the methods considered in this paper. However, in some cases, no pairwise comparisons of particular methods are yet available, and there are no examples of comparisons across all of the methods surveyed here. Irrespective of the availability of empirical comparisons, the goal of this paper is to provide some intuition about the relative merits of alternative estimators in health economic evaluations where nonlinearity, sample size, availability of pre/post data, heterogeneity, and missing variables can have important implications for choice of methodology. Also considered is the potential combination of propensity score matching with alternative methods such as differences-in-differences and decomposition methods that have not yet appeared in the empirical literature.
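Nearest-neighbour matching on an estimated propensity score, one of the estimators the paper compares, can be sketched as follows. This is a minimal illustration, not the paper's method. The propensity scores are assumed to have been estimated already (for example by a logistic model), matching is 1:1 with replacement, and the toy numbers are made up.

```python
def att_nearest_neighbor(treated, controls):
    """Average treatment effect on the treated (ATT) by 1:1 nearest-neighbour
    matching on the propensity score, with replacement.

    treated, controls: lists of (propensity_score, outcome) pairs; the scores
    are assumed to have been estimated beforehand.
    """
    effects = []
    for score, outcome in treated:
        # Closest control on the score scale; a control may be reused.
        _, matched_outcome = min(controls, key=lambda c: abs(c[0] - score))
        effects.append(outcome - matched_outcome)
    return sum(effects) / len(effects)


treated = [(0.30, 5.0), (0.65, 9.0)]
controls = [(0.25, 4.0), (0.60, 7.0), (0.80, 10.0)]
print(att_nearest_neighbor(treated, controls))  # 1.5
```

Matching with replacement keeps the bias low when good controls are scarce, at the cost of higher variance; caliper or kernel matching are common alternatives.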
Kato, S; Ishii, A; Nishi, A; Kuriki, S; Koide, T
2014-01-01
Recent genetic studies have shown that genetic loci with significant effects in whole-genome quantitative trait loci (QTL) analyses were lost or weakened in congenic strains. Characterisation of the genetic basis of this attenuated QTL effect is important to our understanding of the genetic mechanisms of complex traits. We previously found that a consomic strain, B6-Chr6CMSM, which carries chromosome 6 of a wild-derived strain MSM/Ms on the genetic background of C57BL/6J, exhibited lower home-cage activity than C57BL/6J. In the present study, we conducted a composite interval QTL analysis using the F2 mice derived from a cross between C57BL/6J and B6-Chr6CMSM. We found one QTL peak that spans 17.6 Mbp of chromosome 6. A subconsomic strain that covers the entire QTL region also showed lower home-cage activity at the same level as the consomic strain. We developed 15 congenic strains, each of which carries a shorter MSM/Ms-derived chromosomal segment from the subconsomic strain. Given that the results of home-cage activity tests on the congenic strains cannot be explained by a simple single-gene model, we applied regression analysis to segregate the multiple genetic loci. The results revealed three loci (loci 1–3) that have the effect of reducing home-cage activity and one locus (locus 4) that increases activity. We also found that the combination of loci 3 and 4 cancels out the effects of the congenic strains, which indicates the existence of a genetic mechanism related to the loss of QTLs. PMID:24781804
Kumar, V.; Melet, A.; Meyssignac, B.; Ganachaud, A.; Kessler, W. S.; Singh, A.; Aucan, J.
2018-02-01
Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years has been up to 3 times the global average. In this study, we aim at reconstructing sea levels at selected sites in the region (Suva and Lautoka, Fiji; Nouméa, New Caledonia) as a multilinear regression (MLR) of atmospheric and oceanic variables. We focus on sea level variability at interannual-to-interdecadal time scales, and on the trend over the 1988-2014 period. Local sea levels are first expressed as a sum of steric and mass changes. Then a dynamical approach is used based on wind stress curl as a proxy for the thermosteric component, as wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. Statistically significant predictors among wind stress curl, halosteric sea level, zonal/meridional wind stress components, and sea surface temperature are used to construct an MLR model simulating local sea levels. Although we focus on the local scale, the global mean sea level still needs to be adjusted for. Our reconstructions provide insights into key drivers of sea level variability at the selected sites, showing that while local dynamics and the global signal modulate sea level to a given extent, most of the variance is driven by regional factors. On average, the MLR model is able to reproduce 82% of the variance in island sea level, and could be used to derive local sea level projections via downscaling of climate models.
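The MLR reconstruction step amounts to an ordinary least-squares fit. A self-contained sketch (plain Python, normal equations with an intercept) is shown below. The predictor values are hypothetical stand-ins for variables such as wind stress curl or sea surface temperature, and the "fraction of variance reproduced" corresponds to the R² the function returns.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X, y):
    """OLS with an intercept via the normal equations.
    Returns (coefficients, R^2); coefficients[0] is the intercept."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot  # R^2 = share of variance reproduced


# Hypothetical predictors (e.g. wind stress curl and SST anomalies), noise-free response.
X = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (4.0, 4.0), (5.0, 2.0)]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in X]
beta, r2 = fit_mlr(X, y)
print([round(v, 6) for v in beta], round(r2, 6))
```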
Clow, D. W.; Nanus, L.; Huggett, B. W.
2010-12-01
An abundance of exposed bedrock, sparse soil and vegetation, and fast hydrologic flushing rates make aquatic ecosystems in Yosemite National Park susceptible to nutrient enrichment and episodic acidification due to atmospheric deposition of nitrogen (N) and sulfur (S). In this study, multiple-linear regression (MLR) models were created to estimate fall-season nitrate and acid neutralizing capacity (ANC) in surface water in Yosemite wilderness. Input data included estimated winter N deposition, fall-season surface-water chemistry measurements at 52 sites, and basin characteristics derived from geographic information system layers of topography, geology, and vegetation. The MLR models accounted for 84% and 70% of the variance in surface-water nitrate and ANC, respectively. Explanatory variables (and the sign of their coefficients) for nitrate included elevation (positive) and the abundance of neoglacial and talus deposits (positive), unvegetated terrain (positive), alluvium (negative), and riparian (negative) areas in the basins. Explanatory variables for ANC included basin area (positive) and the abundance of metamorphic rocks (positive), unvegetated terrain (negative), water (negative), and winter N deposition (negative) in the basins. The MLR equations were applied to 1407 stream reaches delineated in the National Hydrography Dataset for Yosemite, and maps of predicted surface-water nitrate and ANC concentrations were created. Predicted surface-water nitrate concentrations were highest in small, high-elevation cirques, and concentrations declined downstream. Predicted ANC concentrations showed the opposite pattern, except in high-elevation areas underlain by metamorphic rocks along the Sierran Crest, which had relatively high predicted ANC (>200 µeq L-1). Maps were created to show where basin characteristics predispose aquatic resources to nutrient enrichment and acidification effects from N and S deposition. The maps can be used to help guide development of
Directory of Open Access Journals (Sweden)
M.A. Mousavi Shalmani
2014-08-01
To assess water quality, characterize seasonal variation in 18O and 2H in relation to various chemical and physiographic parameters, and model the influential parameters, a study was conducted from 2010 to 2011 in 30 ponds in northern Iran. Samples were collected in three seasons and analysed for chemical and isotopic components. The data show that the highest δ18O and δ2H values were recorded in summer (-1.15‰ and -12.11‰) and the lowest in winter (-7.50‰ and -47.32‰, respectively). The data also reveal a significant increase in d-excess during spring and summer in ponds 20, 21, 22, 24, 25 and 26. We conclude that residual surface runoff (from upland areas) is an important source of water that transfers soluble salts into these ponds. In this respect, long retention time may be the main reason for the movement of light isotopes into the ponds. This has made the d-excess of pond 12 even greater in summer than in winter. This could also explain why ponds 25 and 26 (Siyahkal county) have the highest d-excess and the lowest δ18O and δ2H values. Light water pumped into the ponds from groundwater wells with a minor salt source (originating from deep percolation of seawater) may be another reason for the significant decrease in the heavy isotopes of water (18O and 2H) in ponds 2, 12, 14 and 25 from spring to summer. Overall, the multiple linear regression results indicate, firstly, that of the 30 variables under investigation only a few can be used in applications to identify changes in 18O and 2H. Secondly, among the variables studied, phytoplankton content was a common factor for interpreting 18O and 2H during spring and summer, and also over the total period (one year). Thirdly, sampling water in spring was recommended for 18O and 2H interpretation compared with other seasons. This is because of function can be
Ridge Regression Signal Processing
Kuhl, Mark R.
1990-01-01
The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
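One standard way to obtain a recursive ridge estimator, assumed here for illustration rather than taken from the report, is recursive least squares started from θ = 0 and covariance P = I/k. After the last sample it reproduces the batch ridge estimate (XᵀX + kI)⁻¹Xᵀy. The sketch below is purely linear (the report's estimator is linearized around the EKF dynamics), and the data are made up.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_batch(X, y, k):
    """Closed-form ridge estimate (X^T X + k I)^{-1} X^T y, without intercept."""
    p = len(X[0])
    a = [[sum(x[i] * x[j] for x in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(a, b)


def ridge_recursive(X, y, k):
    """Recursive least squares started from theta = 0 and P = I / k.
    Processing the samples one at a time reproduces the batch ridge estimate."""
    p = len(X[0])
    theta = [0.0] * p
    P = [[1.0 / k if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x, yi in zip(X, y):
        Px = [sum(P[i][j] * x[j] for j in range(p)) for i in range(p)]
        denom = 1.0 + sum(x[i] * Px[i] for i in range(p))
        gain = [v / denom for v in Px]
        innovation = yi - sum(t * xi for t, xi in zip(theta, x))
        theta = [t + g * innovation for t, g in zip(theta, gain)]
        # P <- P - gain (P x)^T ; P stays symmetric
        P = [[P[i][j] - gain[i] * Px[j] for j in range(p)] for i in range(p)]
    return theta


X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (1.0, 1.0)]
y = [3.0, 3.0, 7.0, 7.0, 2.5]
print(ridge_batch(X, y, 0.5))
print(ridge_recursive(X, y, 0.5))
```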
Seyedmahmoud, Rasoul; Mozetic, Pamela; Rainer, Alberto; Giannitelli, Sara Maria; Basoli, Francesco; Trombetta, Marcella; Traversa, Enrico; Licoccia, Silvia; Rinaldi, Antonio
2014-01-01
and interactions between a handful of key scaffold properties (Ys) and a set of process parameters (Xs) in electrospinning. While Part-1 dealt with the DOE methods to unveil the interactions between Xs in determining the morphomechanical properties (ref. Y1
Better Autologistic Regression
Directory of Open Access Journals (Sweden)
Mark A. Wolters
2017-11-01
Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.
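The coding effect described above can be seen directly in the full conditional of a single site. Assuming the standard form P(z) ∝ exp(Σᵢ αᵢzᵢ + λ Σ_{i~j} zᵢzⱼ), the conditional probability that site i takes its high value is a logistic function of (high - low)·(αᵢ + λ Σ_{j~i} zⱼ). The sketch uses this expression, derived here rather than quoted from the paper, to show that identical parameter values give different probabilities under (0, 1) and (-1, +1) coding.

```python
import math


def prob_high(alpha, lam, neighbor_states, low, high):
    """Full conditional P(Z_i = high | neighbours) for an autologistic model in
    standard form, P(z) proportional to exp(sum_i alpha_i z_i + lam * sum_{i~j} z_i z_j),
    with each variable coded as `low` or `high`."""
    eta = alpha + lam * sum(neighbor_states)
    return 1.0 / (1.0 + math.exp(-(high - low) * eta))


# Same parameters (alpha = 0, lam = 1), two neighbours, both in the "low" state.
p_01 = prob_high(0.0, 1.0, [0, 0], low=0, high=1)      # (0, 1) coding
p_pm = prob_high(0.0, 1.0, [-1, -1], low=-1, high=1)   # (-1, +1) coding
print(p_01, p_pm)  # 0.5 versus about 0.018
```

Under (0, 1) coding, neighbours in the low state contribute nothing to η, whereas under (-1, +1) coding they actively push the site towards the low state; this is one concrete way in which the two codings define different models for the same parameter values.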
Regression in organizational leadership.
Kernberg, O F
1979-02-01
The choice of good leaders is a major task for all organizations. Information regarding the prospective administrator's personality should complement questions regarding his previous experience, his general conceptual skills, his technical knowledge, and the specific skills in the area for which he is being selected. The growing psychoanalytic knowledge about the crucial importance of internal, in contrast to external, object relations, and about the mutual relationships of regression in individuals and in groups, constitutes an important practical tool for the selection of leaders.
Applied regression analysis a research tool
Pantula, Sastry; Dickey, David
1998-01-01
Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression
Abdul Jameel, Abdul Gani
2016-09-14
An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and collectively referred to as D/CN, herein. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on D/CN, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. Multiple linear regression (MLR) modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.
Regression models of reactor diagnostic signals
International Nuclear Information System (INIS)
Vavrin, J.
1989-01-01
The application of an autoregression model, the simplest regression model of diagnostic signals, is described for the experimental analysis of diagnostic systems and for in-service monitoring and diagnostics of normal and anomalous conditions. The diagnostic method is described using a regression-type diagnostic database and regression spectral diagnostics. The diagnostics of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor is described. (author)
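A minimal sketch of the autoregression idea: fit an AR(1) coefficient by least squares on a reference ("normal") signal, then inspect one-step prediction residuals on newly monitored data, where large residuals point to anomalous behaviour. The AR order, the zero-mean assumption, and the residual check are illustrative choices, not details from the report.

```python
def fit_ar1(x):
    """Least-squares AR(1) coefficient for a zero-mean series: x[t] ~ phi * x[t-1]."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


def one_step_residuals(x, phi):
    """Prediction errors x[t] - phi * x[t-1]; large values flag departures from the fitted model."""
    return [x[t] - phi * x[t - 1] for t in range(1, len(x))]


reference = [0.5 ** t for t in range(10)]   # synthetic 'normal' signal
phi = fit_ar1(reference)

monitored = list(reference)
monitored[6] += 1.0                          # synthetic anomaly
res = one_step_residuals(monitored, phi)
print(phi, max(abs(r) for r in res))
```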
Directory of Open Access Journals (Sweden)
Hamid Reza Marateb
2015-01-01
Background: Coronary heart diseases/coronary artery diseases (CHDs/CAD), the most common form of cardiovascular disease (CVD), are a major cause of death and disability in developing and developed countries. CAD risk factors could be detected by physicians to prevent CAD occurrence in the near future. Invasive coronary angiography, a current diagnosis method, is costly and associated with morbidity and mortality in CAD patients. The aim of this study was to design a computer-based noninvasive CAD diagnosis system with clinically interpretable rules. Materials and Methods: In this study, the Cleveland CAD dataset from the University of California, Irvine (UCI) was used. The interval-scale variables were discretized, with cut points taken from the literature. A fuzzy rule-based system was then formulated based on a neuro-fuzzy classifier (NFC) whose learning procedure was sped up by the scaled conjugate gradient algorithm. Two feature selection (FS) methods, multiple logistic regression (MLR) and sequential FS, were used to reduce the required attributes. The performance of the NFC (without/with FS) was then assessed in a hold-out validation framework. Further cross-validation was performed on the best classifier. Results: In this dataset, 16 complete attributes along with the binary CHD diagnosis (gold standard) for 272 subjects (68% male) were analyzed. MLR + NFC showed the best performance. Its overall sensitivity, specificity, accuracy, type I error (α) and statistical power were 79%, 89%, 84%, 0.1 and 79%, respectively. The selected features were "age and ST/heart rate slope categories," "exercise-induced angina status," fluoroscopy, and thallium-201 stress scintigraphy results. Conclusion: The proposed method showed "substantial agreement" with the gold standard. This algorithm is thus a promising tool for screening CAD patients.
DEFF Research Database (Denmark)
Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas
2017-01-01
In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...
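The quantity being predicted, the absolute risk of one event type in the presence of a competing risk, combines the cause-specific hazards as a sum over time of (event-free survival just before t) × (hazard of the event at t). A discrete-time sketch is below. The per-interval hazards are made-up inputs; in the Cox setting they would come from the fitted baseline hazards scaled by exp(linear predictor). This is not the package's C++ implementation.

```python
def cumulative_incidence(h_event, h_competing):
    """Discrete-time absolute risk of the event of interest when a competing event
    can occur. h_event[t], h_competing[t] are cause-specific hazards (conditional
    probabilities) for interval t; returns the cumulative incidence after each interval."""
    event_free = 1.0   # probability of having had neither event before the interval
    cif = 0.0
    out = []
    for h1, h2 in zip(h_event, h_competing):
        cif += event_free * h1          # event of interest occurs in this interval
        event_free *= 1.0 - h1 - h2     # survive both causes through the interval
        out.append(cif)
    return out


print(cumulative_incidence([0.1, 0.1, 0.1], [0.1, 0.1, 0.1]))
```

Ignoring the competing hazard (setting it to zero) would make the event-free probability decay more slowly and so overstate the absolute risk, which is why both cause-specific models are needed.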
Adaptive metric kernel regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
2000-01-01
Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
Adaptive Metric Kernel Regression
DEFF Research Database (Denmark)
Goutte, Cyril; Larsen, Jan
1998-01-01
Kernel smoothing is a widely used nonparametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this paper, we propose an algorithm that adapts the input metric used in multivariate regression...... by minimising a cross-validation estimate of the generalisation error. This allows one to automatically adjust the importance of different dimensions. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms the standard...
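The adaptive-metric idea can be sketched with a Nadaraya-Watson estimator whose Gaussian kernel scales each input dimension separately, with the scales chosen to minimise the leave-one-out error. Here a small grid search stands in for the authors' optimisation procedure (an assumption; the abstract does not specify the optimiser), and the toy data contain one relevant input and one irrelevant, alternating input.

```python
import math


def loo_mse(X, y, scales):
    """Leave-one-out mean squared error of a Nadaraya-Watson regressor with a
    Gaussian kernel whose metric weights input dimension d by scales[d]."""
    n = len(X)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave point i out of its own prediction
            d2 = sum((s * (a - b)) ** 2 for s, a, b in zip(scales, X[i], X[j]))
            w = math.exp(-0.5 * d2)
            num += w * y[j]
            den += w
        total += (y[i] - num / den) ** 2
    return total / n


def select_metric(X, y, candidates):
    """Return the candidate scale vector with the smallest LOO error."""
    return min(candidates, key=lambda s: loo_mse(X, y, s))


# Toy data: y depends on the first input only; the second input just alternates 0/1.
X = [(float(i), float(i % 2)) for i in range(20)]
y = [float(i) for i in range(20)]
grid = [(1.0, 0.0), (1.0, 10.0)]
print({s: round(loo_mse(X, y, s), 3) for s in grid})
print(select_metric(X, y, grid))  # (1.0, 0.0)
```

A scale of zero removes a dimension from the metric entirely, which is how this kind of procedure doubles as variable selection.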
Regression of environmental noise in LIGO data
International Nuclear Information System (INIS)
Tiwari, V; Klimenko, S; Mitselmakher, G; Necula, V; Drago, M; Prodi, G; Frolov, V; Yakushin, I; Re, V; Salemi, F; Vedovato, G
2015-01-01
We address the problem of noise regression in the output of gravitational-wave (GW) interferometers, using data from the physical environmental monitors (PEM). The objective of the regression analysis is to predict environmental noise in the GW channel from the PEM measurements. One of the most promising regression methods is based on the construction of Wiener–Kolmogorov (WK) filters. Using this method, seismic noise cancellation from the LIGO GW channel has already been performed. In the presented approach the WK method has been extended, incorporating banks of Wiener filters in the time–frequency domain, multi-channel analysis and regulation schemes, which greatly enhance the versatility of the regression analysis. We also present the first results on regression of the bi-coherent noise in the LIGO data. (paper)
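In its simplest single-channel, time-domain form, Wiener-filter regression fits FIR coefficients that predict the GW channel from lagged samples of a PEM witness channel and then subtracts that prediction. The sketch shows only this basic form (not the time-frequency filter banks or multi-channel scheme described in the abstract); the witness and coupling values are synthetic.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_fir(witness, target, ntaps):
    """Least-squares (Wiener) FIR taps h so that target[t] ~ sum_k h[k] * witness[t-k]."""
    rows = [[witness[t - k] for k in range(ntaps)] for t in range(ntaps - 1, len(witness))]
    ys = target[ntaps - 1:]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(ntaps)] for i in range(ntaps)]
    b = [sum(r[i] * v for r, v in zip(rows, ys)) for i in range(ntaps)]
    return solve(a, b)


def subtract_prediction(witness, target, taps):
    """Target with the witness-predicted noise removed (first len(taps)-1 samples dropped)."""
    n = len(taps)
    return [target[t] - sum(taps[k] * witness[t - k] for k in range(n))
            for t in range(n - 1, len(target))]


# Synthetic witness channel and a target consisting only of linearly coupled noise.
witness = [float((4 * i) % 11 - 5) for i in range(60)]
target = [0.5 * witness[t] + (0.25 * witness[t - 1] if t > 0 else 0.0) for t in range(60)]
taps = fit_fir(witness, target, ntaps=3)
cleaned = subtract_prediction(witness, target, taps)
print([round(h, 6) for h in taps], max(abs(c) for c in cleaned))
```

In real data the target also contains the astrophysical signal and uncorrelated noise; the fit only removes the part that is linearly predictable from the witness.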
Forecasting exchange rates: a robust regression approach
Preminger, Arie; Franck, Raphael
2005-01-01
The least squares estimation method, as well as other ordinary estimation methods for regression models, can be severely affected by a small number of outliers, thus providing poor out-of-sample forecasts. This paper suggests a robust regression approach, based on the S-estimation method, to construct forecasting models that are less sensitive to data contamination by outliers. A robust linear autoregressive (RAR) model and a robust neural network (RNN) model are estimated to study the predictabil...
Energy Technology Data Exchange (ETDEWEB)
Altiparmakov, D; Zavaljevski, N; Dasic, N [Boris Kidric Institute of Nuclear Sciences Vinca, Beograd (Yugoslavia)
1988-12-15
The report presents an analysis of a possible calculation procedure for automated data fitting. The problem is defined in the first part, where it is shown that its solution requires optimisation under conditions of uncertainty (stochastic errors), which falls within the theory of optimal experimental design. A rough review of the state of knowledge in this field is given. In the second part of the report, some statistical and optimisation methods are analysed in more detail with a view to their use in automated fitting. On this basis, the feasibility of a corresponding calculation procedure is evaluated. (author)
Southard, Rodney E.
2013-01-01
The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought and defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in States neighboring Missouri. For streamgages with more than 10 years of record, Kendall’s tau was computed to test for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend. Water years were removed from the dataset from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical
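Kendall's tau for trend compares every pair of years and counts whether the later value is higher or lower than the earlier one. The sketch computes the tau-a statistic only; the tie correction and significance test that a full trend analysis would also need are omitted, and the flow record is hypothetical.

```python
def kendall_tau(series):
    """Kendall's tau-a between time order and the values (no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # +1 if the later value is higher, -1 if lower
    return s / (n * (n - 1) / 2)


annual_low_flows = [12.0, 11.5, 13.1, 10.2, 9.8, 9.9, 8.7]   # hypothetical record
print(round(kendall_tau(annual_low_flows), 3))
```

A value near -1 indicates a consistent downward trend; under the variable length method described above, years would be dropped from the start of the record until the trend is no longer significant.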
Polylinear regression analysis in radiochemistry
International Nuclear Information System (INIS)
Kopyrin, A.A.; Terent'eva, T.N.; Khramov, N.N.
1995-01-01
A number of radiochemical problems have been formulated in the framework of polylinear regression analysis, which permits the use of conventional mathematical methods for their solution. The authors have considered features of the use of polylinear regression analysis for estimating the contributions of various sources to atmospheric pollution, for studying irradiated nuclear fuel, for estimating concentrations from spectral data, for measuring neutron fields of a nuclear reactor, for estimating crystal lattice parameters from X-ray diffraction patterns, for interpreting data of X-ray fluorescence analysis, for estimating complex formation constants, and for analyzing results of radiometric measurements. The problem of estimating the target parameters can be ill-posed for certain properties of the system under study. The authors showed the possibility of regularization by adding a fictitious set of data “obtained” from the orthogonal design. To estimate only a part of the parameters under consideration, the authors used incomplete rank models. In this case, it is necessary to take into account the possibility of confounding estimates. An algorithm for evaluating the degree of confounding is presented which is realized using standard software or regression analysis
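The regularization described above can be illustrated with a known identity: appending p fictitious observations whose design rows form the orthogonal matrix √k·I, with zero responses, turns ordinary least squares into ridge regression with penalty k. The zero responses, the absence of an intercept, and the toy data are assumptions made for this sketch, not details from the paper.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(X, y, k=0.0):
    """Solve (X^T X + k I) beta = X^T y; k = 0 gives ordinary least squares."""
    p = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(X, y)) for i in range(p)]
    return solve(a, b)


def regularize_by_augmentation(X, y, k):
    """OLS on the data plus p fictitious rows sqrt(k) * I (an orthogonal design)
    with zero responses; algebraically identical to ridge with penalty k."""
    p = len(X[0])
    fictitious = [[math.sqrt(k) if i == j else 0.0 for j in range(p)] for i in range(p)]
    return least_squares([list(r) for r in X] + fictitious, list(y) + [0.0] * p)


# Nearly collinear toy design, where plain OLS is unstable.
X = [(1.0, 1.0), (2.0, 2.001), (3.0, 2.999)]
y = [2.0, 4.1, 5.9]
print(least_squares(X, y))
print(regularize_by_augmentation(X, y, 0.1))
print(least_squares(X, y, k=0.1))
```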
International Nuclear Information System (INIS)
Díaz, Santiago; Carta, José A.; Matías, José M.
2017-01-01
Highlights: • Eight measure-correlate-predict (MCP) models used to estimate the wind power densities (WPDs) at a target site are compared. • Support vector regressions are used as the main prediction techniques in the proposed MCPs. • The most precise MCP uses two sub-models which predict wind speed and air density in an unlinked manner. • The most precise model allows the construction of a bivariate (wind speed and air density) WPD probability density function. • MCP models trained to minimise wind speed prediction error do not minimise WPD prediction error. - Abstract: The long-term annual mean wind power density (WPD) is an important indicator of wind as a power source which is usually included in regional wind resource maps as useful prior information to identify potentially attractive sites for the installation of wind projects. In this paper, a comparison is made of eight proposed Measure-Correlate-Predict (MCP) models to estimate the WPDs at a target site. Seven of these models use the Support Vector Regression (SVR) and the eighth the Multiple Linear Regression (MLR) technique, which serves as a basis to compare the performance of the other models. In addition, a wrapper technique with 10-fold cross-validation has been used to select the optimal set of input features for the SVR and MLR models. Some of the eight models were trained to directly estimate the mean hourly WPDs at a target site. Others, however, were first trained to estimate the parameters on which the WPD depends (i.e. wind speed and air density) and then, using these parameters, the target site mean hourly WPDs. The explanatory features considered are different combinations of the mean hourly wind speeds, wind directions and air densities recorded in 2014 at ten weather stations in the Canary Archipelago (Spain). The conclusions that can be drawn from the study undertaken include the argument that the most accurate method for the long-term estimation of WPDs requires the execution of a
DEFF Research Database (Denmark)
Hansen, Henrik; Tarp, Finn
2001-01-01
This paper examines the relationship between foreign aid and growth in real GDP per capita as it emerges from simple augmentations of popular cross country growth specifications. It is shown that aid in all likelihood increases the growth rate, and this result is not conditional on ‘good’ policy....... investment. We conclude by stressing the need for more theoretical work before this kind of cross-country regressions are used for policy purposes.
Balabin, Roman M; Lomakina, Ekaterina I
2011-06-28
A multilayer feed-forward artificial neural network (MLP-ANN) with a single hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the multiple linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs), least squares support vector machine (LS-SVM) regression, is used. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. The QC + SVM methodology is thus an alternative to the QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for building the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol⁻¹ (1 kcal mol⁻¹ = 4.184 kJ mol⁻¹) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. The extrapolation and interpolation results show that LS-SVM is
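LS-SVM regression replaces the SVM quadratic program with a single linear system in the bias b and dual weights α, [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y], and predicts f(x) = b + Σᵢ αᵢ K(x, xᵢ). A minimal sketch with an RBF kernel follows; γ, σ, and the one-dimensional toy data are arbitrary illustrative choices, unrelated to the BRM-208 descriptors.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rbf(u, v, sigma):
    """Gaussian (RBF) kernel."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(u, v)) / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for LS-SVM regression."""
    n = len(X)
    a = [[0.0] + [1.0] * n]
    for i in range(n):
        a.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(a, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(X, b, alpha, sigma, x):
    """f(x) = b + sum_i alpha_i K(x, x_i)."""
    return b + sum(ai * rbf(xi, x, sigma) for ai, xi in zip(alpha, X))


X = [(0.0,), (1.0,), (2.0,), (3.0,)]   # toy 1-D inputs
y = [0.0, 1.0, 4.0, 9.0]
b, alpha = lssvm_fit(X, y, gamma=1e6, sigma=1.0)
print(round(lssvm_predict(X, b, alpha, 1.0, (1.5,)), 3))
```

Because every training point receives a nonzero αᵢ, LS-SVM loses the sparsity of the standard SVM, but training reduces to one linear solve; a large γ trades smoothness for a closer fit to the training targets.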
Prediction, Regression and Critical Realism
DEFF Research Database (Denmark)
Næss, Petter
2004-01-01
This paper considers the possibility of prediction in land use planning, and the use of statistical research methods in analyses of relationships between urban form and travel behaviour. Influential writers within the tradition of critical realism reject the possibility of predicting social...... phenomena. This position is fundamentally problematic to public planning. Without at least some ability to predict the likely consequences of different proposals, the justification for public sector intervention into market mechanisms will be frail. Statistical methods like regression analyses are commonly...... seen as necessary in order to identify aggregate level effects of policy measures, but are questioned by many advocates of critical realist ontology. Using research into the relationship between urban structure and travel as an example, the paper discusses relevant research methods and the kinds...
Brix, Kevin V; DeForest, David K; Tear, Lucinda; Grosell, Martin; Adams, William J
2017-05-02
Biotic Ligand Models (BLMs) for metals are widely applied in ecological risk assessments and in the development of regulatory water quality guidelines in Europe, and in 2007 the United States Environmental Protection Agency (USEPA) recommended BLM-based water quality criteria (WQC) for Cu in freshwater. However, to date, few states have adopted BLM-based Cu criteria into their water quality standards on a statewide basis, apparently because of the perception that the BLM is too complicated or requires too many input variables. We first used the mechanistic BLM framework to identify the key water chemistry parameters that influence Cu bioavailability, namely dissolved organic carbon (DOC), pH, and hardness. We then developed Cu criteria using the same basic methodology the USEPA uses to derive hardness-based criteria, but with DOC and pH added. As an initial proof of concept, we developed stepwise multiple linear regression (MLR) models for species that have been tested over wide ranges of DOC, pH, and hardness conditions. These models predicted acute Cu toxicity values within a factor of ±2 in 77% to 97% of tests (5 species had adequate data) and chronic Cu toxicity values within a factor of ±2 in 92% of tests (1 species had adequate data). This level of accuracy is comparable to that of the BLM. Following USEPA guidelines for WQC development, the species data were then combined to develop a linear model with pooled slopes for each independent parameter (i.e., DOC, pH, and hardness) and species-specific intercepts using analysis of covariance. The pooled MLR and BLM models predicted species-specific toxicity with similar precision; adjusted R² and R² values ranged from 0.56 to 0.86 and from 0.66 to 0.85, respectively. Graphical exploration of relationships between predicted and observed toxicity, residuals and observed toxicity, and residuals and concentrations of key input parameters revealed many similarities and a few key distinctions between the
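The pooled model described above has two parts: one intercept per species and one shared slope per water-chemistry parameter. It can be fitted as an ordinary least-squares problem with species indicator columns. The sketch below rests on several assumptions:

- The log transforms of toxicity, DOC and hardness are assumed (a common choice, not stated in the abstract).
- The species labels and coefficients are hypothetical.
- The stepwise selection step is omitted.

```python
import numpy as np

def fit_pooled_slopes(species, log_doc, ph, log_hard, log_tox):
    """Least squares for log_tox = a[species] + b1*log_doc + b2*ph + b3*log_hard."""
    labels = sorted(set(species))
    index = {s: i for i, s in enumerate(labels)}
    D = np.zeros((len(log_tox), len(labels)))
    for row, s in enumerate(species):
        D[row, index[s]] = 1.0                        # species-specific intercept columns
    X = np.column_stack([D, log_doc, ph, log_hard])   # shared (pooled) slope columns
    coef, *_ = np.linalg.lstsq(X, log_tox, rcond=None)
    intercepts = dict(zip(labels, coef[:len(labels)]))
    slopes = coef[len(labels):]
    return intercepts, slopes

# Hypothetical, noiseless synthetic data for two species.
rng = np.random.default_rng(1)
species = ["daphnid"] * 20 + ["fathead"] * 20
log_doc = rng.uniform(-1.0, 1.5, 40)
ph = rng.uniform(6.0, 8.5, 40)
log_hard = rng.uniform(1.0, 2.5, 40)
true_a = {"daphnid": -1.0, "fathead": 0.5}
log_tox = (np.array([true_a[s] for s in species])
           + 0.9 * log_doc + 0.4 * ph + 0.2 * log_hard)
intercepts, slopes = fit_pooled_slopes(species, log_doc, ph, log_hard, log_tox)
print(intercepts, slopes)   # should recover the generating coefficients
```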
Modified Regression Correlation Coefficient for Poisson Regression Model
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study focuses on indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often have restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This measure of predictive power is defined by the relationship between the dependent variable (Y) and its expected value given the independent variables, E(Y|X), where Y is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient outperforms the traditional one in terms of bias and root mean square error (RMSE).
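The *traditional* regression correlation coefficient mentioned above is the correlation between Y and the fitted mean E(Y|X). The sketch below computes it after fitting a Poisson GLM with a log link by iteratively reweighted least squares (IRLS). It does not reproduce the authors' *modified* coefficient, whose formula the abstract does not give. Function names and the simulated data are illustrative assumptions.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit log E(Y|X) = b0 + X @ b by iteratively reweighted least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    beta[0] = np.log(y.mean())               # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X1 @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtW = X1.T * mu                      # X^T W with Poisson weights W = diag(mu)
        new_beta = np.linalg.solve(XtW @ X1, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, np.exp(X1 @ beta)

def regression_correlation(y, mu_hat):
    """Traditional regression correlation coefficient: corr(Y, estimated E(Y|X))."""
    return np.corrcoef(y, mu_hat)[0, 1]

# Simulated example (hypothetical coefficients).
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 0] - 0.3 * X[:, 1]))
beta, mu_hat = poisson_irls(X, y)
r = regression_correlation(y, mu_hat)
print(beta, r)
```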
Post-processing through linear regression
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. To evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Unlike the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
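The "optimal variability" criterion above can be illustrated with a single predictor by comparing OLS and geometric-mean (GM) corrections:

- **OLS** corrects with slope cov(f, o)/var(f). The corrected forecast's variance shrinks to r² times the observed variance.
- **GM** uses slope sign(r)·sd(o)/sd(f). This reproduces the observed variance exactly.

This is a minimal single-predictor sketch on synthetic data. The TDTR and EVMOS schemes are not reproduced, and the function names are illustrative.

```python
import numpy as np

def ols_correction(f, o):
    """Intercept and slope of the OLS regression of observations o on forecasts f."""
    b = np.cov(f, o, ddof=1)[0, 1] / np.var(f, ddof=1)
    return o.mean() - b * f.mean(), b

def gm_correction(f, o):
    """Geometric-mean regression: slope sign(r) * sd(o) / sd(f)."""
    r = np.corrcoef(f, o)[0, 1]
    b = np.sign(r) * np.std(o, ddof=1) / np.std(f, ddof=1)
    return o.mean() - b * f.mean(), b

rng = np.random.default_rng(4)
obs = rng.normal(size=1000)
fc = 0.5 + 1.3 * obs + rng.normal(scale=0.8, size=1000)   # biased, noisy synthetic forecast
a_ols, b_ols = ols_correction(fc, obs)
a_gm, b_gm = gm_correction(fc, obs)
sd_obs = np.std(obs, ddof=1)
sd_ols = np.std(a_ols + b_ols * fc, ddof=1)   # shrunk to |r| * sd_obs
sd_gm = np.std(a_gm + b_gm * fc, ddof=1)      # equal to sd_obs
print(sd_obs, sd_ols, sd_gm)
```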
Post-processing through linear regression
Directory of Open Access Journals (Sweden)
B. Van Schaeybroeck
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. To evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.
These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Unlike the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Luo, Chongliang; Liu, Jin; Dey, Dipak K; Chen, Kun
2016-07-01
In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. The objective in such a problem is often two-fold: both to explore the association structures of multiple sets of measurements and to develop a parsimonious model for predicting the future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing between the association strength of the canonical variates and their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes. An efficient algorithm based on variable splitting and Lagrangian multipliers is proposed. Simulation studies show the superior performance of the proposed approach. We demonstrate the effectiveness of the proposed approach in an [Formula: see text] intercross mice study and an alcohol dependence study. © The Author 2016. Published by Oxford University Press. All rights reserved.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
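The abstract above does not spell out the nonlinear regression algorithm. As a hedged illustration, the sketch below uses a damped Gauss-Newton iteration, a standard way to fit parameters that enter a model nonlinearly. A hypothetical two-parameter saturating response stands in for the groundwater flow model. The model, its Jacobian and the function names are assumptions for illustration, not the authors' programs.

```python
import numpy as np

def saturating_model(x, p):
    """Hypothetical response a * (1 - exp(-b x)) standing in for a flow model."""
    a, b = p
    return a * (1.0 - np.exp(-b * x))

def saturating_jacobian(x, p):
    """Partial derivatives of the model with respect to (a, b)."""
    a, b = p
    e = np.exp(-b * x)
    return np.column_stack([1.0 - e, a * x * e])

def gauss_newton(model, jac, p0, x, y, n_iter=100, tol=1e-12):
    """Damped Gauss-Newton: solve the linearized LS problem, halve the step if SSR grows."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = y - model(x, p)
        step, *_ = np.linalg.lstsq(jac(x, p), r, rcond=None)
        ssr, lam = r @ r, 1.0
        while lam > 1e-10:
            trial = p + lam * step
            res = y - model(x, trial)
            if res @ res <= ssr:
                break
            lam *= 0.5
        else:
            break                      # no acceptable step found: stop iterating
        p = trial
        if np.linalg.norm(lam * step) < tol:
            break
    return p

x = np.linspace(0.0, 10.0, 30)
y = saturating_model(x, (2.0, 0.5))    # noiseless synthetic "observations"
p_hat = gauss_newton(saturating_model, saturating_jacobian, (1.5, 0.3), x, y)
print(p_hat)
```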
Variable and subset selection in PLS regression
DEFF Research Database (Denmark)
Høskuldsson, Agnar
2001-01-01
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion […] is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than…
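The abstract does not list the specific selection methods it uses. One widely used screening heuristic is to fit a PLS1 model and rank variables by their VIP (variable importance in projection) scores. The sketch below uses NIPALS PLS1 together with the common "VIP > 1" rule of thumb. Both are assumptions for illustration, not necessarily Høskuldsson's procedure, and the data are synthetic.

```python
import numpy as np

def pls1(X, y, n_comp):
    """NIPALS PLS1 on mean-centered data; returns weights W, scores T, loadings P, y-loadings q."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, T, P, q = [], [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)         # unit-length weight vector
        t = Xk @ w                        # component scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        qa = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)          # deflate X
        yk = yk - qa * t                  # deflate y
        W.append(w)
        T.append(t)
        P.append(p)
        q.append(qa)
    return np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)

def vip_scores(W, T, q):
    """VIP scores; by construction the squared scores sum to the number of variables."""
    n_vars = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)      # y-variance explained by each component
    w_norm2 = (W / np.linalg.norm(W, axis=0))**2
    return np.sqrt(n_vars * (w_norm2 @ ss) / ss.sum())

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=50)
W, T, P, q = pls1(X, y, n_comp=2)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)     # candidate subset under the VIP > 1 heuristic
print(np.round(vip, 2), selected)
```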
Discriminative Elastic-Net Regularized Linear Regression.
Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen
2017-03-01
In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most existing linear regression methods use the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework and develop two robust linear regression models with the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins between different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be computed efficiently. Finally, rather than directly using the projection matrix for recognition, our methods employ the transformed features as new discriminative representations for the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments on publicly available data sets demonstrate that the proposed framework can outperform state-of-the-art methods. The MATLAB code for our methods is available at http://www.yongxu.org/lunwen.html.
Recursive Algorithm For Linear Regression
Varanasi, S. V.
1988-01-01
Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations and facilitates search for minimum order of linear-regression model that fits set of data satisfactorily.
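The brief does not reproduce its recursive equations. One way to get the same effect (no recomputation when the order increases) is to orthogonalize each new regressor against the earlier ones. In that basis:

- Earlier coefficients never change when a term is added.
- The residual sum of squares drops by exactly the square of the new coefficient.

This minimal sketch assumes a polynomial model and a residual-sum-of-squares stopping tolerance, both of which are illustrative.

```python
import numpy as np

def recursive_order_fit(x, y, max_order, tol):
    """Raise polynomial order one term at a time; return the first order with RSS <= tol."""
    basis = []                          # orthonormal columns built so far
    rss = float(y @ y)
    for k in range(max_order + 1):
        v = x.astype(float) ** k        # new regressor x^k
        for _ in range(2):              # Gram-Schmidt, repeated once for numerical stability
            for q in basis:
                v = v - (q @ v) * q
        v = v / np.linalg.norm(v)
        basis.append(v)
        rss -= float(v @ y) ** 2        # RSS_k = RSS_{k-1} - (new coefficient)^2
        if rss <= tol:
            return k, rss
    return max_order, rss

x = np.linspace(-1.0, 1.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x**2          # exact quadratic
order, rss = recursive_order_fit(x, y, max_order=6, tol=1e-9)
print(order, rss)
```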
Tong, Xuming; Chen, Jinghang; Miao, Hongyu; Li, Tingting; Zhang, Le
2015-01-01
Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key model parameters by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It combines the advantages of ABM and DE in two ways. First, it employs ABM to mimic the multi-scale immune system with various phenotypes and cell types. Second, it uses the inputs and outputs of the ABM to build a Loess regression for key parameter estimation. Next, we employed a greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set. We then used the ABM to describe a 3D immune system similar to those in previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes, and cell types, but can also accurately infer key parameters as the DE model does. Therefore, this study developed a complex-system modeling mechanism that can simulate the complicated immune system in detail, as ABM does, and validate the reliability and efficiency of the model by fitting experimental data, as DE does. PMID:26535589
Regularized Label Relaxation Linear Regression.
Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung; Fang, Bingwu
2018-04-01
Linear regression (LR) and some of its variants have been widely used for classification problems. Most of these methods assume that during the learning phase, the training samples can be exactly transformed into a strict binary label matrix, which has too little freedom to fit the labels adequately. To address this problem, in this paper, we propose a novel regularized label relaxation LR method, which has the following notable characteristics. First, the proposed method relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix into LR. This provides more freedom to fit the labels and simultaneously enlarges the margins between different classes as much as possible. Second, the proposed method constructs a class compactness graph based on manifold learning and uses it as a regularization term to avoid overfitting. The class compactness graph ensures that samples sharing the same label are kept close after they are transformed. Two different algorithms, based on -norm and -norm loss functions, respectively, are devised. Both algorithms have compact closed-form solutions in each iteration, so they are easy to implement. Extensive experiments show that these two algorithms outperform state-of-the-art algorithms in terms of classification accuracy and running time.
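The core label-relaxation idea above can be written as a closed-form alternating scheme. The relaxed target is Y + B⊙M, where:

- B is +1 on the true-class entry and -1 elsewhere.
- M is a nonnegative relaxation matrix.

The scheme alternates a ridge-regression update of W with an elementwise update of M. This is a simplified sketch: the class compactness graph and the specific norm-based losses of the paper are deliberately omitted. The ridge penalty `lam`, the iteration count and the toy data are assumptions.

```python
import numpy as np

def relaxed_lr(X, labels, n_classes, lam=1e-2, n_iter=30):
    """Ridge regression onto relaxed targets Y + B*M with M >= 0 (alternating updates)."""
    n, d = X.shape
    Xb = np.column_stack([X, np.ones(n)])        # append a bias column
    Y = np.eye(n_classes)[labels]                # strict binary label matrix
    B = np.where(Y == 1.0, 1.0, -1.0)            # +1: target may grow, -1: target may shrink
    M = np.zeros_like(Y)                         # nonnegative relaxation matrix
    A = Xb.T @ Xb + lam * np.eye(d + 1)
    for _ in range(n_iter):
        W = np.linalg.solve(A, Xb.T @ (Y + B * M))   # W-step: closed-form ridge solution
        M = np.maximum(B * (Xb @ W - Y), 0.0)        # M-step: closed-form, elementwise
    return W, M

def predict(W, X):
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.argmax(Xb @ W, axis=1)

rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat([0, 1, 2], 30)
X = centers[labels] + rng.normal(scale=0.5, size=(90, 2))
W, M = relaxed_lr(X, labels, 3)
acc = np.mean(predict(W, X) == labels)
print(acc)
```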
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, so they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrative examples from cancer research.
Directory of Open Access Journals (Sweden)
Santana Isabel
2011-08-01
Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but at present has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines, and Random Forests, can improve the accuracy, sensitivity, and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods were compared to three traditional classifiers in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve, and Press' Q:

- Nonparametric classifiers: Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests.
- Traditional classifiers: Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Logistic Regression.

Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p …). Conclusions: When sensitivity, specificity, and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in predicting dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity, and specificity of dementia predictions from neuropsychological testing.
Combining Alphas via Bounded Regression
Directory of Open Access Journals (Sweden)
Zura Kakushadze
2015-11-01
We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, there is typically insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or a skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
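The paper gives its own explicit algorithm, which is not reproduced here. As a generic stand-in, a box-constrained least-squares problem (minimize ‖Aw − b‖² subject to lb ≤ w ≤ ub) can be solved by projected gradient descent:

1. Take a gradient step.
2. Clip the weights back into their bounds.

In the alpha setting, A would hold the (weighted) regressors and w the alpha weights. That mapping, the function name and the test data are assumptions for illustration.

```python
import numpy as np

def bounded_lstsq(A, b, lb, ub, n_iter=5000, tol=1e-12):
    """Minimize 0.5*||A w - b||^2 subject to lb <= w <= ub by projected gradient."""
    w = np.clip(np.zeros(A.shape[1]), lb, ub)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = largest eigenvalue of A^T A
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w_new = np.clip(w - step * grad, lb, ub)  # gradient step, then project onto the box
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# With A = I, the bounded solution is simply b clipped to the bounds.
A = np.eye(3)
b = np.array([2.0, -1.0, 0.5])
w = bounded_lstsq(A, b, lb=0.0, ub=1.0)
print(w)
```

When the bounds are loose enough not to bind, the result coincides with the ordinary least-squares solution.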
Regression in autistic spectrum disorders.
Stefanatos, Gerry A
2008-12-01
A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously acquired skills. This may involve a loss of speech or social responsiveness, but often entails both. This paper critically reviews the phenomenon of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed, and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.
Two Paradoxes in Linear Regression Analysis
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
2016-01-01
Summary: Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw and thereby to judge the suitability of a type of land use in any given pixel in a case study area in Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
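Multinomial logistic regression models the probability of every land-use class at once through a softmax over class-specific linear predictors. Each pixel can then be allocated to its most suitable class. The sketch below fits such a model by batch gradient descent, which is an illustrative choice. It makes the following assumptions:

- The two "driving factor" columns, the three classes and the learning-rate settings are all hypothetical.
- The data are synthetic, not the Jiangxi data.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)     # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_multinomial(X, labels, n_classes, lr=0.1, n_iter=3000, l2=1e-3):
    """Gradient descent on the mean multinomial negative log-likelihood plus an L2 penalty."""
    n, d = X.shape
    Xb = np.column_stack([np.ones(n), X])
    Y = np.eye(n_classes)[labels]
    W = np.zeros((d + 1, n_classes))
    for _ in range(n_iter):
        P = softmax(Xb @ W)
        W -= lr * (Xb.T @ (P - Y) / n + l2 * W)
    return W

def predict_proba(W, X):
    return softmax(np.column_stack([np.ones(len(X)), X]) @ W)

# Synthetic pixels: two hypothetical driving factors, three land-use classes.
rng = np.random.default_rng(10)
centers = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
labels = np.repeat([0, 1, 2], 100)
X = centers[labels] + rng.normal(scale=0.5, size=(300, 2))
W = fit_multinomial(X, labels, 3)
proba = predict_proba(W, X)                  # per-pixel suitability of each class
acc = np.mean(proba.argmax(axis=1) == labels)
print(acc)
```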
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing
Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.
2006-01-01
The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval
Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar
2016-01-01
Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from Pistacia atlantica wood (AC-PAW), were used for the rapid adsorption and removal of methyl orange (MO) from the aqueous phase. The dependence of MO removal on various influential adsorption parameters was modeled and optimized using multiple linear regression (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were γ = 0.76 and σ² = 0.15. For the test data set, the LSSVR model gave a mean square error (MSE) of 0.0010 and a coefficient of determination (R²) of 0.976, while the MLR model gave an MSE of 0.0037 and an R² of 0.897. The adsorption equilibrium and kinetic data were well fitted by, and in good agreement with, the Langmuir isotherm model and the second-order equation and intra-particle diffusion models, respectively. Small amounts of SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g, respectively) were sufficient for rapid removal of methyl orange (>95%). The maximum adsorption capacities of SnO2-NP-AC and AC-PAW were 250 mg g⁻¹ and 125 mg g⁻¹, respectively. Copyright © 2015 Elsevier Inc. All rights reserved.
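The Langmuir isotherm mentioned above, q_e = q_max·K·C_e / (1 + K·C_e), is often fitted through its linear form C_e/q_e = C_e/q_max + 1/(K·q_max). The slope then gives 1/q_max and the ratio slope/intercept gives K. This minimal sketch makes the following assumptions:

- The concentrations and the generating values q_max = 100 mg g⁻¹ and K = 0.2 L mg⁻¹ are synthetic, not the paper's data.
- The linearized fit is one common option; a direct nonlinear fit is another.

```python
import numpy as np

def fit_langmuir_linear(Ce, qe):
    """Fit Ce/qe = Ce/qmax + 1/(K*qmax) by linear least squares; return (qmax, K)."""
    slope, intercept = np.polyfit(Ce, Ce / qe, 1)
    qmax = 1.0 / slope
    K = slope / intercept          # intercept = 1/(K*qmax) = slope/K
    return qmax, K

Ce = np.array([2.0, 5.0, 10.0, 20.0, 40.0, 80.0])        # mg/L, synthetic
qe = 100.0 * 0.2 * Ce / (1.0 + 0.2 * Ce)                  # mg/g, from qmax=100, K=0.2
qmax, K = fit_langmuir_linear(Ce, qe)
print(qmax, K)
```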
Model-based Quantile Regression for Discrete Data
Padellini, Tullia; Rue, Haavard
2018-01-01
Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out using the Asymmetric Laplace Distribution as a working likelihood. Despite
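The Asymmetric Laplace working likelihood mentioned above is tied to the quantile check function ρ_τ(u) = u·(τ − 1{u < 0}). The ALD log-density is a constant minus ρ_τ((y − μ)/σ), so maximizing the ALD likelihood in μ is the same as minimizing the check loss. Minimizing the check loss over a constant μ in turn yields a sample τ-quantile. The sketch below demonstrates this on a toy sample. It does not implement the paper's model-based approach for discrete data, and the function names are illustrative.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def ald_logpdf(y, mu, sigma, tau):
    """Log density of the Asymmetric Laplace Distribution with location mu and scale sigma."""
    return np.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)

def quantile_by_check_loss(y, tau):
    """Minimize sum rho_tau(y - c) over candidate values c taken from the sample."""
    candidates = np.sort(y)
    losses = [check_loss(y - c, tau).sum() for c in candidates]
    return candidates[int(np.argmin(losses))]

y = np.arange(1.0, 101.0)
q90 = quantile_by_check_loss(y, 0.9)
cands = np.sort(y)
loglik = [ald_logpdf(y, m, 1.0, 0.9).sum() for m in cands]
mu_ml = cands[int(np.argmax(loglik))]        # ALD maximum-likelihood location
print(q90, mu_ml)
```

For this sample, any value between 90 and 91 minimizes the loss, so either endpoint may be returned.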
Energy Technology Data Exchange (ETDEWEB)
Sharifi, Hoda [Department of Radiation Oncology, Henry Ford Health System, Detroit, MI (United States); Department of Physics, Oakland University, Rochester, MI (United States); Zhang, Hong; Jin, Jian-Yyue; Kong, Feng-Ming [Department of Radiation Oncology, GRU Cancer Center, Augusta GA (United States); Chetty, Indrin J [Department of Radiation Oncology, Henry Ford Health System, Detroit, MI (United States); Zhong, Hualiang
2016-06-15
Purpose: In PET-guided adaptive radiotherapy (RT), changes in metabolic activity at individual voxels cannot be derived until the during-treatment CT images are appropriately registered to the pre-treatment CT images. However, deformable image registration (DIR) usually does not preserve tumor volume, which may induce errors when comparing with the target. The aim of this study was to develop a DIR-integrated mechanical modeling technique to track radiation-induced metabolic changes on PET images. Methods: Three patients with non-small cell lung cancer (NSCLC) were treated with adaptive radiotherapy under RTOG 1106. Two PET/CT image sets were acquired, 2 weeks before RT and 18 fractions after the start of treatment. DIR was performed to register the during-RT CT to the pre-RT CT using a B-spline algorithm, and the resultant displacements in the tumor region were remodeled using a hybrid finite element method (FEM). The gross tumor volume (GTV) was delineated on the during-RT PET/CT image sets and deformed using the 3D deformation vector fields generated by the CT-based registrations. The metabolic tumor volume (MTV) was calculated using the pre- and during-RT image sets. The quality of the PET mapping was evaluated based on the constancy of the mapped MTV and on landmark comparison. Results: The B-spline-based registrations changed the MTVs by 7.3%, 4.6%, and −5.9% for the 3 patients, and the corresponding changes for the hybrid FEM method were −2.9%, 1%, and 6.3%, respectively. Landmark comparisons were used to evaluate the rigid, B-spline, and hybrid FEM registrations, with mean errors of 10.1 ± 1.6 mm, 4.4 ± 0.4 mm, and 3.6 ± 0.4 mm for the three patients. The hybrid FEM method outperforms the B-spline-only registration for patients with tumor regression. Conclusion: The hybrid FEM modeling technique improves the B-spline registrations in tumor regions. This technique may help compare metabolic activities between two PET/CT images with regressing tumors. The author gratefully
Retro-regression--another important multivariate regression improvement.
Randić, M
2001-01-01
We review the serious problem of instability in the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". It manifests when a descriptor is included in or excluded from a stepwise regression: the coefficients of the descriptors that remain in the regression equation then change unpredictably. We follow with an even more serious problem, referred to as the MRA "nightmare of the second kind", which arises when optimal descriptors are selected from a large pool of descriptors. At different steps of the stepwise regression, this process typically replaces several previously used descriptors with new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on the boiling points of nonanes, which are considered (1) using an ordered connectivity basis; (2) using an ordering resulting from the application of a greedy algorithm; and (3) using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined, showing how it resolves the ambiguities associated with both the first and second "nightmares" of MRA.
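The "nightmare of the first kind" described above is easy to reproduce: add a descriptor correlated with one already in the model, and the existing coefficient shifts. A standard remedy in this line of work is to orthogonalize descriptors in a fixed order. Each descriptor is replaced by its residual after regression on the intercept and the earlier descriptors, and then earlier coefficients stay fixed as descriptors are added. This sketch shows both effects on synthetic data. It illustrates ordered orthogonalization only, not the full retro-regression procedure, and the data and names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def orthogonalize(X):
    """Ordered Gram-Schmidt: each column minus its projection on the intercept and earlier columns."""
    n, p = X.shape
    basis = [np.ones(n) / np.sqrt(n)]
    out = np.empty((n, p))
    for j in range(p):
        v = X[:, j].astype(float)
        for b in basis:
            v = v - (b @ v) * b
        out[:, j] = v
        basis.append(v / np.linalg.norm(v))
    return out

rng = np.random.default_rng(13)
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + 0.3 * rng.normal(size=100)          # descriptor correlated with x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 0.1 * rng.normal(size=100)
X = np.column_stack([x1, x2])
raw_1, raw_2 = ols(X[:, :1], y)[1], ols(X, y)[1]      # x1 coefficient before/after adding x2
Z = orthogonalize(X)
orth_1, orth_2 = ols(Z[:, :1], y)[1], ols(Z, y)[1]    # unchanged after adding z2
print(raw_1, raw_2, orth_1, orth_2)
```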
Chromatography methods and chemometrics for determination of milk fat adulterants
Trbović, D.; Petronijević, R.; Đorđević, V.
2017-09-01
Milk and milk-based products are among the leading food categories according to reported cases of food adulteration. Although many authentication problems exist in all areas of the food industry, adequate control methods are required to evaluate the authenticity of milk and milk products in the dairy industry. Gas chromatography (GC) analysis of triacylglycerols (TAGs) or fatty acid (FA) profiles of milk fat (MF), in combination with multivariate statistical data processing, has been used to detect adulteration of milk and dairy products with foreign fats. The adulteration of milk and butter is a major issue for the dairy industry. The major adulterants of MF are vegetable oils (soybean, sunflower, groundnut, coconut, palm and peanut oil) and animal fat (cow tallow and pork lard). Multivariate analysis enables adulterated MF to be distinguished from authentic MF while taking into account many analytical factors. Various multivariate analysis methods have been proposed to quantitatively detect levels of adulterant non-MFs, with multiple linear regression (MLR) seemingly the most suitable. There is a need for increased use of chemometric data analyses to detect adulterated MF in foods and for their expanded use in routine quality assurance testing.
Panel Smooth Transition Regression Models
DEFF Research Database (Denmark)
González, Andrés; Terasvirta, Timo; Dijk, Dick van
We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...
Logistic Regression: Concept and Application
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is the classification of individuals into different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
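For logistic regression, the equivalence described above can be turned into a sample-size calculation in three steps:

1. Find two event probabilities p1 and p2 whose logits differ by 2·β·SD(x) and whose average equals the overall event probability.
2. Apply a two-proportion sample-size formula to p1 and p2.
3. Double the per-group size to get the total.

The sketch assumes the usual normal-approximation formula without continuity correction. The abstract does not say which two-sample formula the authors used, so treat that choice, the function names and the example numbers as assumptions.

```python
import math
from statistics import NormalDist

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def equivalent_proportions(beta, sd_x, p_bar):
    """Probabilities whose logits differ by 2*beta*sd_x and whose mean is p_bar."""
    delta = 2.0 * beta * sd_x
    lo, hi = -50.0, 50.0
    for _ in range(200):                 # bisection on the lower logit
        mid = 0.5 * (lo + hi)
        if 0.5 * (expit(mid) + expit(mid + delta)) < p_bar:
            lo = mid
        else:
            hi = mid
    l = 0.5 * (lo + hi)
    return expit(l), expit(l + delta)

def total_sample_size(beta, sd_x, p_bar, alpha=0.05, power=0.8):
    """Total N (two equal groups) from the normal-approximation two-proportion formula."""
    p1, p2 = equivalent_proportions(beta, sd_x, p_bar)
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    return 2 * math.ceil(num / (p2 - p1) ** 2)

# Hypothetical example: odds ratio 1.5 per SD of x, overall event probability 0.3.
p1, p2 = equivalent_proportions(math.log(1.5), 1.0, 0.3)
n_total = total_sample_size(math.log(1.5), 1.0, 0.3)
print(p1, p2, n_total)
```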
Tumor regression patterns in retinoblastoma
International Nuclear Information System (INIS)
Zafar, S.N.; Siddique, S.N.; Zaheer, N.
2016-01-01
To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record the response to treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and the remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)
On Solving Lq-Penalized Regressions
Directory of Open Access Journals (Sweden)
Tracy Zhou Wu
2007-01-01
Lq-penalized regression arises in multidimensional statistical modelling where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except for the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on a space transformation, which makes efficient optimization algorithms applicable. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be solvable in polynomial time. Numerical studies show promise of our approach.
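The abstract does not spell out its transformation. One standard way to remove the nonsmooth absolute value in the LASSO (q = 1) is to split each coefficient as b = u - v with u, v >= 0. The penalty then becomes the smooth linear term lam * sum(u + v) on the non-negative orthant, where projected gradient descent applies. The sketch below assumes NumPy and shows that generic split, not the authors' method.

```python
import numpy as np

def lasso_split(X, y, lam, n_iter=5000):
    """Minimise 0.5*||y - X b||^2 + lam*||b||_1 via b = u - v, u, v >= 0,
    using projected gradient descent with a fixed step size."""
    p = X.shape[1]
    u = np.zeros(p)
    v = np.zeros(p)
    # The gradient of the split objective is Lipschitz with constant 2*||X||_2^2.
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        g = X.T @ (y - X @ (u - v))               # X^T r
        u = np.maximum(0.0, u - step * (-g + lam))  # project onto u >= 0
        v = np.maximum(0.0, v - step * (g + lam))   # project onto v >= 0
    return u - v
```

With an orthonormal design the LASSO solution is the soft-thresholded least-squares estimate, which gives an easy check of the sketch.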
Influence diagnostics in meta-regression model.
Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua
2017-09-01
This paper studies influence diagnostics in the meta-regression model, including case-deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficients and the heterogeneity variance and obtain the corresponding influence measures. Both the DerSimonian and Laird estimation method and the maximum likelihood estimation method in meta-regression are considered to derive the results. Internal and external residuals and leverage measures are defined. Local influence analyses based on case-weight, response, covariate and within-variance perturbation schemes are explored. We introduce a method that simultaneously perturbs the responses, covariates and within-study variances to obtain a local influence measure; its advantage is that it allows the influence magnitudes of influential studies under different perturbations to be compared. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
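A minimal NumPy sketch of the DerSimonian and Laird-type (method-of-moments) meta-regression and of simple single-case deletion follows. It refits the model with each study removed and records the change in the coefficients and in tau^2. It illustrates case deletion only. It does not reproduce the paper's closed-form subset-deletion formulae, its maximum likelihood variant, or its local influence measures.

```python
import numpy as np

def dl_meta_regression(y, v, X):
    """Random-effects meta-regression with a DerSimonian-Laird-type estimate
    of tau^2. y: effect estimates, v: within-study variances,
    X: (k, p) design matrix including an intercept column."""
    k, p = X.shape
    w = 1.0 / v
    XtW = X.T * w                                    # X^T W
    A = XtW @ X
    beta_fe = np.linalg.solve(A, XtW @ y)            # fixed-effect fit
    q_e = np.sum(w * (y - X @ beta_fe) ** 2)         # residual heterogeneity Q_E
    trace_p = w.sum() - np.trace(np.linalg.solve(A, (X.T * w**2) @ X))
    tau2 = max(0.0, (q_e - (k - p)) / trace_p)       # truncated at zero
    w_re = 1.0 / (v + tau2)                          # random-effects weights
    XtW_re = X.T * w_re
    beta_re = np.linalg.solve(XtW_re @ X, XtW_re @ y)
    return beta_re, tau2

def case_deletion(y, v, X):
    """(change in coefficients, change in tau^2) when each study is deleted."""
    beta, tau2 = dl_meta_regression(y, v, X)
    out = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        b_i, t_i = dl_meta_regression(y[keep], v[keep], X[keep])
        out.append((b_i - beta, t_i - tau2))
    return out
```

With an intercept-only design this reduces to the classical DerSimonian and Laird estimator, tau^2 = (Q - (k - 1)) / (sum(w) - sum(w^2)/sum(w)).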
Regression to Causality : Regression-style presentation influences causal attribution
DEFF Research Database (Denmark)
Bordacconi, Mats Joe; Larsen, Martin Vinæs
2014-01-01
Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results more likely.
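The two presentations are numerically equivalent. Regressing the outcome on an intercept and a 0/1 group dummy gives a slope equal to the difference in group means, and its t statistic equals the pooled-variance two-sample t statistic. The standard-library sketch below, with illustrative function names, checks this identity.

```python
from math import sqrt
from statistics import mean

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (mb - ma) / sqrt(sp2 * (1 / na + 1 / nb))

def dummy_regression_t(a, b):
    """OLS of y on an intercept and a group dummy (0 for a, 1 for b).
    Returns (slope, t statistic of the slope)."""
    y = list(a) + list(b)
    d = [0.0] * len(a) + [1.0] * len(b)
    n = len(y)
    dbar, ybar = mean(d), mean(y)
    sxx = sum((di - dbar) ** 2 for di in d)
    sxy = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
    slope = sxy / sxx
    intercept = ybar - slope * dbar
    rss = sum((yi - intercept - slope * di) ** 2 for di, yi in zip(d, y))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se
```

The residual sum of squares of the dummy regression is exactly the within-group sum of squares, which is why the two statistics coincide.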
International Nuclear Information System (INIS)
Ghasemi, Jahanbakhsh; Saaidpour, Saadi
2007-01-01
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log P o/w). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and their predictive ability was tested using 40 compounds. Modeling of log P o/w of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows the estimation of log P o/w for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and square correlation coefficient (R²) for the MLR model were 0.22 and 0.99 for the prediction set log P o/w
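The training/prediction workflow behind such RMSEP and R² figures can be sketched with NumPy. The sketch fits an MLR on a training set, predicts an external set and reports RMSEP and the squared Pearson correlation. The data below are synthetic placeholders with four columns standing in for descriptors such as MV, HLB, HB and PSA. They are not the study's data and do not reproduce its results.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bp]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def rmsep(y_obs, y_pred):
    """Root mean square error of prediction on an external (test) set."""
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def squared_correlation(y_obs, y_pred):
    """Square of the Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_obs, y_pred)[0, 1] ** 2)

# Synthetic stand-in for four descriptors; NOT real compound data.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = 0.5 + X @ np.array([0.8, -0.4, 0.3, -0.6]) + rng.normal(scale=0.2, size=150)
train, test = slice(0, 110), slice(110, 150)   # 110 training, 40 prediction compounds
coef = fit_mlr(X[train], y[train])
pred = predict_mlr(coef, X[test])
print(rmsep(y[test], pred), squared_correlation(y[test], pred))
```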
Azadi, Sama; Karimi-Jashni, Ayoub
2016-02-01
Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, an Artificial Neural Network (ANN) and Multiple Linear Regression (MLR), was evaluated for predicting the mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R, were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. In contrast, the results indicated that the ANN model, as a non-linear model, has higher predictive accuracy for the mean SMSWG rate. Therefore, to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate. Copyright © 2015 Elsevier Ltd. All rights reserved.
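The four performance measures named above have standard definitions. The plain-Python sketch below computes them for paired observed and predicted values. MAPE is expressed in percent and assumes no observed value is zero. The function name is illustrative.

```python
from math import sqrt

def performance(obs, pred):
    """MAE, MAPE (%), RMSE and Pearson R for paired observations/predictions."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    mae = sum(abs(e) for e in err) / n
    mape = 100.0 * sum(abs(e / o) for e, o in zip(err, obs)) / n  # obs must be non-zero
    rmse = sqrt(sum(e * e for e in err) / n)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    r = cov / sqrt(sum((o - mo) ** 2 for o in obs) * sum((p - mp) ** 2 for p in pred))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R": r}
```

For obs = [10, 20, 30, 40] and pred = [12, 18, 33, 39], the errors are [2, -2, 3, -1], so MAE = 2.0 and RMSE = sqrt(4.5).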
Kavuncuoglu, Hatice; Kavuncuoglu, Erhan; Karatas, Seyda Merve; Benli, Büsra; Sagdic, Osman; Yalcin, Hasan
2018-04-09
A mathematical model was established to determine the diameter of the inhibition zone of walnut extract against twelve bacterial species. Type of extraction, concentration, and pathogens were taken as input variables. Two models were used with the aim of designing this system. One of them was developed with artificial neural networks (ANN), and the other was formed with multiple linear regression (MLR). Four common training algorithms, Levenberg-Marquardt (LM), Bayesian regularization (BR), scaled conjugate gradient (SCG) and resilient backpropagation (RP), were investigated and compared. Root mean squared error and correlation coefficient were evaluated as performance criteria. By these criteria, ANN showed high prediction performance, while MLR showed low prediction performance. As a result, the system developed with ANN gave the most accurate inhibition zone (IZ) estimates when different input values were provided. The results of this study could offer new perspectives, particularly in the field of microbiology, because they could be applied to other types of extraction, concentrations, and pathogens without resorting to experiments. Copyright © 2018 Elsevier B.V. All rights reserved.
Golmohammadi, Hassan
2009-11-30
A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. The results also revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.
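Neither abstract gives its genetic-algorithm settings. The NumPy sketch below is therefore a generic GA for descriptor selection. It uses bit-string chromosomes, tournament selection, one-point crossover, bit-flip mutation and elitism. The fitness is the leave-one-out RMSE of an MLR on the selected columns, computed from PRESS residuals. The population size, generation count, mutation rate and fitness choice are all assumptions.

```python
import numpy as np

def loo_rmse(X, y, mask):
    """Leave-one-out RMSE of an MLR on the columns selected by `mask`,
    from PRESS residuals e_i / (1 - h_ii) of the hat matrix."""
    X1 = np.column_stack([np.ones(len(y)), X[:, mask]])
    H = X1 @ np.linalg.pinv(X1)
    press = (y - H @ y) / (1.0 - np.diag(H))
    return float(np.sqrt(np.mean(press ** 2)))

def ga_select(X, y, n_pop=30, n_gen=40, p_mut=0.05, max_vars=None, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    max_vars = max_vars or p

    def fitness(mask):
        if not mask.any() or mask.sum() > max_vars:
            return np.inf                          # infeasible subset
        return loo_rmse(X, y, mask)

    pop = rng.random((n_pop, p)) < 0.5             # random initial subsets
    scores = np.array([fitness(m) for m in pop])
    for _ in range(n_gen):
        children = []
        for _ in range(n_pop):
            i, j = rng.integers(n_pop, size=2)     # tournament for parent a
            a = pop[i] if scores[i] < scores[j] else pop[j]
            i, j = rng.integers(n_pop, size=2)     # tournament for parent b
            b = pop[i] if scores[i] < scores[j] else pop[j]
            cut = rng.integers(1, p) if p > 1 else 0
            child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
            child ^= rng.random(p) < p_mut               # bit-flip mutation
            children.append(child)
        children = np.array(children)
        child_scores = np.array([fitness(m) for m in children])
        best, worst = np.argmin(scores), np.argmax(child_scores)
        children[worst], child_scores[worst] = pop[best], scores[best]  # elitism
        pop, scores = children, child_scores
    best = np.argmin(scores)
    return pop[best], scores[best]
```

The selected mask would then feed the final MLR (or PLS/ANN) model. A wrapper fitness such as this one is only as reliable as the cross-validation behind it.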
Directory of Open Access Journals (Sweden)
Apif M. Hajji
2017-09-01
Heavy-duty diesel (HDD) construction equipment, which includes bulldozers, is important in infrastructure development. This equipment consumes large amounts of diesel fuel and emits high levels of carbon dioxide (CO2). The total emissions depend on the fuel use, and the fuel use depends on the productivity of the equipment. This paper proposes a methodology and tool for estimating CO2 emissions from bulldozers based on the productivity rate. The methodology is formulated by using the results of multiple linear regression (MLR) of CAT's data to obtain the productivity model, combined with the EPA's NONROAD model. The emission factors from the NONROAD model were used to quantify the CO2 emissions. To demonstrate the model, a case study and a sensitivity analysis for a bulldozer's activity are also presented. MLR results indicate that the productivity model generated from CAT's data can be used as the basis for quantifying the total CO2 emissions for an earthwork activity.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
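Multicollinearity among predictors is commonly quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 comes from regressing predictor j on all the other predictors plus an intercept. The sketch below is a minimal NumPy version. Thresholds such as "VIF above 10" are conventions rather than rules.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (X has no intercept column)."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)
```

Uncorrelated predictors give VIF = 1. Two nearly identical predictors both receive very large VIFs, reflecting the unstable, choice-dependent coefficients the article warns about.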
Logic regression and its extensions.
Schwender, Holger; Ruczinski, Ingo
2010-01-01
Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, so logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.
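To illustrate what a "Boolean combination of binary variables" means, the toy sketch below exhaustively scores every two-variable rule of the form [NOT] Xi AND/OR [NOT] Xj. It keeps the rule that agrees most often with a binary outcome. This is a deliberate simplification. Actual logic regression searches over larger logic trees, typically by simulated annealing, and embeds them in a generalized linear model.

```python
from itertools import combinations

def best_pair_rule(X, y):
    """Return (accuracy, rule) of the best two-variable Boolean rule.
    X: list of rows of 0/1 predictors; y: list of 0/1 outcomes."""
    n, p = len(X), len(X[0])
    best_correct, best_rule = -1, None
    for i, j in combinations(range(p), 2):
        for neg_i in (False, True):
            for neg_j in (False, True):
                for op in ("AND", "OR"):
                    correct = 0
                    for row, target in zip(X, y):
                        a = bool(row[i]) != neg_i    # XOR applies the optional NOT
                        b = bool(row[j]) != neg_j
                        pred = (a and b) if op == "AND" else (a or b)
                        correct += pred == bool(target)
                    if correct > best_correct:
                        best_correct = correct
                        best_rule = (f"{'NOT ' if neg_i else ''}X{i} {op} "
                                     f"{'NOT ' if neg_j else ''}X{j}")
    return best_correct / n, best_rule
```

With three SNP indicators and an outcome defined as X0 AND NOT X2, the search recovers exactly that interaction.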
International Nuclear Information System (INIS)
Kneale, G.W.; Stewart, A.M.; Mancuso, T.F.
1981-01-01
Results are presented from the study, initiated by Mancuso, of the health risks of low-level radiation among workers engaged in plutonium manufacture at Hanford Works, Washington State, USA; the paper attempts to answer criticisms of previous reports through an in-depth study. Previous reports have aroused much controversy because the reported risk per unit radiation dose for cancers of radiosensitive tissues was much greater than the risk generally accepted on the basis of other studies and widely used in setting safety levels for exposure to low-level radiation. The method of regression models in life-tables isolates the effect of radiation after statistically controlling for a wide range of possible interfering factors. As with the risk of lung cancer for uranium miners, the dose-response relation showed a significant downward curve at about 10 rem. There may, therefore, be better agreement with other studies, conducted at higher doses, than is widely assumed. The findings on cancer latency (about 25 years) and the effect of exposure age (increasing age increases the risk) are in general agreement with other studies. An unexplained finding is a significantly higher dose for all workers than for workers who developed cancers in tissues that are supposed to have low sensitivity to cancer induction by radiation. (author)
Measurement Error in Education and Growth Regressions
Portela, Miguel; Alessie, Rob; Teulings, Coen
2010-01-01
The use of the perpetual inventory method for the construction of education data per country leads to systematic measurement error. This paper analyzes its effect on growth regressions. We suggest a methodology for correcting this error. The standard attenuation bias suggests that using these
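The "standard attenuation bias" referred to above is the classical-error case. If the regressor is observed as x + u with independent noise u, the OLS slope shrinks toward zero by the reliability ratio var(x) / (var(x) + var(u)). The NumPy simulation below illustrates that textbook case with made-up variances. The paper itself argues that the perpetual inventory method produces systematic, not classical, error, so this shows only the baseline being corrected.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(1)
n = 200_000
true_beta = 0.8
x_true = rng.normal(0.0, 1.0, n)             # true regressor, variance 1
y = true_beta * x_true + rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 0.5, n)     # classical error, variance 0.25
reliability = 1.0 / (1.0 + 0.25)             # var(x) / (var(x) + var(u)) = 0.8
print(ols_slope(x_true, y))   # close to 0.8
print(ols_slope(x_obs, y))    # close to 0.8 * 0.8 = 0.64 (attenuated)
```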
Finite Algorithms for Robust Linear Regression
DEFF Research Database (Denmark)
Madsen, Kaj; Nielsen, Hans Bruun
1990-01-01
The Huber M-estimator for robust linear regression is analyzed. Newton-type methods for solving the problem are defined and analyzed, and finite convergence is proved. Numerical experiments with a large number of test problems demonstrate their efficiency and indicate that this kind of approach may...
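For orientation, the Huber M-estimate can also be computed by iteratively reweighted least squares (IRLS). Residuals within k scale units get weight 1, and larger ones get weight k/|r|. The NumPy sketch below uses that common approach rather than the finite Newton-type algorithms of the paper. It re-estimates the scale by the normalized MAD at each step, which is one of several possible conventions. k = 1.345 is the usual tuning constant for about 95% efficiency under normal errors.

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=100, tol=1e-10):
    """Huber M-estimate of regression coefficients by IRLS.
    X should include an intercept column."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]              # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745     # MAD scale estimate
        u = np.abs(r) / max(s, 1e-12)
        w = np.minimum(1.0, k / np.maximum(u, 1e-12))        # Huber weights
        Xw = X * w[:, None]
        new = np.linalg.solve(X.T @ Xw, Xw.T @ y)            # weighted LS step
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta
```

A single gross outlier that would shift an OLS intercept by several units is heavily down-weighted, and the fit stays close to the bulk of the data.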
Measurement Error in Education and Growth Regressions
Portela, M.; Teulings, C.N.; Alessie, R.
The perpetual inventory method used for the construction of education data per country leads to systematic measurement error. This paper analyses the effect of this measurement error on GDP regressions. There is a systematic difference in the education level between census data and observations
Measurement error in education and growth regressions
Portela, Miguel; Teulings, Coen; Alessie, R.
2004-01-01
The perpetual inventory method used for the construction of education data per country leads to systematic measurement error. This paper analyses the effect of this measurement error on GDP regressions. There is a systematic difference in the education level between census data and observations
International Nuclear Information System (INIS)
Li, Yanting; He, Yong; Su, Yan; Shu, Lianjie
2016-01-01
Highlights: • Suggests a nonparametric model based on MARS for output power prediction. • Compares the MARS model with a wide variety of prediction models. • Shows that the MARS model is able to provide an overall good performance in both the training and testing stages. - Abstract: Both linear and nonlinear models have been proposed for forecasting the power output of photovoltaic systems. Linear models are simple to implement but less flexible. Due to the stochastic nature of the power output of PV systems, nonlinear models tend to provide better forecasts than linear models. Motivated by this, this paper suggests a fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS) as an alternative for forecasting solar power output. The MARS model is a data-driven modeling approach without any assumption about the relationship between the power output and predictors. It maintains the simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity. It is simpler in format than other nonlinear models such as ANN, k-nearest neighbors (KNN), classification and regression tree (CART), and support vector machine (SVM). The MARS model was applied to the daily output of a grid-connected 2.1 kW PV system to provide the 1-day-ahead mean daily forecast of the power output. The comparisons with a wide variety of forecast models show that the MARS model is able to provide reliable forecast performance.
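MARS builds its fit from mirrored hinge functions max(0, x - t) and max(0, t - x), which keep the model linear in its coefficients while bending at knots t. The simplified NumPy sketch below fixes the knots in advance and solves for the coefficients by least squares. A full MARS implementation chooses knots and variable interactions in a forward pass and prunes them with a GCV-based backward pass, which is omitted here.

```python
import numpy as np

def hinge_basis(x, knots):
    """Intercept plus mirrored hinge pairs max(0, x - t), max(0, t - x) per knot."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def fit_hinge_model(x, y, knots):
    # lstsq also copes with the rank deficiency that appears with several knots
    coef, *_ = np.linalg.lstsq(hinge_basis(x, knots), y, rcond=None)
    return coef

def predict_hinge_model(coef, x, knots):
    return hinge_basis(x, knots) @ coef
```

With a single knot placed at the true change point, a piecewise-linear response is reproduced exactly.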
Wu, Chunhung
2016-04-01
Few studies have discussed the applicability of statistical landslide susceptibility (LS) models to extreme rainfall-induced landslide events. This study focuses on the comparison and applicability of LS models based on four methods, namely landslide ratio-based logistic regression (LRBLR), frequency ratio (FR), weight of evidence (WOE), and instability index (II), in an extreme rainfall-induced landslide case. The landslide inventory of the Chishan river watershed, southwestern Taiwan, after 2009 Typhoon Morakot is the main material used in this study. The Chishan river watershed is a tributary watershed of the Kaoping river watershed, which is a landslide- and erosion-prone watershed with an annual average suspended load of 3.6×10⁷ MT/yr (ranked 11th in the world). Typhoon Morakot struck southern Taiwan from Aug. 6 to 10, 2009, and dumped nearly 2,000 mm of rainfall in the Chishan river watershed. The 24-hour, 48-hour, and 72-hour accumulated rainfall in the Chishan river watershed exceeded the 200-year return period accumulated rainfall. A total of 2,389 landslide polygons in the Chishan river watershed were extracted from SPOT 5 images after 2009 Typhoon Morakot. The total landslide area is around 33.5 km², equal to a landslide ratio of 4.1%. The main landslide types based on Varnes' (1978) classification are rotational and translational slides. The two characteristics of an extreme rainfall-induced landslide event are a dense landslide distribution and a large share of downslope landslide area, owing to headward erosion and bank erosion during the flooding processes. The downslope landslide area in the Chishan river watershed after 2009 Typhoon Morakot is 3.2 times higher than the upslope landslide area. The prediction accuracy of the LS models based on the LRBLR, FR, WOE, and II methods has been shown to exceed 70%. The model performance and applicability of four models in a landslide-prone watershed with dense distribution of rainfall
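Of the four methods, the frequency ratio is the simplest to state. For each class of a conditioning factor (for example, a slope class), FR is the share of landslide cells falling in that class divided by the share of all cells in that class. FR > 1 marks classes with above-average landslide occurrence. The plain-Python sketch below uses illustrative names and toy data.

```python
from collections import Counter

def frequency_ratio(classes, landslide):
    """FR per class = (landslide cells in class / all landslide cells)
                      / (cells in class / all cells).
    classes: class label of each cell; landslide: 1/True if the cell failed."""
    total = Counter(classes)
    slide = Counter(c for c, s in zip(classes, landslide) if s)
    n_all = len(classes)
    n_slide = sum(1 for s in landslide if s)
    return {c: (slide[c] / n_slide) / (total[c] / n_all) for c in total}
```

In a toy map with 6 "low" and 4 "steep" cells, where 1 of 4 landslide cells is low and 3 are steep, FR is about 0.42 for "low" and 1.875 for "steep".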
Regression Benchmarking: An Approach to Quality Assurance in Performance
Bulej, Lubomír
2005-01-01
The paper presents a short summary of our work in the area of regression benchmarking and its application to software development. Specifically, we explain the concept of regression benchmarking, the requirements for employing regression testing in a software project, and methods used for analyzing the vast amounts of data resulting from repeated benchmarking. We present the application of regression benchmarking on a real software project and conclude with a glimpse at the challenges for the fu...
Abstract Expression Grammar Symbolic Regression
Korns, Michael F.
This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine that has absolutely no bloat, allows total user control of the search space and output formulas, and is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, age-layered populations, plus discrete and continuous differential evolution is used to produce an improved symbolic regression system. Nine base test cases from the literature are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial-strength symbolic regression systems.
From Rasch scores to regression
DEFF Research Database (Denmark)
Christensen, Karl Bang
2006-01-01
Rasch models provide a framework for measuring and modelling latent variables. Once a latent variable has been measured in a population, a comparison of groups will often be of interest. For this purpose, the use of observed raw scores will often be inadequate because these lack interval scale properties. … This paper compares two approaches to group comparison: linear regression models using estimated person locations as outcome variables, and latent regression models based on the distribution of the score.
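The first approach needs an estimated location for each person. Under the dichotomous Rasch model, P(X = 1) = 1 / (1 + exp(-(theta - b))). With item difficulties b treated as known, the maximum likelihood estimate of theta solves "raw score = sum of model probabilities", so the raw score is sufficient. The plain-Python sketch below finds it by Newton-Raphson. Treating difficulties as known and the function names are assumptions of the sketch.

```python
from math import exp

def rasch_prob(theta, b):
    """P(correct/endorsed response) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - b)))

def person_location(responses, difficulties, n_iter=50, tol=1e-10):
    """ML estimate of a person's location with known item difficulties."""
    r = sum(responses)
    if r == 0 or r == len(responses):
        raise ValueError("extreme score: the ML estimate is infinite")
    theta = 0.0
    for _ in range(n_iter):
        p = [rasch_prob(theta, b) for b in difficulties]
        grad = r - sum(p)                         # d log-likelihood / d theta
        info = sum(pi * (1.0 - pi) for pi in p)   # Fisher information
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta
```

The resulting estimates could then serve as outcomes in an ordinary linear regression. Extreme (all-0 or all-1) patterns have no finite estimate, which is one reason the latent regression alternative is attractive.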
Testing Heteroscedasticity in Robust Regression
Czech Academy of Sciences Publication Activity Database
Kalina, Jan
2011-01-01
Vol. 1, No. 4 (2011), pp. 25-28 ISSN 2045-3345 Grant - others: GA ČR(CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords: robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics, Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf
Forecasting with Dynamic Regression Models
Pankratz, Alan
2012-01-01
One of the most widely used tools in statistical forecasting, the single-equation regression model, is examined here. A companion to the author's earlier work, Forecasting with Univariate Box-Jenkins Models: Concepts and Cases, the present text pulls together recent time series ideas and gives special attention to possible intertemporal patterns, distributed lag responses of output to input series and the autocorrelation patterns of regression disturbances. It also includes six case studies.
Piecewise linear regression splines with hyperbolic covariates
International Nuclear Information System (INIS)
Cologne, John B.; Sposto, Richard
1992-09-01
Consider the problem of fitting a curve to data that exhibit a multiphase linear response with smooth transitions between phases. We propose substituting hyperbolas as covariates in piecewise linear regression splines to obtain curves that are smoothly joined. The method provides an intuitive and easy way to extend the two-phase linear hyperbolic response model of Griffiths and Miller and Watts and Bacon to accommodate more than two linear segments. The resulting regression spline with hyperbolic covariates may be fit by nonlinear regression methods to estimate the degree of curvature between adjoining linear segments. The added complexity of fitting nonlinear, as opposed to linear, regression models is not great. The extra effort is particularly worthwhile when investigators are unwilling to assume that the slope of the response changes abruptly at the join points. We can also estimate the join points (the values of the abscissas where the linear segments would intersect if extrapolated) if their number and approximate locations may be presumed known. An example using data on changing age at menarche in a cohort of Japanese women illustrates the use of the method for exploratory data analysis. (author)
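One way to see the construction is that, for fixed join points t and curvature gamma, a spline with hyperbolic covariates sqrt((x - t)^2 + gamma^2) is linear in its remaining coefficients. Each hyperbola tends to |x - t| (a sharp kink) as gamma goes to 0. The NumPy sketch below exploits this by solving linear least squares for each candidate gamma on a grid and keeping the smallest residual sum of squares. The shared gamma, known join points and grid search are simplifying assumptions. The paper fits such models by general nonlinear regression.

```python
import numpy as np

def hyperbolic_design(x, joins, gamma):
    """Intercept, slope and one hyperbola sqrt((x - t)^2 + gamma^2) per join t."""
    cols = [np.ones_like(x), x]
    cols += [np.sqrt((x - t) ** 2 + gamma ** 2) for t in joins]
    return np.column_stack(cols)

def fit_profile_gamma(x, y, joins, gammas):
    """Profile the curvature: linear LS for each gamma, keep the best RSS.
    Returns (rss, gamma, coefficients)."""
    best = None
    for g in gammas:
        D = hyperbolic_design(x, joins, g)
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        rss = float(np.sum((y - D @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, g, coef)
    return best
```

Around a join t, the fitted slope moves smoothly from b1 - c to b1 + c, where c is the hyperbola's coefficient, instead of changing abruptly.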
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article compares the performance of different ordinary ridge regression estimators that have already been proposed for estimating the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation, we employ data obtained from the Tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).
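The ordinary ridge estimator is b(k) = (X'X + kI)^(-1) X'y, usually on centred and standardised predictors. Because adding k to every eigenvalue turns the condition number into (lmax + k) / (lmin + k), one simple rule picks the smallest k that brings it down to a target. The NumPy sketch below shows the estimator and that rule. The rule is an illustrative assumption. The abstract does not specify which condition-number-based method was used.

```python
import numpy as np

def ridge(X, y, k):
    """Ordinary ridge estimator (X'X + kI)^{-1} X'y (X centred/standardised)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def k_for_condition_number(X, target):
    """Smallest k >= 0 with cond(X'X + kI) <= target, using
    cond = (lmax + k) / (lmin + k) for the eigenvalues of X'X."""
    eig = np.linalg.eigvalsh(X.T @ X)          # ascending order
    lmin, lmax = eig[0], eig[-1]
    return max(0.0, (lmax - target * lmin) / (target - 1.0))
```

Setting k = 0 recovers ordinary least squares, so the rule leaves well-conditioned problems untouched.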
A QSAR Study of Environmental Estrogens Based on a Novel Variable Selection Method
Directory of Open Access Journals (Sweden)
Aiqian Zhang
2012-05-01
A large number of descriptors were employed to characterize the molecular structure of 53 natural, synthetic, and environmental chemicals that are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model with a novel variable selection method is proposed for the effective estrogens. The variable selection method is based on variable interaction (VSMVI) with leave-multiple-out cross validation (LMOCV) to select the best subset. During variable selection, model construction and assessment, the Organization for Economic Co-operation and Development (OECD) principles for the regulatory acceptability of QSARs were fully considered, such as using an unambiguous multiple linear regression (MLR) algorithm to build the model, using several validation methods to assess the performance of the model, defining the applicability domain, and analyzing the outliers with the results of molecular docking. The performance of the QSAR model indicates that VSMVI is an effective, feasible and practical tool for rapidly screening the best subset from a large pool of molecular descriptors.
Producing The New Regressive Left
DEFF Research Database (Denmark)
Crone, Christine
… members, this thesis investigates a growing political trend and ideological discourse in the Arab world that I have called The New Regressive Left. On the premise that a media outlet can function as a forum for ideology production, the thesis argues that an analysis of this material can help to trace … the contexture of The New Regressive Left. If the first part of the thesis lays out the theoretical approach and draws the contextual framework, through an exploration of the surrounding Arab media- and ideoscapes, the second part is an analytical investigation of the discourse that permeates the programmes aired … becomes clear from the analytical chapters is the emergence of the new cross-ideological alliance of The New Regressive Left. This emerging coalition between Shia Muslims, religious minorities, parts of the Arab Left, secular cultural producers, and the remnants of the political, strategic resistance…
Sparse reduced-rank regression with covariance estimation
Chen, Lisha
2014-12-08
Improving the predictive performance of multiple-response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Sparse reduced-rank regression with covariance estimation
Chen, Lisha; Huang, Jianhua Z.
2014-01-01
Improving the predictive performance of multiple-response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
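As a baseline for the method described above, plain least-squares reduced-rank regression with identity error weighting has a closed form. Project the OLS coefficient matrix onto the leading right singular vectors of the fitted values. The NumPy sketch below implements only that classical estimator. It includes neither the sparsity penalty nor the general error-covariance estimation that the abstract proposes.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Rank-constrained least-squares coefficient matrix (identity weighting).
    X: (n, p) predictors, Y: (n, q) responses."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)  # SVD of fitted values
    V_r = Vt[:rank].T                                          # leading right singular vectors
    return B_ols @ V_r @ V_r.T
```

When the true coefficient matrix has rank one and the data are noise-free, the rank-one fit recovers it exactly, and the parameter count drops from p*q to roughly p + q.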
A Matlab program for stepwise regression
Directory of Open Access Journals (Sweden)
Yanhong Qi
2016-03-01
Stepwise linear regression is a multivariable regression method for identifying statistically significant variables in a linear regression equation. In the present study, we present a Matlab program for stepwise regression.
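The core of stepwise selection can be sketched in Python with NumPy. This is not the published Matlab program. At each step the sketch adds the candidate with the largest partial F statistic, stopping when no candidate reaches an "F-to-enter" threshold (4.0 here, a conventional but assumed value). Full stepwise procedures also test already-selected variables for removal, which this forward-only sketch omits.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of OLS on an intercept plus columns `cols`."""
    D = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ coef
    return float(r @ r)

def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection by partial F-to-enter; returns indices in entry order."""
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            new = rss(X, y, selected + [j])
            df_resid = n - len(selected) - 2        # intercept + selected + candidate
            f = (current - new) / (new / df_resid)  # partial F for adding column j
            if best is None or f > best[0]:
                best = (f, j, new)
        if best[0] < f_enter:
            break
        selected.append(best[1])
        current = best[2]
    return selected
```

Because each F test is conditional on earlier choices, the nominal significance levels of stepwise procedures are only approximate.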
Nonparametric Mixture of Regression Models.
Huang, Mian; Li, Runze; Wang, Shaoli
2013-07-01
Motivated by an analysis of US house price index data, we propose nonparametric finite mixture of regression models. We study the identifiability issue of the proposed models, and develop an estimation procedure by employing kernel regression. We further systematically study the sampling properties of the proposed estimators, and establish their asymptotic normality. A modified EM algorithm is proposed to carry out the estimation procedure. We show that our algorithm preserves the ascent property of the EM algorithm in an asymptotic sense. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed estimation procedure. The proposed methodology is illustrated with an empirical analysis of the US house price index data.
Multitask Quantile Regression under the Transnormal Model.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2016-01-01
We consider estimating multi-task quantile regression under the transnormal model, with focus on the high-dimensional setting. We derive a surprisingly simple closed-form solution through rank-based covariance regularization. In particular, we propose the rank-based ℓ1 penalization with positive definite constraints for estimating sparse covariance matrices, and the rank-based banded Cholesky decomposition regularization for estimating banded precision matrices. By taking advantage of the alternating direction method of multipliers, nearest correlation matrix projection is introduced that inherits the sampling properties of the unprojected one. Our work combines the strengths of quantile regression and rank-based covariance regularization to simultaneously deal with nonlinearity and nonnormality in high-dimensional regression. Furthermore, the proposed method strikes a good balance between robustness and efficiency, achieves the "oracle"-like convergence rate, and provides a provable prediction interval under the high-dimensional setting. The finite-sample performance of the proposed method is also examined. The performance of our proposed rank-based method is demonstrated in a real application analyzing protein mass spectroscopy data.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
Cactus: An Introduction to Regression
Hyde, Hartley
2008-01-01
When the author first used "VisiCalc," the author thought it a very useful tool when he had the formulas. But how could he design a spreadsheet if there was no known formula for the quantities he was trying to predict? A few months later, the author relates he learned to use multiple linear regression software and suddenly it all clicked into…
Regression Models for Repairable Systems
Czech Academy of Sciences Publication Activity Database
Novák, Petr
2015-01-01
Roč. 17, č. 4 (2015), s. 963-972 ISSN 1387-5841 Institutional support: RVO:67985556 Keywords : Reliability analysis * Repair models * Regression Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.782, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/novak-0450902.pdf
Kernel regression with functional response
Ferraty, Frédéric; Laksaci, Ali; Tadj, Amel; Vieu, Philippe
2011-01-01
We consider kernel regression estimate when both the response variable and the explanatory one are functional. The rates of uniform almost complete convergence are stated as function of the small ball probability of the predictor and as function of the entropy of the set on which uniformity is obtained.
Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.
2013-01-01
The northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher-frequency variability in this region. To resolve this variability using platforms of opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) from observations of nitrate (NO3−), temperature, salinity and pressure from the surface to 500 m, with R² values > 0.97 and RMSE values of 11 µmol kg⁻¹ for DIC and 9 µmol kg⁻¹ for TA. We applied these relationships to high-resolution NO3− data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturation states (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high-latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured an uplift of the aragonite saturation horizon (the depth where Ωarag = 1), which shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, but it demonstrates that an NO3−-based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.
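The MLR workflow and the R² and RMSE statistics quoted above can be illustrated with an ordinary least-squares fit. The sketch below uses synthetic data with four predictors standing in for nitrate, temperature, salinity and pressure; the coefficients, ranges and noise level are arbitrary assumptions, not values from the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2_rmse(y, y_hat):
    """Coefficient of determination and root-mean-square error of a fit."""
    resid = y - y_hat
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot, float(np.sqrt(ss_res / len(y)))

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(0, 35, n),     # stand-in for nitrate
    rng.uniform(2, 14, n),     # stand-in for temperature
    rng.uniform(30, 34, n),    # stand-in for salinity
    rng.uniform(0, 500, n),    # stand-in for pressure
])
y = 2000 + 10 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 10, n)
coef = fit_mlr(X, y)
r2, rmse = r2_rmse(y, coef[0] + X @ coef[1:])
print(round(r2, 3), round(rmse, 1))
```

With Gaussian noise of standard deviation 10, the in-sample RMSE lands close to 10; in practice the statistics should also be checked on data withheld from the fit.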
Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression
Beh, W. L.; Yong, M. K. Au
2017-04-01
This study examines the importance of the macroeconomic variables that affect the total volumes of Malaysia's imports and exports, using multiple linear regression (MLR) analysis. The study uses quarterly data on the total volumes of Malaysia's imports and exports covering the period 2000-2015. The macroeconomic variables are limited to eleven: the exchange rate of the US dollar with the Malaysian ringgit (USD-MYR), the exchange rate of the Chinese yuan with the Malaysian ringgit (RMB-MYR), the exchange rate of the euro with the Malaysian ringgit (EUR-MYR), the exchange rate of the Singapore dollar with the Malaysian ringgit (SGD-MYR), crude oil prices, gold prices, the producer price index (PPI), the interest rate, the consumer price index (CPI), the industrial production index (IPI) and gross domestic product (GDP). This study applied the Johansen co-integration test to investigate the relationships involving the total volumes of Malaysia's imports and exports. The results show that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volume of Malaysia's imports, whereas crude oil prices, USD-MYR and GDP play important roles in the total volume of Malaysia's exports.
Directory of Open Access Journals (Sweden)
W.S. Tassinari
2013-04-01
Brazilian beekeeping developed from the Africanization of honeybees, and its high performance has made Brazil one of the world's largest honey producers. The Southeastern region holds an important position in this market (45%), but the state of Rio de Janeiro is the smallest producer, despite having large areas of wild vegetation suitable for honey production. To analyze honey productivity in the state of Rio de Janeiro, this research used classic and spatial regression approaches. The data comprised responses on beekeeping from 1418 beekeepers distributed across 72 counties of the state. The best statistical fit was a semiparametric spatial model. The proposed model could be used to estimate the annual honey yield per hive in different regions and to detect the production factors most closely related to beekeeping. Honey productivity was associated with the number of hives, wild swarm collection and losses in the apiaries. This paper highlights that the beekeeping sector needs support to elucidate the problems affecting beekeepers, and that the inclusion of spatial effects in regression models is a useful tool for geographical data.
Functional data analysis of generalized regression quantiles
Guo, Mengmeng; Zhou, Lan; Huang, Jianhua Z.; Härdle, Wolfgang Karl
2013-01-01
Generalized regression quantiles, including the conditional quantiles and expectiles as special cases, are useful alternatives to the conditional means for characterizing a conditional distribution, especially when the interest lies in the tails. We develop a functional data analysis approach to jointly estimate a family of generalized regression quantiles. Our approach assumes that the generalized regression quantiles share some common features that can be summarized by a small number of principal component functions. The principal component functions are modeled as splines and are estimated by minimizing a penalized asymmetric loss measure. An iterative least asymmetrically weighted squares algorithm is developed for computation. While separate estimation of individual generalized regression quantiles usually suffers from large variability due to lack of sufficient data, by borrowing strength across data sets, our joint estimation approach significantly improves the estimation efficiency, which is demonstrated in a simulation study. The proposed method is applied to data from 159 weather stations in China to obtain the generalized quantile curves of the volatility of the temperature at these stations. © 2013 Springer Science+Business Media New York.
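The least asymmetrically weighted squares (LAWS) idea behind expectiles can be shown for the simplest case, a single sample with no covariates or splines. In this sketch (a deliberate simplification, not the authors' functional algorithm), observations above the current estimate get weight τ and the rest get 1 − τ, and the weighted mean is recomputed until the weights stop changing; τ = 0.5 recovers the ordinary mean. The function name `expectile` is hypothetical.

```python
import numpy as np

def expectile(y, tau, tol=1e-12, max_iter=100):
    """Sample tau-expectile via iterative asymmetrically weighted least squares."""
    y = np.asarray(y, dtype=float)
    m = y.mean()                                   # the tau = 0.5 solution as a start
    for _ in range(max_iter):
        w = np.where(y > m, tau, 1.0 - tau)        # asymmetric weights
        m_new = float(np.sum(w * y) / np.sum(w))   # weighted least-squares update
        if abs(m_new - m) < tol:                   # weights have stabilized
            return m_new
        m = m_new
    return m

y = [1.0, 2.0, 3.0, 4.0, 10.0]
print(expectile(y, 0.1), expectile(y, 0.5), expectile(y, 0.9))
```

At convergence the estimate satisfies τ·Σ(y_i − m)₊ = (1 − τ)·Σ(m − y_i)₊, the first-order condition of the asymmetric squared loss; the functional version replaces the scalar m with spline-based curves.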
Complex regression Doppler optical coherence tomography
Elahi, Sahar; Gu, Shi; Thrane, Lars; Rollins, Andrew M.; Jenkins, Michael W.
2018-04-01
We introduce a new method to measure Doppler shifts more accurately and extend the dynamic range of Doppler optical coherence tomography (OCT). The two-point estimate of the conventional Doppler method is replaced with a regression that is applied to high-density B-scans in polar coordinates. We built a high-speed OCT system using a 1.68-MHz Fourier domain mode locked laser to acquire high-density B-scans (16,000 A-lines) at high enough frame rates (~100 fps) to accurately capture the dynamics of the beating embryonic heart. Flow phantom experiments confirm that the complex regression lowers the minimum detectable velocity from 12.25 mm/s to 374 μm/s, whereas the maximum velocity of 400 mm/s is measured without phase wrapping. Complex regression Doppler OCT also demonstrates higher accuracy and precision compared with the conventional method, particularly when the signal-to-noise ratio is low. The extended dynamic range allows monitoring of blood flow over several stages of development in embryos without adjusting the imaging parameters. In addition, applying complex averaging recovers hidden features in structural images.
Face Alignment via Regressing Local Binary Features.
Ren, Shaoqing; Cao, Xudong; Wei, Yichen; Sun, Jian
2016-03-01
This paper presents a highly efficient and accurate regression approach for face alignment. Our approach has two novel components: 1) a set of local binary features and 2) a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. This approach achieves state-of-the-art results on the most challenging benchmarks to date. Furthermore, because extracting and regressing local binary features are computationally very cheap, our system is much faster than previous methods. It achieves over 3000 frames per second (FPS) on a desktop or 300 FPS on a mobile phone when locating a few dozen landmarks. We also study a key issue that is important but has received little attention in previous research: the face detector used to initialize alignment. We investigate several face detectors and perform a quantitative evaluation of how they affect alignment accuracy. We find that an alignment-friendly detector can further boost the accuracy of our alignment method considerably, reducing the relative error by up to 16%. To facilitate practical usage of face detection/alignment methods, we also propose a convenient metric to measure how good a detector is for alignment initialization.
General regression and representation model for classification.
Directory of Open Access Journals (Sweden)
Jianjun Qian
Recently, regularized coding-based classification methods (e.g., SRC and CRC) have shown great potential for pattern classification. However, most existing coding methods assume that the representation residuals are uncorrelated. In real-world applications, this assumption does not hold. In this paper, we take account of the correlations of the representation residuals and develop a general regression and representation model (GRR) for classification. GRR not only has the advantages of CRC, but also makes full use of the prior information (e.g., the correlations between representation residuals and representation coefficients) and the specific information (the weight matrix of image pixels) to enhance classification performance. GRR uses generalized Tikhonov regularization and K Nearest Neighbors to learn the prior information from the training data. Meanwhile, the specific information is obtained by using an iterative algorithm to update the feature (or image pixel) weights of the test sample. With the proposed model as a platform, we design two classifiers: the basic general regression and representation classifier (B-GRR) and the robust general regression and representation classifier (R-GRR). The experimental results demonstrate the performance advantages of the proposed methods over state-of-the-art algorithms.
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be more cost-effective and less time-consuming than traditional soil mapping approaches. However, the potential of Remote Sensing data to improve knowledge of local-scale soil information in West Africa has not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory-analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km² agricultural watershed in south-western Burkina Faso. Four statistical prediction models, multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB), were tested and compared. Internal validation was conducted by cross-validation, while the predictions were validated against an independent set of soil samples from the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than MLR, with RFR providing the highest accuracy in most cases. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to limit its accuracy in predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g., May, June) were found to be the most important spectral predictors, while elevation, temperature and precipitation emerged as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of redness
Directory of Open Access Journals (Sweden)
H. Guan
2010-07-01
concentration (a factor of 10 and above generally indicates either a chloride disequilibrium, or cross-catchment water transfer, or both, for which the CMB method is not applicable. Based on regression between chloride O/I values and annual precipitation for type VI catchments, a catchment with annual precipitation of 900 mm in MLR has most likely reached new chloride equilibrium, and the CMB method can be applied if no cross-catchment water transfer occurs. CMB is applied to one catchment at chloride equilibrium, suggesting a net groundwater recharge of 27 mm/yr, about 3% of annual precipitation.
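The chloride mass balance (CMB) method referred to above equates chloride input from precipitation with chloride output in recharge, so at equilibrium the recharge is R = P · Cl_P / Cl_gw. This sketch assumes that all chloride arrives with precipitation (wet plus dry deposition folded into Cl_P), that runoff export is negligible and that there is no cross-catchment transfer; the concentrations in the example are hypothetical, chosen only so that their ratio reproduces the reported recharge of about 3% of 900 mm. The function name `cmb_recharge` is also hypothetical.

```python
def cmb_recharge(precip_mm, cl_precip, cl_groundwater):
    """Steady-state chloride mass balance recharge (same units as precip_mm).

    Assumes chloride equilibrium, all chloride input via precipitation
    (wet + dry deposition included in cl_precip), no runoff export and no
    cross-catchment water transfer. Both concentrations must share units.
    """
    if cl_groundwater <= 0:
        raise ValueError("groundwater chloride concentration must be positive")
    return precip_mm * cl_precip / cl_groundwater

# Hypothetical concentrations (mg/L), not measurements from the study
recharge = cmb_recharge(900.0, 3.0, 100.0)
print(recharge, recharge / 900.0)
```

The recharge fraction depends only on the concentration ratio Cl_P / Cl_gw, which is why chloride disequilibrium or imported water, both of which distort Cl_gw, make the method inapplicable.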
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Unwanted pregnancy, not intended by at least one of the parents, has undesirable consequences for the family and society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers attending health centers in Khorramabad, Iran, in 2012 were selected by stratified and cluster sampling, and the relevant variables were measured. Logistic regression, discriminant analysis and probit regression models, fitted in SPSS version 21, were used to predict unwanted pregnancy. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and the number of living male children were related to unwanted pregnancy. The areas under the ROC curve were 0.735, 0.733 and 0.680 for logistic regression, probit regression and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, the logistic regression model is recommended if the researcher is interested in the interpretability of the results.
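The area under the ROC curve used above to compare the models equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one. This sketch computes it directly from that pairwise definition (the Mann-Whitney form), counting ties as one half; it is a generic illustration with a hypothetical `roc_auc` helper and does not reproduce the study's SPSS output.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]      # every positive-negative score difference
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form costs O(n_pos · n_neg) memory, which is fine for a sample of a few hundred subjects; a rank-based formula scales better for large data sets.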
Multivariate and semiparametric kernel regression
Härdle, Wolfgang; Müller, Marlene
1997-01-01
The paper gives an introduction to theory and application of multivariate and semiparametric kernel smoothing. Multivariate nonparametric density estimation is an often used pilot tool for examining the structure of data. Regression smoothing helps in investigating the association between covariates and responses. We concentrate on kernel smoothing using local polynomial fitting which includes the Nadaraya-Watson estimator. Some theory on the asymptotic behavior and bandwidth selection is pro...
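The Nadaraya-Watson estimator mentioned above is local polynomial fitting of degree zero: a kernel-weighted average of the responses, m̂(x) = Σ K_h(x − X_i) Y_i / Σ K_h(x − X_i). This is a minimal one-dimensional sketch assuming a Gaussian kernel and a fixed, hand-picked bandwidth; it performs no bandwidth selection, and `nadaraya_watson` is a hypothetical name.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Kernel-weighted average of responses (local constant fit), Gaussian kernel."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * u ** 2)          # kernel normalizing constant cancels in the ratio
    return (w @ y_train) / w.sum(axis=1)

x = np.linspace(0.0, 2.0 * np.pi, 400)
est = nadaraya_watson(x, np.sin(x), [np.pi / 2, np.pi], bandwidth=0.1)
print(np.round(est, 3))
```

Raising the local polynomial degree to one (local linear fitting) removes the boundary bias that the local constant fit shows near the ends of the design interval.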
Regression algorithm for emotion detection
Berthelon, Franck; Sander, Peter
2013-01-01
We present here two components of a computational system for emotion detection. PEMs (Personalized Emotion Maps) store links between bodily expressions and emotion values, and are individually calibrated to capture each person's emotion profile. They are an implementation based on aspects of Scherer's theoretical complex system model of emotion [scherer00, scherer09]. We also present a regression algorithm that determines a person's emotional feeling from sensor m...
Directional quantile regression in R
Czech Academy of Sciences Publication Activity Database
Boček, Pavel; Šiman, Miroslav
2017-01-01
Roč. 53, č. 3 (2017), s. 480-492 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : multivariate quantile * regression quantile * halfspace depth * depth contour Subject RIV: BD - Theory of Information OBOR OECD: Applied mathematics Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0476587.pdf
Demonstration of a Fiber Optic Regression Probe
Korman, Valentin; Polzin, Kurt A.
2010-01-01
The capability to provide localized, real-time monitoring of material regression rates in various applications has the potential to provide a new stream of data for development testing of various components and systems, as well as serving as a monitoring tool in flight applications. These applications include, but are not limited to, the regression of a combusting solid fuel surface, the ablation of the throat in a chemical rocket or the heat shield of an aeroshell, and the monitoring of erosion in long-life plasma thrusters. The rate of regression in the first application is very fast, while the second and third are increasingly slower. A recent fundamental sensor development effort has led to a novel regression, erosion, and ablation sensor technology (REAST). The REAST sensor allows for measurement of real-time surface erosion rates at a discrete surface location. The sensor is optical, using two different, co-located fiber-optics to perform the regression measurement. The disparate optical transmission properties of the two fiber-optics make it possible to measure the regression rate by monitoring the relative light attenuation through the fibers. As the fibers regress along with the parent material in which they are embedded, the relative light intensities through the two fibers change, providing a measure of the regression rate. The optical nature of the system makes it relatively easy to use in a variety of harsh, high-temperature environments, and it is also unaffected by the presence of electric and magnetic fields. In addition, the sensor could be used to perform optical spectroscopy on the light emitted by a process and collected by the fibers, giving localized measurements of various properties. The capability to perform an in-situ measurement of material regression rates is useful in addressing a variety of physical issues in various applications. An in-situ measurement allows for real-time data regarding the erosion rates, providing a quick method for
Gaussian Process Regression Model in Spatial Logistic Regression
Sofro, A.; Oktaviarina, A.
2018-01-01
Spatial analysis has developed very quickly in the last decade. One popular approach is based on the neighbourhood structure of the regions. Unfortunately, it has limitations, such as difficulty in prediction. We therefore propose Gaussian process regression (GPR) to address this issue. In this paper, we focus on spatial modeling with GPR for binomial data with a logit link function. The performance of the model is investigated. We discuss inference, namely how to estimate the parameters and hyper-parameters, as well as how to make predictions. Finally, simulation studies are presented in the last section.
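The core GPR computation can be sketched for the simpler case of a Gaussian likelihood, where the posterior is available in closed form through a Cholesky factorization. This is an assumption-level illustration, not the model above: binomial data with a logit link, as in the paper, require an approximation such as Laplace or expectation propagation, which is not shown, and the squared-exponential kernel, hyper-parameter values and function names are hypothetical choices.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP with Gaussian noise (noise is a variance)."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(x_test, x_train, length_scale, variance)
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = variance - np.sum(v ** 2, axis=0)                # prior variance minus explained part
    return mean, var

x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.3, 2.5, 3.7, 20.0])
mean, var = gp_predict(x_train, y_train, x_test, noise=1e-4)
print(np.round(mean, 3), np.round(var, 4))
```

Far from the data (x = 20) the prediction reverts to the prior mean of zero with prior variance one, which is the behaviour that makes GPR convenient for prediction at unobserved locations.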
SDE based regression for random PDEs
Bayer, Christian
2016-01-01
A simulation-based method for the numerical solution of PDEs with random coefficients is presented. By the Feynman-Kac formula, the solution can be represented as the conditional expectation of a functional of a corresponding stochastic differential equation driven by independent noise. A time discretization of the SDE for a set of points in the domain and a subsequent Monte Carlo regression lead to an approximation of the global solution of the random PDE. We provide an initial error and complexity analysis of the proposed method along with numerical examples illustrating its behaviour.
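The combination of Feynman-Kac simulation and regression can be illustrated on a deterministic toy problem: for ∂u/∂t + ½ ∂²u/∂x² = 0 with terminal condition u(T, x) = g(x), the solution is u(0, x) = E[g(x + W_T)]. The sketch simulates one Euler-Maruyama path per random start point and then regresses the noisy terminal values on a polynomial basis, giving a global approximation of u(0, ·). It omits the random coefficients and the error analysis of the paper; the domain, basis, sample sizes and function name are assumptions. For g(x) = x² and T = 1 the exact answer is u(0, x) = x² + 1.

```python
import numpy as np

def feynman_kac_regression(g, T=1.0, n_points=100_000, n_steps=50, degree=2, seed=0):
    """Approximate u(0, x) = E[g(x + W_T)] on [-1, 1] by simulation plus regression."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(-1.0, 1.0, n_points)      # random start points in the domain
    x = x0.copy()
    dt = T / n_steps
    for _ in range(n_steps):                    # Euler-Maruyama steps for dX = dW
        x += np.sqrt(dt) * rng.standard_normal(n_points)
    samples = g(x)                              # one noisy sample of g(X_T) per start point
    basis = np.vander(x0, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    coef, *_ = np.linalg.lstsq(basis, samples, rcond=None)
    return coef                                 # u(0, x) ~ sum_k coef[k] * x**k

coef = feynman_kac_regression(lambda x: x ** 2)
print(np.round(coef, 2))   # exact solution x**2 + 1 has coefficients [1, 0, 1]
```

Using a single path per point and letting the global regression average out the noise is what distinguishes this approach from running a separate Monte Carlo estimate at each grid point.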
Bayesian regression of piecewise homogeneous Poisson processes
Directory of Open Access Journals (Sweden)
Diego Sevilla
2015-12-01
In this paper, a Bayesian method for piecewise regression is adapted to handle counting process data distributed as Poisson. A numerical code in Mathematica is developed and tested by analyzing simulated data. The resulting method is valuable for detecting breaking points in the count rate of time series for Poisson processes. Received: 2 November 2015, Accepted: 27 November 2015; Edited by: R. Dickman; Reviewed by: M. Hutter, Australian National University, Canberra, Australia; DOI: http://dx.doi.org/10.4279/PIP.070018 Cite as: D J R Sevilla, Papers in Physics 7, 070018 (2015)
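The idea of locating a breaking point in a Poisson count rate can be sketched for the simplest case of a single change point. This is not the paper's Mathematica code: it assumes a conjugate Gamma(a, b) prior on each segment's rate (shape a, rate b), a uniform prior on the change location, and hypothetical function names. Integrating out each rate gives a closed-form marginal likelihood per segment, so the posterior over the change location is a simple normalized product.

```python
import numpy as np
from math import lgamma, log

def log_marginal(total, length, a=1.0, b=1.0):
    """log of the integral of prod Poisson(y_i | lam) * Gamma(lam | a, b) over lam
    for one segment, omitting -sum(log y_i!), which is the same for every split."""
    return a * log(b) - lgamma(a) + lgamma(a + total) - (a + total) * log(b + length)

def changepoint_posterior(counts, a=1.0, b=1.0):
    """Posterior over a single change point under a uniform prior on its location.

    Entry k-1 is the probability that the rate changes after observation k."""
    y = np.asarray(counts)
    n = len(y)
    cum = np.concatenate([[0], np.cumsum(y)])
    logp = np.array([
        log_marginal(cum[k], k, a, b) + log_marginal(cum[n] - cum[k], n - k, a, b)
        for k in range(1, n)
    ])
    post = np.exp(logp - logp.max())            # subtract the max for numerical stability
    return post / post.sum()

rng = np.random.default_rng(1)
counts = np.concatenate([rng.poisson(2.0, 50), rng.poisson(8.0, 50)])
post = changepoint_posterior(counts)
print(int(np.argmax(post)) + 1)
```

Extending this to several breaking points, as in piecewise regression, multiplies the number of candidate configurations, which is where recursive or sampling-based schemes become necessary.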
Fixed kernel regression for voltammogram feature extraction
International Nuclear Information System (INIS)
Acevedo Rodriguez, F J; López-Sastre, R J; Gil-Jiménez, P; Maldonado Bascón, S; Ruiz-Reyes, N
2009-01-01
Cyclic voltammetry is an electroanalytical technique for obtaining information about substances under analysis without the need for complex flow systems. However, classifying the information in voltammograms obtained using this technique is difficult. In this paper, we propose the use of fixed kernel regression as a method for extracting features from these voltammograms, reducing the information to a few coefficients. The proposed approach has been applied to a wine classification problem with accuracy rates of over 98%. Although the method is described here for extracting voltammogram information, it can be used for other types of signals.
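The idea of reducing a signal to a few regression coefficients can be sketched as a least-squares fit on a small set of kernels at fixed positions. The paper's exact kernels are not given in the abstract, so this sketch assumes Gaussian kernels with hypothetical, evenly spaced centres and a common width; the fitted coefficients then serve as a compact feature vector for a downstream classifier.

```python
import numpy as np

def kernel_design(t, centers, width):
    """Matrix of Gaussian bumps: column j is exp(-(t - c_j)^2 / (2 * width^2))."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def kernel_features(signal, t, centers, width):
    """Least-squares coefficients of a sampled signal on fixed Gaussian kernels."""
    Phi = kernel_design(t, centers, width)
    coef, *_ = np.linalg.lstsq(Phi, signal, rcond=None)
    return coef

t = np.linspace(0.0, 1.0, 500)
centers = np.linspace(0.0, 1.0, 8)      # fixed kernel centres (hypothetical choice)
true_coef = np.array([0.2, 1.0, -0.5, 0.3, 0.8, -0.2, 0.4, 0.1])
signal = kernel_design(t, centers, 0.1) @ true_coef
features = kernel_features(signal, t, centers, 0.1)
print(np.round(features, 3))
```

Because the kernels are fixed in advance, every signal is projected onto the same basis, so the resulting feature vectors are directly comparable across samples.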
Spectral density regression for bivariate extremes
Castro Camilo, Daniela
2016-05-11
We introduce a density regression model for the spectral density of a bivariate extreme value distribution, that allows us to assess how extremal dependence can change over a covariate. Inference is performed through a double kernel estimator, which can be seen as an extension of the Nadaraya–Watson estimator where the usual scalar responses are replaced by mean constrained densities on the unit interval. Numerical experiments with the methods illustrate their resilience in a variety of contexts of practical interest. An extreme temperature dataset is used to illustrate our methods. © 2016 Springer-Verlag Berlin Heidelberg
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
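Regression toward the mean, mentioned in the tutorial above, can be demonstrated with a short simulation: when two noisy measurements share a common true value, subjects selected for extreme first scores tend to score closer to the mean the second time, with no intervention at all. This sketch assumes equal true-score and measurement-error variances, so the test-retest regression slope is 0.5 and the selected group's second mean is expected to lie about halfway back toward the population mean of 50; the function name and all parameters are hypothetical.

```python
import numpy as np

def regression_to_mean_demo(n=100_000, seed=42):
    """Mean first and second scores of the top 10% on the first measurement."""
    rng = np.random.default_rng(seed)
    ability = rng.normal(50.0, 10.0, n)              # stable true value per subject
    first = ability + rng.normal(0.0, 10.0, n)       # measurement 1 = truth + error
    second = ability + rng.normal(0.0, 10.0, n)      # measurement 2 = truth + new error
    top = first >= np.quantile(first, 0.9)           # select on the first score only
    return first[top].mean(), second[top].mean()

m1, m2 = regression_to_mean_demo()
print(round(m1, 1), round(m2, 1))
```

An uncontrolled before-and-after comparison in such a selected group would wrongly attribute this drop to whatever happened between the two measurements.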