WorldWideScience

Sample records for regression mlr partial

  1. Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

    Science.gov (United States)

    Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

    2008-02-18

    The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as the non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.

  2. Role of regression model selection and station distribution on the estimation of oceanic anthropogenic carbon change by eMLR

    Directory of Open Access Journals (Sweden)

    Y. Plancherel

    2013-07-01

    Quantifying oceanic anthropogenic carbon uptake by monitoring interior dissolved inorganic carbon (DIC) concentrations is complicated by the influence of natural variability. The "eMLR method" aims to address this issue by using empirical regression fits of the data instead of the data themselves, inferring the change in anthropogenic carbon in time by difference between predictions generated by the regressions at each time. The advantages of the method are that it provides in principle a means to filter out natural variability, which theoretically becomes the regression residuals, and a way to deal with sparsely and unevenly distributed data. The degree to which these advantages are realized in practice is unclear, however. The ability of the eMLR method to recover the anthropogenic carbon signal is tested here using a global circulation and biogeochemistry model in which the true signal is known. Results show that regression model selection is particularly important when the observational network changes in time. When the observational network is fixed, the likelihood that co-located systematic misfits between the empirical model and the underlying, yet unknown, true model cancel is greater, improving eMLR results. Changing the observational network modifies how the spatio-temporal variance pattern is captured by the respective datasets, resulting in empirical models that are dynamically or regionally inconsistent, leading to systematic errors. In consequence, the use of regression formulae that change in time to represent systematically best-fit models at all times does not guarantee the best estimates of anthropogenic carbon change if the spatial distributions of the stations emphasize hydrographic features differently in time. 
Other factors, such as a balanced and representative station coverage, vertical continuity of the regression formulae consistent with the hydrographic context and resiliency of the spatial distribution of the residual
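
The differencing idea behind eMLR described in this record can be sketched in a few lines. This is only an illustration under stated assumptions: the two predictor columns, the synthetic noise-free data, and the function names `fit_mlr` and `emlr_change` are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_mlr(P, dic):
    """Ordinary least-squares fit of DIC on predictors P (with intercept)."""
    A = np.column_stack([np.ones(len(dic)), P])
    coef, *_ = np.linalg.lstsq(A, dic, rcond=None)
    return coef

def emlr_change(P1, dic1, P2, dic2, P_eval):
    """Difference of the two regression predictions, evaluated at P_eval."""
    diff = fit_mlr(P2, dic2) - fit_mlr(P1, dic1)
    return diff[0] + P_eval @ diff[1:]

# Synthetic, noise-free example (all values are assumptions for illustration).
rng = np.random.default_rng(0)
P1 = rng.standard_normal((300, 2))   # predictors at stations of survey 1
P2 = rng.standard_normal((250, 2))   # a different station set for survey 2
natural = np.array([5.0, -3.0])      # same natural relationship at both times
dic1 = 2000.0 + P1 @ natural
dic2 = 2000.0 + 10.0 + P2 @ natural  # uniform increase of 10 units

delta = emlr_change(P1, dic1, P2, dic2, P2)
```

In this idealized case the natural relationship is identical at both times and exactly linear, so the difference of predictions recovers the imposed increase. The record's point is that real data do not behave this cleanly when the station network changes.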

  3. Combined genetic algorithm and multiple linear regression (GA-MLR) optimizer: Application to multi-exponential fluorescence decay surface.

    Science.gov (United States)

    Fisz, Jacek J

    2006-12-07

    An optimization approach based on the genetic algorithm (GA) combined with the multiple linear regression (MLR) method is discussed. The GA-MLR optimizer is designed for the nonlinear least-squares problems in which the model functions are linear combinations of nonlinear functions. GA optimizes the nonlinear parameters, and the linear parameters are calculated from MLR. GA-MLR is an intuitive optimization approach and it exploits all advantages of the genetic algorithm technique. This optimization method results from an appropriate combination of two well-known optimization methods. The MLR method is embedded in the GA optimizer and linear and nonlinear model parameters are optimized in parallel. The MLR method is the only strictly mathematical "tool" involved in GA-MLR. The GA-MLR approach simplifies and accelerates considerably the optimization process because the linear parameters are not the fitted ones. Its properties are exemplified by the analysis of the kinetic biexponential fluorescence decay surface corresponding to a two-excited-state interconversion process. A short discussion of the variable projection (VP) algorithm, designed for the same class of the optimization problems, is presented. VP is a very advanced mathematical formalism that involves the methods of nonlinear functionals, algebra of linear projectors, and the formalism of Fréchet derivatives and pseudo-inverses. Additional explanatory comments are added on the application of the recently introduced GA-NR optimizer to simultaneous recovery of linear and weakly nonlinear parameters occurring in the same optimization problem together with nonlinear parameters. The GA-NR optimizer combines the GA method with the NR method, in which the minimum-value condition for the quadratic approximation to χ², obtained from the Taylor series expansion of χ², is recovered by means of the Newton-Raphson algorithm. The application of the GA-NR optimizer to model functions which are multi
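
The split between nonlinear and linear parameters described in this record can be illustrated with a minimal sketch. It assumes a biexponential decay y(t) = a1·exp(-k1·t) + a2·exp(-k2·t) and uses a plain random search as a stand-in for the genetic algorithm. It is not the author's GA, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def linear_amplitudes(t, y, rates):
    """For fixed decay rates, obtain the amplitudes by linear least squares (the MLR step)."""
    basis = np.exp(-np.outer(t, rates))            # one column per exponential term
    amps, *_ = np.linalg.lstsq(basis, y, rcond=None)
    resid = y - basis @ amps
    return amps, float(resid @ resid)

def fit_biexponential(t, y, n_trials=5000, seed=0):
    """Random search over the two nonlinear rates (a simple stand-in for the GA)."""
    rng = np.random.default_rng(seed)
    best_sse, best_rates, best_amps = np.inf, None, None
    for _ in range(n_trials):
        rates = np.sort(rng.uniform(0.01, 5.0, size=2))
        amps, sse = linear_amplitudes(t, y, rates)  # linear parameters are never searched
        if sse < best_sse:
            best_sse, best_rates, best_amps = sse, rates, amps
    return best_sse, best_rates, best_amps

# Synthetic noise-free decay curve (assumed values).
t = np.linspace(0.0, 10.0, 200)
y = 3.0 * np.exp(-0.5 * t) + 1.0 * np.exp(-2.0 * t)
sse, rates, amps = fit_biexponential(t, y)
```

Only the two decay rates are searched. The amplitudes come out of the least-squares solve for each candidate, which is the sense in which the linear parameters "are not the fitted ones".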

  4. Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

    Science.gov (United States)

    Golmohammadi, Hassan

    2009-11-30

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.

  5. Evaluation for Long Term PM10 Concentration Forecasting using Multi Linear Regression (MLR) and Principal Component Regression (PCR) Models

    Directory of Open Access Journals (Sweden)

    Samsuri Abdullah

    2016-07-01

    Air pollution in Peninsular Malaysia is dominated by particulate matter, which has the highest Air Pollution Index (API) value compared to the other pollutants in most parts of the country. Development of particulate matter (PM10) forecasting models is crucial because it allows the authorities and citizens of a community to take necessary actions to limit their exposure to harmful levels of particulate pollution and to implement protection measures that significantly improve air quality at designated locations. This study aims to improve the ability of MLR by using PC inputs for PM10 concentration forecasting. Daily observations of PM10 in Kuala Terengganu, Malaysia, from January 2003 to December 2011 were used to forecast PM10 concentration levels. MLR and PCR (using PC inputs) models were developed and their performance was evaluated using RMSE, NAE and IA. Results revealed that PCR performed better than MLR due to the implementation of PCA, which reduces intricacy and eliminates data multi-collinearity.
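
As a rough illustration of what "MLR using PC inputs" can mean, the sketch below runs principal component regression on synthetic, deliberately collinear predictors. It assumes standardized predictors and PCA via SVD. The data, the function names, and the choice of two components are hypothetical and do not reproduce the study's setup.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on standardized X, then OLS on the scores."""
    mean, scale = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / scale
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T                 # leading principal directions
    A = np.column_stack([np.ones(len(y)), Z @ loadings])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mean, scale, loadings, coef

def pcr_predict(model, X):
    mean, scale, loadings, coef = model
    return coef[0] + ((X - mean) / scale) @ loadings @ coef[1:]

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Synthetic predictors: the second column is nearly a copy of the first (collinear).
rng = np.random.default_rng(0)
x1 = rng.standard_normal(300)
X = np.column_stack([x1, x1 + 0.1 * rng.standard_normal(300), rng.standard_normal(300)])
y = 2.0 * x1 + X[:, 2] + 0.5 * rng.standard_normal(300)

pcr2 = pcr_fit(X, y, n_components=2)   # drops the smallest-variance direction
pcr3 = pcr_fit(X, y, n_components=3)   # all components: same fit as plain MLR
rmse_pcr2 = rmse(y, pcr_predict(pcr2, X))
rmse_pcr3 = rmse(y, pcr_predict(pcr3, X))
```

With all components retained, PCR gives the same fitted values as ordinary MLR. Any difference between the two methods comes from discarding low-variance components, which is also where the collinearity sits.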

  6. MLR reactor

    International Nuclear Information System (INIS)

    Ryazantsev, E.P.; Egorenkov, P.M.; Nasonov, V.A.; Smimov, A.M.; Taliev, A.V.; Gromov, B.F.; Kousin, V.V.; Lantsov, M.N.; Radchenko, V.P.; Sharapov, V.N.

    1998-01-01

    The Material Testing Loop Reactor (MLR) development was commenced in 1991 with the aim of updating and widening Russia's experimental base to validate the selected directions of further progress of the nuclear power industry in Russia and to enhance its reliability and safety. The MLR is a pool-type reactor. It uses light water as coolant and beryllium as side reflector. The direction of water circulation in the core is upward. The core comprises 30 fuel assemblies (FA) arranged in a hexagonal lattice with a 90-95 mm pitch. The central materials channel and six loop channels are sited in the core. The reflector includes up to 11 loop channels. The reactor power is 100 MW. The average power density of the core is 0.4 MW/l (maximal value 1.0 MW/l). The maximum neutron flux density is 7 × 10^14 n/(cm²·s) in the core (E > 0.1 MeV) and 5 × 10^14 n/(cm²·s) in the reflector (E < 0.625 eV). In 1995, due to the lack of funding, the MLR design work was suspended. (author)

  7. Characterization of Enzymatic Activity of MlrB and MlrC Proteins Involved in Bacterial Degradation of Cyanotoxins Microcystins.

    Science.gov (United States)

    Dziga, Dariusz; Zielinska, Gabriela; Wladyka, Benedykt; Bochenska, Oliwia; Maksylewicz, Anna; Strzalka, Wojciech; Meriluoto, Jussi

    2016-03-16

    Bacterial degradation of toxic microcystins produced by cyanobacteria is a common phenomenon. However, our understanding of the mechanisms of these processes is rudimentary. In this paper several novel discoveries regarding the action of the enzymes of the mlr cluster responsible for microcystin biodegradation are presented using recombinant proteins. In particular, the predicted active sites of the recombinant MlrB and MlrC were analyzed using functional enzymes and their inactive muteins. A new degradation intermediate, a hexapeptide derived from linearized microcystins by MlrC, was discovered. Furthermore, the involvement of MlrA and MlrB in further degradation of the hexapeptides was confirmed and a corrected biochemical pathway of microcystin biodegradation has been proposed.

  8. Characterization of Enzymatic Activity of MlrB and MlrC Proteins Involved in Bacterial Degradation of Cyanotoxins Microcystins

    Directory of Open Access Journals (Sweden)

    Dariusz Dziga

    2016-03-01

    Bacterial degradation of toxic microcystins produced by cyanobacteria is a common phenomenon. However, our understanding of the mechanisms of these processes is rudimentary. In this paper several novel discoveries regarding the action of the enzymes of the mlr cluster responsible for microcystin biodegradation are presented using recombinant proteins. In particular, the predicted active sites of the recombinant MlrB and MlrC were analyzed using functional enzymes and their inactive muteins. A new degradation intermediate, a hexapeptide derived from linearized microcystins by MlrC, was discovered. Furthermore, the involvement of MlrA and MlrB in further degradation of the hexapeptides was confirmed and a corrected biochemical pathway of microcystin biodegradation has been proposed.

  9. Physics constrained nonlinear regression models for time series

    International Nuclear Information System (INIS)

    Majda, Andrew J; Harlim, John

    2013-01-01

    A central issue in contemporary science is the development of data driven statistical nonlinear dynamical models for time series of partial observations of nature or a complex physical model. It has been established recently that ad hoc quadratic multi-level regression (MLR) models can have finite-time blow up of statistical solutions and/or pathological behaviour of their invariant measure. Here a new class of physics constrained multi-level quadratic regression models are introduced, analysed and applied to build reduced stochastic models from data of nonlinear systems. These models have the advantages of incorporating memory effects in time as well as the nonlinear noise from energy conserving nonlinear interactions. The mathematical guidelines for the performance and behaviour of these physics constrained MLR models as well as filtering algorithms for their implementation are developed here. Data driven applications of these new multi-level nonlinear regression models are developed for test models involving a nonlinear oscillator with memory effects and the difficult test case of the truncated Burgers–Hopf model. These new physics constrained quadratic MLR models are proposed here as process models for Bayesian estimation through Markov chain Monte Carlo algorithms of low frequency behaviour in complex physical data. (paper)

  10. Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

    Science.gov (United States)

    Pralle, R S; Weigel, K W; White, H M

    2018-05-01

    Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbot Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed

  11. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    Science.gov (United States)

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM10 (particulate matter with aerodynamic diameter less than 10 μm) pollution a critical task. This requires knowledge of the factors associated with PM10 variation and good forecasts of PM10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built: a multiple linear regression (MLR) model with lagged predictor variables (MLR1), an MLR model with lagged predictor variables and lagged PM10 concentrations (MLR2), and a regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM10 variation in Peninsular Malaysia. Comparison among the three models showed that the MLR2 model was on the same level as the RTSE model in terms of forecasting accuracy, while the MLR1 model was the worst.
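
The difference between the two MLR variants in this record lies in the design matrix. The sketch below builds one-day-lagged designs with and without the lagged PM10 column. The synthetic series, the single-day lag, and the function names are assumptions for illustration only and are not taken from the paper.

```python
import numpy as np

def lagged_design(pm10, predictors, include_pm10_lag):
    """Rows pair the target pm10[t] with values observed on day t-1."""
    n = len(pm10)
    columns = [np.ones(n - 1), *predictors[:-1].T]   # intercept + lagged predictors
    if include_pm10_lag:
        columns.append(pm10[:-1])                     # yesterday's PM10 (MLR2-style)
    return np.column_stack(columns), pm10[1:]

def fit_rmse(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.sqrt(np.mean((y - A @ coef) ** 2)))

# Synthetic daily series with persistence (assumed generating process).
rng = np.random.default_rng(1)
n_days = 500
predictors = rng.standard_normal((n_days, 2))        # stand-ins for weather/gas variables
pm10 = np.empty(n_days)
pm10[0] = 50.0
for t in range(1, n_days):
    pm10[t] = 20.0 + 0.6 * pm10[t - 1] + 3.0 * predictors[t - 1, 0] + rng.standard_normal()

A1, y1 = lagged_design(pm10, predictors, include_pm10_lag=False)  # MLR1-style
A2, y2 = lagged_design(pm10, predictors, include_pm10_lag=True)   # MLR2-style
coef1, rmse1 = fit_rmse(A1, y1)
coef2, rmse2 = fit_rmse(A2, y2)
```

Because the second design nests the first, its in-sample RMSE can never be larger. Whether that carries over to out-of-sample forecasts, as the record reports for its data, has to be checked on held-out days.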

  12. Multiple Linear Regression: A Realistic Reflector.

    Science.gov (United States)

    Nutt, A. T.; Batsell, R. R.

    Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…

  13. Opensource Software for MLR-Modelling of Solar Collectors

    DEFF Research Database (Denmark)

    Bacher, Peder; Perers, Bengt

    2011-01-01

    A first research version is now in operation of a software package for multiple linear regression (MLR) modeling and analysis of solar collectors according to ideas originating all the way from Walletun et al. (1986) and Perers (1987 and 1993). The tool has been implemented in the free and open...... source program R http://www.r-project.org/. Applications of the software package include: visual validation, resampling and conversion of data, collector performance testing analysis according to the European Standard EN 12975 (Fischer et al., 2004), statistical validation of results...

  14. QSAR Modeling of COX -2 Inhibitory Activity of Some Dihydropyridine and Hydroquinoline Derivatives Using Multiple Linear Regression (MLR) Method.

    Science.gov (United States)

    Akbari, Somaye; Zebardast, Tannaz; Zarghi, Afshin; Hajimahdi, Zahra

    2017-01-01

    COX-2 inhibitory activities of some 1,4-dihydropyridine and 5-oxo-1,4,5,6,7,8-hexahydroquinoline derivatives were modeled by quantitative structure-activity relationship (QSAR) using the stepwise multiple linear regression (SW-MLR) method. The built model was robust and predictive, with correlation coefficients (R²) of 0.972 and 0.531 for the training and test groups, respectively. The quality of the model was evaluated by leave-one-out (LOO) cross-validation (LOO correlation coefficient (Q²) of 0.943) and Y-randomization. We also employed a leverage approach to define the applicability domain of the model. Based on the QSAR model results, the COX-2 inhibitory activity of the selected data set correlated with the BEHm6 (highest eigenvalue n. 6 of Burden matrix/weighted by atomic masses), Mor03u (signal 03/unweighted) and IVDE (mean information content on the vertex degree equality) descriptors, which were derived from the compound structures.
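
For readers unfamiliar with the leave-one-out statistic quoted in this record, the sketch below computes R² and a LOO Q² for an ordinary MLR model. It assumes the common definition Q² = 1 - PRESS/SS_tot and uses the hat-matrix shortcut for the LOO residuals. The synthetic descriptors are hypothetical, and the paper's exact definition may differ.

```python
import numpy as np

def r2_and_q2(X, y):
    """Training R^2 and leave-one-out Q^2 for an MLR model with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                  # hat matrix: fitted = H @ y
    resid = y - H @ y
    loo_resid = resid / (1.0 - np.diag(H))     # residual each point would have if left out
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - (resid @ resid) / ss_tot
    q2 = 1.0 - (loo_resid @ loo_resid) / ss_tot
    return float(r2), float(q2)

# Synthetic "compounds x descriptors" table (assumed values).
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))
y = X @ np.array([1.5, -0.8, 0.3]) + 0.4 * rng.standard_normal(30)
r2, q2 = r2_and_q2(X, y)
```

The shortcut avoids refitting the model once per compound and gives the same LOO residuals as an explicit refit loop for ordinary least squares. Q² is never larger than the training R².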

  15. QSAR study of HCV NS5B polymerase inhibitors using the genetic algorithm-multiple linear regression (GA-MLR).

    Science.gov (United States)

    Rafiei, Hamid; Khanzadeh, Marziyeh; Mozaffari, Shahla; Bostanifar, Mohammad Hassan; Avval, Zhila Mohajeri; Aalizadeh, Reza; Pourbasheer, Eslam

    2016-01-01

    A quantitative structure-activity relationship (QSAR) study has been employed for predicting the inhibitory activities of Hepatitis C virus (HCV) NS5B polymerase inhibitors. A data set consisting of 72 compounds was selected, and then different types of molecular descriptors were calculated. The whole data set was split into a training set (80% of the dataset) and a test set (20% of the dataset) using principal component analysis. The stepwise (SW) and the genetic algorithm (GA) techniques were used as variable selection tools. The multiple linear regression method was then used to linearly correlate the selected descriptors with inhibitory activities. Several validation techniques, including leave-one-out and leave-group-out cross-validation and the Y-randomization method, were used to evaluate the internal capability of the derived models. The external prediction ability of the derived models was further analyzed using modified r², concordance correlation coefficient values, and the Golbraikh and Tropsha acceptable model criteria. Based on the derived results (GA-MLR), some new insights toward molecular structural requirements for obtaining better inhibitory activity were obtained.

  16. Exploring QSARs of the interaction of flavonoids with GABA (A) receptor using MLR, ANN and SVM techniques.

    Science.gov (United States)

    Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K

    2014-10-01

    Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of GABA (A) receptor complex were calculated using the machine learning methods: artificial neural network (ANN) and support vector machine (SVM) techniques. The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of ANN models. Though the SVM model shows improvement of training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. Randomization test is employed to check the suitability of the models.

  17. Estimation of Anti-HIV Activity of HEPT Analogues Using MLR, ANN, and SVM Techniques

    Directory of Open Access Journals (Sweden)

    Basheerulla Shaik

    2013-01-01

    value than those of MLR and SVM techniques. Rm² metrics and ridge regression analysis indicated that the proposed four-variable model, with MATS5e, RDF080u, T(O⋯O), and MATS5m as correlating descriptors, is the best for estimating the anti-HIV activity (log 1/C) of the present set of compounds.

  18. A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

    Science.gov (United States)

    Smith, Paul F; Ganesh, Siva; Liu, Ping

    2013-10-30

    Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

    International Nuclear Information System (INIS)

    Jafri, Y.Z.; Kamal, L.

    2007-01-01

    Various statistical techniques were used on five-year data (1998-2002) of average humidity, rainfall, and maximum and minimum temperatures. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters, on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit to our polynomial regression analysis time series (PRATS). The correlations to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and the Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS and indicated homoscedasticity in both. We also employed Bartlett's test for homogeneity of variances on the five-year rainfall and humidity data, which showed that the variances were not homogeneous for rainfall but were homogeneous for humidity. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)

  20. QSAR studies of the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by multiple linear regression (MLR) and support vector machine (SVM).

    Science.gov (United States)

    Qin, Zijian; Wang, Maolin; Yan, Aixia

    2017-07-01

    In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by using a multiple linear regression (MLR) and a support vector machine (SVM) method. 512 HCV NS3/4A protease inhibitors and their IC50 values, which were determined by the same FRET assay, were collected from the reported literature to build a dataset. All the inhibitors were represented with selected nine global and 12 2D property-weighted autocorrelation descriptors calculated from the program CORINA Symphony. The dataset was divided into a training set and a test set by a random and a Kohonen's self-organizing map (SOM) method. The correlation coefficients (r²) of training sets and test sets were 0.75 and 0.72 for the best MLR model, 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were also developed. The performances of all the best sub-dataset models were better than those of the whole dataset models. We believe that the combination of the best sub- and whole dataset SVM models can be used as reliable lead designing tools for new NS3/4A protease inhibitors scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Group-wise partial least square regression

    NARCIS (Netherlands)

    Camacho, José; Saccenti, Edoardo

    2018-01-01

    This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are

  2. Bias due to two-stage residual-outcome regression analysis in genetic association studies.

    Science.gov (United States)

    Demissie, Serkalem; Cupples, L Adrienne

    2011-11-01

    Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual-outcome is calculated from a regression of the outcome variable on covariates and then the relationship between the adjusted-outcome and the SNP is evaluated by a simple linear regression of the adjusted-outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared correlation between the SNP and the covariate (r²). For example, for r² = 0, 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under r² = 0, the two-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. © 2011 Wiley Periodicals, Inc.
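
The attenuation reported in this record can be reproduced with a small simulation. The sketch below is an illustration under assumptions: it uses a continuous, normally distributed stand-in for the SNP rather than genotype counts, writes the squared SNP-covariate correlation as r², and all names and effect sizes are hypothetical.

```python
import numpy as np

def two_stage_vs_mlr(rho, beta=1.0, gamma=1.0, n=200_000, seed=0):
    """Return (two-stage estimate, MLR estimate) of the effect of x on y."""
    rng = np.random.default_rng(seed)
    c = rng.standard_normal(n)                                   # covariate
    x = rho * c + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)  # corr(x, c) = rho
    y = beta * x + gamma * c + rng.standard_normal(n)

    # Stage 1: regress the outcome on the covariate and keep the residual.
    C = np.column_stack([np.ones(n), c])
    resid = y - C @ np.linalg.lstsq(C, y, rcond=None)[0]
    # Stage 2: simple regression of the residual outcome on x.
    Xs = np.column_stack([np.ones(n), x])
    b_two_stage = np.linalg.lstsq(Xs, resid, rcond=None)[0][1]

    # MLR: x and the covariate in one model.
    XC = np.column_stack([np.ones(n), x, c])
    b_mlr = np.linalg.lstsq(XC, y, rcond=None)[0][1]
    return float(b_two_stage), float(b_mlr)

# Squared correlations matching the record's example values.
results = {r2: two_stage_vs_mlr(np.sqrt(r2)) for r2 in (0.0, 0.1, 0.5)}
```

With a true effect of 1, the two-stage estimate lands near 1 - r² (about 1.0, 0.9 and 0.5), while the MLR estimate stays near 1. This matches the 0, 10 and 50% attenuation quoted above.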

  3. Heterologous expression of mlrA in a photoautotrophic host - Engineering cyanobacteria to degrade microcystins.

    Science.gov (United States)

    Dexter, Jason; Dziga, Dariusz; Lv, Jing; Zhu, Junqi; Strzalka, Wojciech; Maksylewicz, Anna; Maroszek, Magdalena; Marek, Sylwia; Fu, Pengcheng

    2018-06-01

    In this report, we establish proof-of-principle demonstrating for the first time genetic engineering of a photoautotrophic microorganism for bioremediation of naturally occurring cyanotoxins. In model cyanobacterium Synechocystis sp. PCC 6803 we have heterologously expressed Sphingopyxis sp. USTB-05 microcystinase (MlrA) bearing a 23 amino acid N-terminus secretion peptide from native Synechocystis sp. PCC 6803 PilA (sll1694). The resultant whole cell biocatalyst displayed about 3 times higher activity against microcystin-LR compared to a native MlrA host (Sphingomonas sp. ACM 3962), normalized for optical density. In addition, MlrA activity was found to be almost entirely located in the cyanobacterial cytosolic fraction, despite the presence of the secretion tag, with crude cellular extracts showing MlrA activity comparable to extracts from MlrA expressing E. coli. Furthermore, despite approximately 9.4-fold higher initial MlrA activity of a whole cell E. coli biocatalyst, utilization of a photoautotrophic chassis resulted in prolonged stability of MlrA activity when cultured under semi-natural conditions (using lake water), with the heterologous MlrA biocatalytic activity of the E. coli culture disappearing after 4 days, while the cyanobacterial host displayed activity (3% of initial activity) after 9 days. In addition, the cyanobacterial cell density was maintained over the duration of this experiment while the cell density of the E. coli culture rapidly declined. Lastly, failure to establish a stable cyanobacterial isolate expressing native MlrA (without the N-terminus tag) via the strong cpcB560 promoter draws attention to the use of peptide tags to positively modulate expression of potentially toxic proteins. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. and Multinomial Logistic Regression

    African Journals Online (AJOL)

    This work presented the results of an experimental comparison of two models: Multinomial Logistic Regression (MLR) and Artificial Neural Network (ANN) for classifying students based on their academic performance. The predictive accuracy for each model was measured by their average Classification Correct Rate (CCR).

  5. A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy

    Directory of Open Access Journals (Sweden)

    Jibo Yue

    2018-01-01

    Above-ground biomass (AGB) provides a vital link between solar energy consumption and yield, so its correct estimation is crucial to accurately monitor crop growth and predict yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index) and eight statistical regression techniques: artificial neural network (ANN), multivariable linear regression (MLR), decision-tree regression (DT), boosted binary regression tree (BBRT), partial least squares regression (PLSR), random forest regression (RF), support vector machine regression (SVM), and principal component regression (PCR), which are used to analyze hyperspectral data acquired by using a field spectrophotometer. The vegetation indexes (VIs) determined from the spectra were first used to train regression techniques for modeling and validation to select the best VI input, and then summed with white Gaussian noise to study how remote sensing errors affect the regression techniques. Next, the VIs were divided into groups of different sizes by using various sampling methods for modeling and validation to test the stability of the techniques. Finally, the AGB was estimated by using a leave-one-out cross validation with these powerful techniques. The results of the study demonstrate that, of the eight techniques investigated, PLSR and MLR perform best in terms of stability and are most suitable when high-accuracy and stable estimates are required from relatively few samples. In addition, RF is extremely robust against noise and is best suited to deal with repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise). Finally, the leave-one-out cross-validation method indicates that PLSR provides the highest accuracy (R2 = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV (RMSE) = 0.18); thus, PLSR is best suited for works requiring high

  6. Relationships between the structure of wheat gluten and ACE inhibitory activity of hydrolysate: stepwise multiple linear regression analysis.

    Science.gov (United States)

    Zhang, Yanyan; Ma, Haile; Wang, Bei; Qu, Wenjuan; Wali, Asif; Zhou, Cunshan

    2016-08-01

    Ultrasound pretreatment of wheat gluten (WG) before enzymolysis can improve the angiotensin converting enzyme (ACE) inhibitory activity of the hydrolysates by altering the structure of substrate proteins. Establishment of a relationship between the structure of WG and ACE inhibitory activity of the hydrolysates to judge the end point of the ultrasonic pretreatment is vital. The results of stepwise multiple linear regression (MLR) showed that the contents of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil were significantly correlated to ACE inhibitory activity of the hydrolysate, with standard partial regression coefficients of 3.729, -0.676, -0.252, 0.022 and 0.156, respectively. The R(2) of this model was 0.970. External validation showed that the stepwise MLR model could well predict the ACE inhibitory activity of hydrolysate based on the content of free sulfhydryl, α-helix, disulfide bond, surface hydrophobicity and random coil of WG before hydrolysis. A stepwise multiple linear regression model describing the quantitative relationships between the structure of WG and the ACE inhibitory activity of the hydrolysates was established. This model can be used to predict the endpoint of the ultrasonic pretreatment. © 2015 Society of Chemical Industry.

  7. Prediction of the GC-MS Retention Indices for a Diverse Set of Terpenes as Constituent Components of Camu-camu (Myrciaria dubia (HBK) Mc Vaugh) Volatile Oil, Using Particle Swarm Optimization-Multiple Linear Regression (PSO-MLR)

    Directory of Open Access Journals (Sweden)

    Majid Mohammadhosseini

    2014-05-01

    A reliable quantitative structure retention relationship (QSRR) study has been evaluated to predict the retention indices (RIs) of a broad spectrum of compounds, namely 118 non-linear, cyclic and heterocyclic terpenoids (both saturated and unsaturated), on an HP-5MS fused silica column. A principal component analysis showed that seven compounds lay outside of the main cluster. After elimination of the outliers, the data set was divided into training and test sets involving 80 and 28 compounds. The method was tested by application of the particle swarm optimization (PSO) method to find the most effective molecular descriptors, followed by multiple linear regressions (MLR). The PSO-MLR model was further confirmed through “leave one out cross validation” (LOO-CV) and “leave group out cross validation” (LGO-CV), as well as external validations. The promising statistical figures of merit associated with the proposed model (R2train=0.936, Q2LOO=0.928, Q2LGO=0.921, F=376.4) confirm its high ability to predict RIs with negligible relative errors of predictions (REP train=4.8%, REP test=6.0%).
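
    The leave-one-out figure of merit quoted in this record (Q2LOO) can be illustrated with a minimal sketch. It assumes the common definition Q2 = 1 - PRESS / SS_total, with SS_total taken about the overall mean of the response; the abstract does not state which variant the authors used. The function names `fit_ols`, `predict` and `q2_loo` are hypothetical, and descriptor selection by PSO is not shown.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (intercept added here)."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(k):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, row):
    """Intercept plus the weighted sum of the descriptors."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


def q2_loo(X, y):
    """Q2 = 1 - PRESS / SS_total; each sample is predicted by a model fitted without it."""
    n = len(y)
    y_mean = sum(y) / n
    press = 0.0
    for i in range(n):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    ss_total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / ss_total
```

    Solving the normal equations directly keeps the sketch dependency-free; for many or strongly correlated descriptors a QR- or SVD-based solver would be the more stable choice.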

  8. A consensus successive projections algorithm--multiple linear regression method for analyzing near infrared spectra.

    Science.gov (United States)

    Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin

    2015-02-09

    The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, SPA used only once may not obtain all the useful information of the full spectra, because the number of selected variables cannot exceed the number of calibration samples in the SPA algorithm. Therefore, the SPA-MLR method risks the loss of useful information. To make full use of the useful information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein. This method is the combination of a consensus strategy and the SPA-MLR method. In the C-SPA-MLR method, SPA-MLR is used to construct member models with different subsets of variables, which are selected from the remaining variables iteratively. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near infrared (NIR) spectra of corn and diesel. The C-SPA-MLR method showed better prediction performance than the SPA-MLR and full-spectra PLS methods. Moreover, these results could serve as a reference for combining the consensus strategy with other variable selection methods when analyzing NIR spectra and other spectroscopic techniques. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Wavelet regression model in forecasting crude oil price

    Science.gov (United States)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
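
    The two error measures used in this record to compare forecasting models, RMSE and MAE, can be written out as a minimal sketch. The function names are hypothetical and no data from the study are reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

    RMSE penalizes large individual errors more heavily than MAE, so reporting both gives some indication of whether a model's error is dominated by a few large misses.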

  10. Caudal Regression Syndrome with Partial Agenesis of the Corpus callosum and Partial Lobar Holoprosencephaly

    Science.gov (United States)

    Hashami, Hilal Al; Bataclan, Maria F; Mathew, Mariam; Krishnan, Lalitha

    2010-01-01

    Caudal regression syndrome is a rare fetal condition of diabetic pregnancy. Although the exact mechanism is not known, hyperglycaemia during embryogenesis seems to act as a teratogen. Independently, caudal regression syndrome (CRS), agenesis of the corpus callosum (ACC) and partial lobar holoprosencephaly (HPE) have been reported in infants of diabetic mothers. To our knowledge, a combination of all these three conditions has not been reported so far. PMID:21509087

  11. Deviance-Related Responses along the Auditory Hierarchy: Combined FFR, MLR and MMN Evidence

    Science.gov (United States)

    Shiga, Tetsuya; Althen, Heike; Cornella, Miriam; Zarnowiec, Katarzyna; Yabe, Hirooki; Escera, Carles

    2015-01-01

    The mismatch negativity (MMN) provides a correlate of automatic auditory discrimination in human auditory cortex that is elicited in response to violation of any acoustic regularity. Recently, deviance-related responses were found at much earlier cortical processing stages as reflected by the middle latency response (MLR) of the auditory evoked potential, and even at the level of the auditory brainstem as reflected by the frequency following response (FFR). However, no study has reported deviance-related responses in the FFR, MLR and long latency response (LLR) concurrently in a single recording protocol. Amplitude-modulated (AM) sounds were presented to healthy human participants in a frequency oddball paradigm to investigate deviance-related responses along the auditory hierarchy in the ranges of FFR, MLR and LLR. AM frequency deviants modulated the FFR, the Na and Nb components of the MLR, and the LLR eliciting the MMN. These findings demonstrate that it is possible to elicit deviance-related responses at three different levels (FFR, MLR and LLR) in one single recording protocol, highlight the involvement of the whole auditory hierarchy in deviance detection and have implications for cognitive and clinical auditory neuroscience. Moreover, the present protocol provides a new research tool into clinical neuroscience so that the functional integrity of the auditory novelty system can now be tested as a whole in a range of clinical populations where the MMN was previously shown to be defective. PMID:26348628

  12. 75 FR 82277 - Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient...

    Science.gov (United States)

    2010-12-30

    ...-AA06 Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient... Register (FR Doc 2010-29596 (75 FR 74864)) entitled ``Health Insurance Issuers Implementing Medical Loss... request for comments entitled ``Health Insurance Issuers Implementing Medical Loss Ratio (MLR...

  13. Reporting quality of multivariable logistic regression in selected Indian medical journals.

    Science.gov (United States)

    Kumar, R; Indrayan, A; Chhabra, P

    2012-01-01

    Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between the years 1994 to 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and to test the MLR model assumptions. SPSS 17 software and non-parametric tests (Kruskal-Wallis H, Mann Whitney U, Spearman Correlation) were used. One hundred and nine articles were finally found using MLR for analyzing the data in the selected eight journals. The number of such articles gradually increased after year 2003, but quality score remained almost similar over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR were reported in 75.2% and sufficient cases (>10) per covariate of limiting sample size were reported in 58.7% of the articles. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of a statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one article managed to score 8 out of 10 among 109 articles under review. All others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician may improve quality of reporting.
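
    Two of the reporting items named in this record can be made concrete with a minimal sketch: the odds ratio with a Wald-type 95% confidence interval, obtained from a logistic regression coefficient and its standard error, and the number of cases per covariate in the limiting (smaller) outcome group. The function names are hypothetical, and the z value of 1.96 assumes a normal approximation.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def cases_per_covariate(n_events, n_nonevents, n_covariates):
    """Cases per covariate, using the smaller outcome group as the limiting sample size."""
    return min(n_events, n_nonevents) / n_covariates
```

    For example, 50 events and 200 non-events with 4 covariates give 12.5 cases per covariate, which would satisfy the ">10" criterion mentioned in the abstract.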

  14. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    Science.gov (United States)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth at different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and the atmospheric pressure (reduced to sea level), collected from five synoptic stations Kerman, Ahvaz, Tabriz, Saghez, and Rasht located respectively in the hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite good at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially at 100 cm depth. Moreover, both models performed better in humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance, especially in the case of SVR.

  15. Caudal Regression Syndrome with Partial Agenesis of the Corpus callosum and Partial Lobar Holoprosencephaly: Case report.

    Science.gov (United States)

    Hashami, Hilal Al; Bataclan, Maria F; Mathew, Mariam; Krishnan, Lalitha

    2010-04-01

    Caudal regression syndrome is a rare fetal condition of diabetic pregnancy. Although the exact mechanism is not known, hyperglycaemia during embryogenesis seems to act as a teratogen. Independently, caudal regression syndrome (CRS), agenesis of the corpus callosum (ACC) and partial lobar holoprosencephaly (HPE) have been reported in infants of diabetic mothers. To our knowledge, a combination of all these three conditions has not been reported so far.

  16. Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

    Directory of Open Access Journals (Sweden)

    Fatimah Khaleel Ibrahim

    2017-08-01

    Soft computing techniques such as Artificial Neural Network (ANN) have improved predictive capability and have found application in geotechnical engineering. The aim of this research is to utilize soft computing techniques and Multiple Regression Models (MLR) for forecasting the California bearing ratio (CBR) of soil from its index properties. The CBR of soil could be predicted from various soil characterizing parameters with the assistance of MLR and ANN methods. The database was collected in the laboratory by conducting tests on 86 soil samples gathered from different projects in Basrah districts. Data gained from the experimental results were used in the regression models and in the soft computing techniques using an artificial neural network. The liquid limit, plastic index, modified compaction test and the CBR test have been determined. In this work, different ANN and MLR models were formulated with different collections of inputs in order to recognize their significance in the prediction of CBR. The strengths of the developed models were examined in terms of regression coefficient (R2), relative error (RE%) and mean square error (MSE) values. From the results of this paper, it was noticed that all the proposed ANN models perform better than the MLR model. Specifically, the ANN model with all input parameters gives better outcomes than the other ANN models.

  17. Application of genetic algorithm - multiple linear regressions to predict the activity of RSK inhibitors

    Directory of Open Access Journals (Sweden)

    Avval Zhila Mohajeri

    2015-01-01

    This paper deals with developing a linear quantitative structure-activity relationship (QSAR) model for predicting the RSK inhibition activity of some new compounds. A dataset consisting of 62 pyrazino [1,2-α] indole, diazepino [1,2-α] indole, and imidazole derivatives with known inhibitory activities was used. The multiple linear regressions (MLR) technique combined with the stepwise (SW) and the genetic algorithm (GA) methods as variable selection tools was employed. To further check the stability, robustness and predictability of the proposed models, internal and external validation techniques were used. Comparison of the results obtained indicates that the GA-MLR model is superior to the SW-MLR model and that it is applicable for designing novel RSK inhibitors.

  18. Generalised Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    We consider the semiparametric generalised linear regression model which has mainstream empirical models such as the (partially) linear mean regression, logistic and multinomial regression as special cases. As an extension to related literature we allow a misclassified covariate to be interacted...

  19. Source apportionment of soil heavy metals using robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR) receptor model.

    Science.gov (United States)

    Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun

    2018-06-01

    The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in the regional geochemical dataset. Furthermore, the models are merely built on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source's contributions. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. Then, the new method was applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model had better performance than the APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., non-robust and global model) in dealing with the regional geochemical dataset. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Reliable and accurate point-based prediction of cumulative infiltration using soil readily available characteristics: A comparison between GMDH, ANN, and MLR

    Science.gov (United States)

    Rahmati, Mehdi

    2017-08-01

    Developing accurate and reliable pedo-transfer functions (PTFs) to predict soil non-readily available characteristics is one of the topics of greatest concern in soil science, and selecting more appropriate predictors is a crucial factor in PTF development. Group method of data handling (GMDH), which finds an approximate relationship between a set of input and output variables, not only provides an explicit procedure to select the most essential PTF input variables, but also results in more accurate and reliable estimates than other commonly applied methodologies. Therefore, the current research aimed to apply GMDH in comparison with multivariate linear regression (MLR) and artificial neural network (ANN) to develop several PTFs to predict point-based soil cumulative infiltration at specific time intervals (0.5-45 min) using soil readily available characteristics (RACs). In this regard, soil infiltration curves as well as several soil RACs including soil primary particles (clay (CC), silt (Si), and sand (Sa)), saturated hydraulic conductivity (Ks), bulk (Db) and particle (Dp) densities, organic carbon (OC), wet-aggregate stability (WAS), electrical conductivity (EC), and soil antecedent (θi) and field saturated (θfs) water contents were measured at 134 different points in Lighvan watershed, northwest of Iran. Then, applying GMDH, MLR, and ANN methodologies, several PTFs have been developed to predict cumulative infiltrations using two sets of selected soil RACs including and excluding Ks. According to the test data, results showed that developed PTFs by GMDH and MLR procedures using all soil RACs including Ks resulted in more accurate (with E values of 0.673-0.963) and reliable (with CV values lower than 11 percent) predictions of cumulative infiltrations at different specific time steps. In contrast, ANN procedure had lower accuracy (with E values of 0.356-0.890) and reliability (with CV values up to 50 percent) compared to GMDH and MLR. The results also revealed

  1. COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

    Directory of Open Access Journals (Sweden)

    ELİF BULUT

    2013-06-01

    Partial Least Squares Regression (PLSR) is a multivariate statistical method that consists of partial least squares and multiple linear regression analysis. Explanatory variables, X, having multicollinearity are reduced to components which explain a great amount of the covariance between the explanatory and response variables. These components are few in number and they don’t have a multicollinearity problem. Then multiple linear regression analysis is applied to those components to model the response variable Y. There are various PLSR algorithms. In this study NIPALS and PLS-Kernel algorithms will be studied and illustrated on a real data set.
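
    The component-extraction idea described in this record can be sketched for the single-response case (often called PLS1) using NIPALS-style deflation. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `nipals_pls1` is hypothetical, the data are mean-centered but not scaled, and the PLS-Kernel variant is not shown.

```python
import math


def nipals_pls1(X, y, n_components):
    """Extract PLS components for one response by NIPALS-style deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                              # centered y
    scores, weights, loadings, q_coefs = [], [], [], []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]   # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # regression of y on the score
        # Deflate X and y so the next component is orthogonal to this one
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        scores.append(t)
        weights.append(w)
        loadings.append(p)
        q_coefs.append(q)
    fitted = [
        y_mean + sum(q * t[i] for q, t in zip(q_coefs, scores)) for i in range(n)
    ]
    return scores, weights, loadings, q_coefs, fitted
```

    Because each score vector is removed from X before the next one is computed, the scores are mutually orthogonal, which is why regressing Y on them avoids the multicollinearity problem mentioned in the abstract.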

  2. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani; Naser, Nimal; Emwas, Abdul-Hamid M.; Dooley, Stephen; Sarathy, Mani

    2016-01-01

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN

  3. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    Science.gov (United States)

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  4. Water quality assessment and apportionment of pollution sources using APCS-MLR and PMF receptor modeling techniques in three major rivers of South Florida.

    Science.gov (United States)

    Haji Gholizadeh, Mohammad; Melesse, Assefa M; Reddi, Lakshmi

    2016-10-01

    In this study, principal component analysis (PCA), factor analysis (FA), and the absolute principal component score-multiple linear regression (APCS-MLR) receptor modeling technique were used to assess the water quality and identify and quantify the potential pollution sources affecting the water quality of three major rivers of South Florida. For this purpose, a 15-year (2000-2014) dataset of 12 water quality variables covering 16 monitoring stations and approximately 35,000 observations was used. The PCA/FA method identified five and four potential pollution sources in wet and dry seasons, respectively, and the effective mechanisms, rules and causes were explained. The APCS-MLR apportioned their contributions to each water quality variable. Results showed that the point source pollution discharges from anthropogenic factors due to the discharge of agriculture waste and domestic and industrial wastewater were the major sources of river water contamination. Also, the studied variables were categorized into three groups of nutrients (total kjeldahl nitrogen, total phosphorus, total phosphate, and ammonia-N), water murkiness conducive parameters (total suspended solids, turbidity, and chlorophyll-a), and salt ions (magnesium, chloride, and sodium), and average contributions of different potential pollution sources to these categories were considered separately. The data matrix was also subjected to PMF receptor model using the EPA PMF-5.0 program and the two-way model described was performed for the PMF analyses. Comparison of the obtained results of PMF and APCS-MLR models showed that there were some significant differences in estimated contribution for each potential pollution source, especially in the wet season. Eventually, it was concluded that the APCS-MLR receptor modeling approach appears to be more physically plausible for the current study. It is believed that the results of apportionment could be very useful to the local authorities for the control and

  5. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    Science.gov (United States)

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). The machine learning methods for multicategory classification were applied to a rat model in acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage in our previous study. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying ATLS hypovolaemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of support vector regression and MLR models with relative values by predicting blood loss in percent were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could provide possibility of the future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
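
    The final step described in this record, turning a predicted blood loss percentage into one of four shock classes, can be sketched as a simple threshold lookup. The cut-offs of 15%, 30% and 40% are the conventional ATLS class boundaries; the abstract does not state the exact thresholds or boundary handling applied to the rat model, so these, and the function name `atls_class`, are assumptions.

```python
def atls_class(blood_loss_percent):
    """Map a predicted blood loss (percent of blood volume) to ATLS class 1-4.

    Assumed cut-offs: <15% class I, 15-30% class II, 30-40% class III, >40% class IV.
    """
    if blood_loss_percent < 15:
        return 1
    if blood_loss_percent < 30:
        return 2
    if blood_loss_percent <= 40:
        return 3
    return 4
```

    Predicting a continuous value first and thresholding afterwards lets a regression model such as MLR or support vector regression stand in for a four-way classifier, which is the simplification the abstract describes.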

  6. Journal of Chemical Sciences | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    Decision tree, random forest, moving average analysis (MAA), multiple linear regression (MLR), partial least square regression (PLSR) and principal component regression (PCR) were used to develop models for prediction of CDK4 inhibitory activity. The statistical significance of models was assessed through specificity, ...

  7. Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh

    2007-01-01

    A quantitative structure-retention relationship (QSRR) study has been carried out on the gas chromatograph/electron capture detector (GC/ECD) system retention times (tRs) of 38 diverse chlorinated pesticides, herbicides, and organohalides by using molecular structural descriptors. Modeling of retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and partial least squares (PLS) regression. Stepwise regression using SPSS was used for the selection of the variables that resulted in the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors including electronic, steric and thermodynamic were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analysis has been carried out to derive the best QSRR models. After variable selection, MLR and PLS methods were used with leave-one-out cross validation for building the regression models. The predictive quality of the QSRR models was tested for an external prediction set of 12 compounds randomly chosen from 38 compounds. The PLS regression method was used to model the structure-retention relationships more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to squared regression coefficients R2, which were 0.951 and 0.948 for MLR and PLS, respectively

  8. Prediction of size-fractionated airborne particle-bound metals using MLR, BP-ANN and SVM analyses.

    Science.gov (United States)

    Leng, Xiang'zi; Wang, Jinhua; Ji, Haibo; Wang, Qin'geng; Li, Huiming; Qian, Xin; Li, Fengying; Yang, Meng

    2017-08-01

    Size-fractionated heavy metal concentrations were observed in airborne particulate matter (PM) samples collected from 2014 to 2015 (spanning all four seasons) from suburban (Xianlin) and industrial (Pukou) areas in Nanjing, a megacity of southeast China. Rapid prediction models of size-fractionated metals were established based on multiple linear regression (MLR), back propagation artificial neural network (BP-ANN) and support vector machine (SVM) by using meteorological factors and PM concentrations as input parameters. About 38% and 77% of PM2.5 concentrations in Xianlin and Pukou, respectively, were beyond the Chinese National Ambient Air Quality Standard limit of 75 μg/m3. Nearly all elements had higher concentrations in industrial areas, and in winter among the four seasons. Anthropogenic elements such as Pb, Zn, Cd and Cu showed larger percentages in the fine fraction (ø≤2.5 μm), whereas the crustal elements including Al, Ba, Fe, Ni, Sr and Ti showed larger percentages in the coarse fraction (ø > 2.5 μm). SVM showed a higher training correlation coefficient (R), and lower mean absolute error (MAE) as well as lower root mean square error (RMSE), than MLR and BP-ANN for most metals. All the three methods showed better prediction results for Ni, Al, V, Cd and As, whereas relatively poor for Cr and Fe. The daily airborne metal concentrations in 2015 were then predicted by the fully trained SVM models and the results showed the heaviest pollution of airborne heavy metals occurred in December and January, whereas the lightest pollution occurred in June and July. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the proposed wavelet MLR (WMLR) model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series. PMID:24895666

  10. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    Science.gov (United States)

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the proposed wavelet MLR (WMLR) model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  11. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    Science.gov (United States)

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  12. Stx1 prophage excision in Escherichia coli strain PA20 confers strong curli and biofilm formation by restoring native mlrA

    Science.gov (United States)

    Prophage insertions in Escherichia coli O157:H7 mlrA contribute to the low expression of curli fimbriae and biofilm observed in many clinical isolates. Varying levels of CsgD-dependent curli/biofilm expression are restored to strains bearing prophage insertions in mlrA by mutation of regulatory gene...

  13. Comparison of a neural network with multiple linear regression for quantitative analysis in ICP-atomic emission spectroscopy

    International Nuclear Information System (INIS)

    Schierle, C.; Otto, M.

    1992-01-01

    A two-layer perceptron with backpropagation of error is used for quantitative analysis in ICP-AES. The network was trained on emission spectra of two interfering lines of Cd and As, and the concentrations of both elements were subsequently estimated from mixture spectra. The spectra of the Cd and As lines were also used to perform multiple linear regression (MLR) via the calculation of the pseudoinverse S⁺ of the sensitivity matrix S. The present paper shows that close relations exist between the operation of the perceptron and the MLR procedure. These are most clearly apparent in the correlation between the weights of the backpropagation network and the elements of the pseudoinverse. Using MLR, the confidence intervals of the predictions are exploited to correct for the wavelength shift of the optical device. (orig.)
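
    The pseudoinverse step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a full-column-rank sensitivity matrix S with two columns, so that S⁺ = (SᵀS)⁻¹Sᵀ; the Gaussian line profiles, the wavelength grid, and the concentrations are synthetic placeholders and are not taken from the paper.

```python
# Sketch: MLR calibration through the pseudoinverse S+ of a sensitivity matrix S.
# All profiles and concentrations below are synthetic, hypothetical values.
import math


def gaussian_line(center, width, grid):
    """Hypothetical emission line profile sampled on a wavelength grid."""
    return [math.exp(-((x - center) / width) ** 2) for x in grid]


def pseudoinverse_two_columns(col_a, col_b):
    """Rows of S+ = (S^T S)^-1 S^T for a full-column-rank S = [col_a, col_b]."""
    aa = sum(a * a for a in col_a)
    bb = sum(b * b for b in col_b)
    ab = sum(a * b for a, b in zip(col_a, col_b))
    det = aa * bb - ab * ab  # nonzero as long as the two profiles are not proportional
    row_a = [(bb * a - ab * b) / det for a, b in zip(col_a, col_b)]
    row_b = [(aa * b - ab * a) / det for a, b in zip(col_a, col_b)]
    return row_a, row_b


# Two strongly overlapping (interfering) lines standing in for Cd and As.
grid = [i * 0.04 - 1.0 for i in range(51)]
cd_line = gaussian_line(-0.1, 0.3, grid)
as_line = gaussian_line(0.1, 0.3, grid)

# Noise-free mixture spectrum built from known concentrations.
true_cd, true_as = 2.0, 0.5
mixture = [true_cd * c + true_as * a for c, a in zip(cd_line, as_line)]

# Concentration estimates are S+ applied to the mixture spectrum.
pinv_cd, pinv_as = pseudoinverse_two_columns(cd_line, as_line)
est_cd = sum(p * m for p, m in zip(pinv_cd, mixture))
est_as = sum(p * m for p, m in zip(pinv_as, mixture))
print(est_cd, est_as)
```

    Because the mixture here is noise-free, the estimates reproduce the input concentrations up to rounding; with measurement noise, the same product gives the least-squares estimate.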

  14. Seasonal Variability of Aragonite Saturation State in the North Pacific Ocean Predicted by Multiple Linear Regression

    Science.gov (United States)

    Kim, T. W.; Park, G. H.

    2014-12-01

    Seasonal variation of aragonite saturation state (Ωarag) in the North Pacific Ocean (NPO) was investigated using multiple linear regression (MLR) models produced from the PACIFICA (Pacific Ocean interior carbon) dataset. Data within the depth range of 50-1200 m were used to derive the MLR models, and three parameters (potential temperature, nitrate, and apparent oxygen utilization (AOU)) were chosen as predictor variables because they are associated with vertical mixing and with DIC (dissolved inorganic carbon) removal and release, all of which affect Ωarag in the water column directly or indirectly. The PACIFICA dataset was divided into 5° × 5° grids, and an MLR model was produced in each grid, giving a total of 145 independent MLR models over the NPO. The mean RMSE (root mean square error) and r² (coefficient of determination) of all derived MLR models were approximately 0.09 and 0.96, respectively. The MLR coefficients obtained for each predictor variable and the intercept were then interpolated over the study area, making it possible to allocate MLR coefficients to data-sparse ocean regions. Predictability from the interpolated coefficients was evaluated using Hawaiian time-series data; the mean residual between measured and predicted Ωarag values was approximately 0.08, which is less than the mean RMSE of our MLR models. The interpolated MLR coefficients were combined with the seasonal climatology of World Ocean Atlas 2013 (1° × 1°) to produce seasonal Ωarag distributions over various depths. Large seasonal variability in Ωarag was manifested in the mid-latitude Western NPO (24-40°N, 130-180°E) and low-latitude Eastern NPO (0-12°N, 115-150°W). In the Western NPO, seasonal fluctuations of water column stratification appeared to be responsible for the seasonal variation in Ωarag (~ 0.5 at 50 m) because it closely followed temperature variations in a layer of 0-75 m. In contrast, remineralization of organic matter was the main cause for the seasonal

  15. Predictive modelling of chromium removal using multiple linear and nonlinear regression with special emphasis on operating parameters of bioelectrochemical reactor.

    Science.gov (United States)

    More, Anand Govind; Gupta, Sunil Kumar

    2018-03-24

    Bioelectrochemical system (BES) is a novel, self-sustaining metal removal technology that uses the chemical energy of organic matter with the help of microorganisms. Experimental trials of a two-chambered BES reactor were conducted with varying substrate concentrations using sodium acetate (500 mg/L to 2000 mg/L COD) and different initial chromium concentrations (Cr_i) (10-100 mg/L) at different cathode pH values (pH 1-7). In the current study, mathematical models based on multiple linear regression (MLR) and non-linear regression (NLR) approaches were developed using laboratory experimental data to determine chromium removal efficiency (CRE) in the cathode chamber of the BES. Substrate concentration, rate of substrate consumption, Cr_i, pH, temperature and hydraulic retention time (HRT) were the operating process parameters of the reactor considered in developing the proposed models. MLR showed a better correlation coefficient (0.972) than NLR (0.952). Validation of the models using t-test analysis revealed that both models were unbiased, with the critical t value (2.04) greater than the calculated t values for MLR (-0.708) and NLR (-0.86). The root-mean-square errors (RMSE) for MLR and NLR were 5.06% and 7.45%, respectively. Comparison of the two models suggested that MLR is the better-suited model for predicting chromium removal behavior using BES technology in order to specify a set of operating conditions for BES. Modelling the behavior of CRE will be helpful for scale-up of BES technology at the industrial level. Copyright © 2018 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  16. An improved partial least-squares regression method for Raman spectroscopy

    Science.gov (United States)

    Momenpour Tehran Monfared, Ali; Anis, Hanan

    2017-10-01

    It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method sorts the weighted regression coefficients and then, in each iteration step, evaluates the importance of each variable in the sorted list using the root mean square error of prediction (RMSEP) criterion. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.

  17. Partial F-tests with multiply imputed data in the linear regression framework via coefficient of determination.

    Science.gov (United States)

    Chaurasia, Ashok; Harel, Ofer

    2015-02-10

    Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
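
    As background for this abstract, the textbook complete-data partial F statistic can be written purely in terms of the coefficients of determination of a full and a reduced model. The sketch below shows only that standard single-dataset formula; it is not the pooling procedure for multiply imputed data proposed in the paper, and the function name and example values are hypothetical.

```python
# Sketch: complete-data partial F statistic computed from R^2 values alone.
# This is the standard single-dataset formula, not the paper's
# multiple-imputation method; names and numbers are illustrative.


def partial_f_statistic(r2_full, r2_reduced, n_obs, n_predictors_full, n_tested):
    """Return (F, df_num, df_den) for testing n_tested coefficients.

    r2_full           -- R^2 of the full model (with intercept)
    r2_reduced        -- R^2 of the nested model without the tested predictors
    n_obs             -- number of observations
    n_predictors_full -- predictors in the full model, excluding the intercept
    n_tested          -- number of coefficients set to zero under H0
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    df_num = n_tested
    df_den = n_obs - n_predictors_full - 1
    if df_num <= 0 or df_den <= 0:
        raise ValueError("degrees of freedom must be positive")
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Partial test: does one extra predictor raise R^2 from 0.50 to 0.60?
f_partial, df1, df2 = partial_f_statistic(0.60, 0.50, n_obs=103,
                                          n_predictors_full=2, n_tested=1)
print(f_partial, df1, df2)  # F is approximately 25 on (1, 100) degrees of freedom

# Global test: the reduced model is intercept-only, so its R^2 is 0.
f_global, _, _ = partial_f_statistic(0.60, 0.0, n_obs=103,
                                     n_predictors_full=2, n_tested=2)
```

    The global and local tests are special cases of the same expression, which is why a scalar quantity such as R² is enough to compute all three.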

  18. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    Directory of Open Access Journals (Sweden)

    Ani Shabri

    2014-01-01

    Crude oil prices play a significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regression (MLR) is proposed for crude oil price forecasting. In this model, the Mallat wavelet transform is first used to decompose an original time series into several subseries with different scales. Then, principal component analysis (PCA) is used to process the subseries data in MLR for crude oil price forecasting. Particle swarm optimization (PSO) is used to select the optimal parameters of the MLR model. To assess the effectiveness of this model, the daily West Texas Intermediate (WTI) crude oil market has been used as the case study. The time series prediction performance of the proposed wavelet MLR (WMLR) model is compared with the MLR, ARIMA, and GARCH models using various statistical measures. The experimental results show that the proposed model outperforms the individual models in forecasting the crude oil price series.

  19. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    Science.gov (United States)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    A credit card is a convenient alternative to cash or cheque, and it is an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt, and to identify which variables are significant. The data cover 850 customers. Three methods were applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method and MLR models with the mean absolute deviation method. A comparison of the three methods showed that the MLR model with the LQD method was the best model, with the lowest mean square error (MSE). According to the final model, years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit card debt, whereas age, level of education and other debt are negatively associated with it. This study may serve as a reference for banks using robust methods, so that they can better understand their options and choose the one best aligned with their goals for inference regarding credit card debt.

  20. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  1. QSAR models for prediction study of HIV protease inhibitors using support vector machines, neural networks and multiple linear regression

    Directory of Open Access Journals (Sweden)

    Rachid Darnag

    2017-02-01

    Support vector machines (SVM) represent one of the most promising machine learning (ML) tools that can be applied to develop predictive quantitative structure–activity relationship (QSAR) models using molecular descriptors. Multiple linear regression (MLR) and artificial neural networks (ANNs) were also used to construct quantitative linear and nonlinear models to compare with the results obtained by SVM. The prediction results are in good agreement with the experimental values of HIV activity; the results also reveal the superiority of SVM over the MLR and ANN models. The contribution of each descriptor to the structure–activity relationships was evaluated.

  2. QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

    Directory of Open Access Journals (Sweden)

    Adi Syahputra

    2014-03-01

    Quantitative structure activity relationship (QSAR) for 21 insecticides of phthalamides containing hydrazone (PCH) was studied using multiple linear regression (MLR), principal component regression (PCR) and artificial neural network (ANN). Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principal component analysis (PCA) were used in PCR analysis. Descriptors were calculated using the semi-empirical PM6 method. ANN analysis was found to be the superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r² = 0.84). Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g. 2-(decalinecarbamoyl)-5-chloro-N’-((5-methylthiophen-2-yl)methylene)benzohydrazide, 2-(decalinecarbamoyl)-5-chloro-N’-((thiophen-2-yl)methylene)benzohydrazide and 2-(decalinecarbamoyl)-N’-(4-fluorobenzylidene)-5-chlorobenzohydrazide, with predicted log LC50 of 1.640, 1.672, and 1.769, respectively.

  3. Seasonal prediction of winter extreme precipitation over Canada by support vector regression

    Directory of Open Access Journals (Sweden)

    Z. Zeng

    2011-01-01

    For forecasting the maximum 5-day accumulated precipitation over the winter season at lead times of 3, 6, 9 and 12 months over Canada from 1950 to 2007, two nonlinear and two linear regression models were used, where the models were support vector regression (SVR) (nonlinear and linear versions), nonlinear Bayesian neural network (BNN) and multiple linear regression (MLR). The 118 stations were grouped into six geographic regions by K-means clustering. For each region, the leading principal components of the winter maximum 5-d accumulated precipitation anomalies were the predictands. Potential predictors included quasi-global sea surface temperature anomalies and 500 hPa geopotential height anomalies over the Northern Hemisphere, as well as six climate indices (the Niño-3.4 region sea surface temperature, the North Atlantic Oscillation, the Pacific-North American teleconnection, the Pacific Decadal Oscillation, the Scandinavia pattern, and the East Atlantic pattern). The results showed that in general the two robust SVR models tended to have better forecast skills than the two non-robust models (MLR and BNN), and the nonlinear SVR model tended to forecast slightly better than the linear SVR model. Among the six regions, the Prairies region displayed the highest forecast skills, and the Arctic region the second highest. The strongest nonlinearity was manifested over the Prairies and the weakest nonlinearity over the Arctic.

  4. Principal component regression for crop yield estimation

    CERN Document Server

    Suryanarayana, T M V

    2016-01-01

    This book highlights the estimation of crop yield in Central Gujarat, especially with regard to the development of Multiple Regression Models and Principal Component Regression (PCR) models using climatological parameters as independent variables and crop yield as a dependent variable. It subsequently compares the multiple linear regression (MLR) and PCR results, and discusses the significance of PCR for crop yield estimation. In this context, the book also covers Principal Component Analysis (PCA), a statistical procedure used to reduce a number of correlated variables into a smaller number of uncorrelated variables called principal components (PC). This book will be helpful to the students and researchers, starting their works on climate and agriculture, mainly focussing on estimation models. The flow of chapters takes the readers in a smooth path, in understanding climate and weather and impact of climate change, and gradually proceeds towards downscaling techniques and then finally towards development of ...

  5. Substituting random forest for multiple linear regression improves binding affinity prediction of scoring functions: Cyscore as a case study.

    Science.gov (United States)

    Li, Hongjian; Leung, Kwong-Sak; Wong, Man-Hon; Ballester, Pedro J

    2014-08-27

    State-of-the-art protein-ligand docking methods are generally limited by the traditionally low accuracy of their scoring functions, which are used to predict binding affinity and thus vital for discriminating between active and inactive compounds. Despite intensive research over the years, classical scoring functions have reached a plateau in their predictive performance. These assume a predetermined additive functional form for some sophisticated numerical features, and use standard multivariate linear regression (MLR) on experimental data to derive the coefficients. In this study we show that such a simple functional form is detrimental for the prediction performance of a scoring function, and replacing linear regression by machine learning techniques like random forest (RF) can improve prediction performance. We investigate the conditions of applying RF under various contexts and find that given sufficient training samples RF manages to comprehensively capture the non-linearity between structural features and measured binding affinities. Incorporating more structural features and training with more samples can both boost RF performance. In addition, we analyze the importance of structural features to binding affinity prediction using the RF variable importance tool. Lastly, we use Cyscore, a top performing empirical scoring function, as a baseline for comparison study. Machine-learning scoring functions are fundamentally different from classical scoring functions because the former circumvents the fixed functional form relating structural features with binding affinities. RF, but not MLR, can effectively exploit more structural features and more training samples, leading to higher prediction performance. The future availability of more X-ray crystal structures will further widen the performance gap between RF-based and MLR-based scoring functions. This further stresses the importance of substituting RF for MLR in scoring function development.

  6. A comparison of artificial neural networks with other statistical approaches for the prediction of true metabolizable energy of meat and bone meal.

    Science.gov (United States)

    Perai, A H; Nassiri Moghaddam, H; Asadpour, S; Bahrampour, J; Mansoori, Gh

    2010-07-01

    There has been a considerable and continuous interest to develop equations for rapid and accurate prediction of the ME of meat and bone meal. In this study, an artificial neural network (ANN), a partial least squares (PLS), and a multiple linear regression (MLR) statistical method were used to predict the TME(n) of meat and bone meal based on its CP, ether extract, and ash content. The accuracy of the models was calculated by R(2) value, MS error, mean absolute percentage error, mean absolute deviation, bias, and Theil's U. The predictive ability of an ANN was compared with a PLS and a MLR model using the same training data sets. The squared regression coefficients of prediction for the MLR, PLS, and ANN models were 0.38, 0.36, and 0.94, respectively. The results revealed that ANN produced more accurate predictions of TME(n) as compared with PLS and MLR methods. Based on the results of this study, ANN could be used as a promising approach for rapid prediction of nutritive value of meat and bone meal.

  7. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5

    Science.gov (United States)

    Ausati, Shadi; Amanollahi, Jamil

    2016-10-01

    Since Sanandaj is considered one of the polluted cities of Iran, predicting pollution, especially suspended PM2.5 particles, which cause many diseases, could contribute to public health by enabling timely announcements before PM2.5 increases. To predict the PM2.5 concentration in Sanandaj air, a hybrid model consisting of ensemble empirical mode decomposition and a general regression neural network (EEMD-GRNN), an Adaptive Neuro-Fuzzy Inference System (ANFIS), principal component regression (PCR), and a linear model, multiple linear regression (MLR), were used. In these models, the PM2.5 data were the dependent variable, and air quality data (PM2.5, PM10, SO2, NO2, CO, O3) and meteorological data (average minimum temperature (Min T), average maximum temperature (Max T), average atmospheric pressure (AP), daily total precipitation (TP), daily relative humidity (RH) and daily wind speed (WS)) for the year 2014 in Sanandaj were the independent variables. Among the models used, the EEMD-GRNN model, with R² = 0.90, root mean square error (RMSE) = 4.9218 and mean absolute error (MAE) = 3.4644 in the training phase and R² = 0.79, RMSE = 5.0324 and MAE = 3.2565 in the testing phase, performed best in predicting this phenomenon. It can be concluded that hybrid models predict PM2.5 concentration more accurately than the linear model.
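
    The three accuracy measures quoted in this abstract (R², RMSE and MAE) can be computed as in the short sketch below. It assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the abstract does not say whether the authors used this definition or the squared correlation coefficient, and the sample values are invented for illustration.

```python
# Sketch: RMSE, MAE and R^2 for a set of observed and predicted values.
# R^2 is taken here as 1 - SS_res/SS_tot (an assumption); data are made up.
import math


def regression_metrics(observed, predicted):
    """Return a dict with RMSE, MAE and the coefficient of determination."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical observed and predicted concentrations.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

    RMSE and MAE carry the units of the predicted quantity, so they are comparable across models only when computed on the same data split, as the training and testing figures above are.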

  8. 75 FR 74863 - Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient...

    Science.gov (United States)

    2010-12-01

    ... Part III Department of Health and Human Services 45 CFR Part 158 Health Insurance Issuers... 0950-AA06 Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient... health insurance issuers under the Public Health Service Act, as added by the Patient Protection and...

  9. 2D Quantitative Structure-Property Relationship Study of Mycotoxins by Multiple Linear Regression and Support Vector Machine

    Directory of Open Access Journals (Sweden)

    Fereshteh Shiri

    2010-08-01

    In the present work, support vector machines (SVMs) and multiple linear regression (MLR) techniques were used for quantitative structure–property relationship (QSPR) studies of retention time (tR) in standardized liquid chromatography–UV–mass spectrometry of 67 mycotoxins (aflatoxins, trichothecenes, roquefortines and ochratoxins) based on molecular descriptors calculated from the optimized 3D structures. By applying missing value, zero and multicollinearity tests with a cutoff value of 0.95, and the genetic algorithm method of variable selection, the most relevant descriptors were selected to build QSPR models. MLR and SVM methods were employed to build the QSPR models. The robustness of the QSPR models was characterized by statistical validation and the applicability domain (AD). The prediction results from the MLR and SVM models are in good agreement with the experimental values. The correlation and predictability measures r² and q² are 0.931 and 0.932, respectively, for SVM and 0.923 and 0.915, respectively, for MLR. The applicability domain of the model was investigated using a Williams plot. The effects of different descriptors on the retention times are described.

  10. Brightness-normalized Partial Least Squares Regression for hyperspectral data

    International Nuclear Information System (INIS)

    Feilhauer, Hannes; Asner, Gregory P.; Martin, Roberta E.; Schmidtlein, Sebastian

    2010-01-01

    Developed in the field of chemometrics, Partial Least Squares Regression (PLSR) has become an established technique in vegetation remote sensing. PLSR was primarily designed for laboratory analysis of prepared material samples. Under field conditions in vegetation remote sensing, the performance of the technique may be negatively affected by differences in brightness due to amount and orientation of plant tissues in canopies or the observing conditions. To minimize these effects, we introduced brightness normalization to the PLSR approach and tested whether this modification improves the performance under changing canopy and observing conditions. This test was carried out using high-fidelity spectral data (400-2510 nm) to model observed leaf chemistry. The spectral data was combined with a canopy radiative transfer model to simulate effects of varying canopy structure and viewing geometry. Brightness normalization enhanced the performance of PLSR by dampening the effects of canopy shade, thus providing a significant improvement in predictions of leaf chemistry (up to 3.6% additional explained variance in validation) compared to conventional PLSR. Little improvement was made on effects due to variable leaf area index, while minor improvement (mostly not significant) was observed for effects of variable viewing geometry. In general, brightness normalization increased the stability of model fits and regression coefficients for all canopy scenarios. Brightness-normalized PLSR is thus a promising approach for application on airborne and space-based imaging spectrometer data.
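
    The abstract does not define the normalization itself. The sketch below assumes one common form of brightness normalization, dividing each spectrum by its Euclidean norm, so that spectra differing only by an overall brightness factor become identical before they are passed to PLSR. The function name and reflectance values are hypothetical.

```python
# Sketch: brightness normalization of a reflectance spectrum, assuming the
# normalization is division by the spectrum's Euclidean (L2) norm.
# This definition is an assumption; the values are illustrative only.
import math


def brightness_normalize(spectrum):
    """Scale a spectrum to unit Euclidean norm."""
    brightness = math.sqrt(sum(v * v for v in spectrum))
    if brightness == 0.0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / brightness for v in spectrum]


# The same hypothetical canopy seen under full sun and at half the brightness.
sunlit = [0.05, 0.08, 0.45, 0.40]
shaded = [0.5 * v for v in sunlit]

sunlit_norm = brightness_normalize(sunlit)
shaded_norm = brightness_normalize(shaded)
print(sunlit_norm)
print(shaded_norm)
```

    Under this assumption a purely multiplicative shade effect is removed exactly, whereas effects that change the spectral shape are not, which would be in line with the weaker improvements reported for leaf area index and viewing geometry.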

  11. A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.

    Science.gov (United States)

    Bersabé, Rosa; Rivas, Teresa

    2010-05-01

    The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
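
    The equal-likelihood condition described in this abstract can be worked through with a small sketch. It assumes a baseline-category multinomial logit with one predictor, ln(P_j / P_0) = a_j + b_j·x, so that categories j and j+1 are equally likely where a_j + b_j·x = a_{j+1} + b_{j+1}·x. The abstract does not give the authors' equation, so this is a derivation from the stated condition, and the coefficients are hypothetical rather than EAT-26 estimates.

```python
# Sketch: cut-off scores from a multinomial logistic model with one predictor.
# Assumes baseline-category logits ln(P_j / P_0) = a_j + b_j * x with a_0 = b_0 = 0.
# Coefficients are hypothetical; they are not results from the study.
import math


def category_probabilities(x, intercepts, slopes):
    """Softmax of the category logits at test score x."""
    logits = [a + b * x for a, b in zip(intercepts, slopes)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def cutoff_between(j, intercepts, slopes):
    """Score at which category j and category j + 1 are equally likely."""
    slope_diff = slopes[j + 1] - slopes[j]
    if slope_diff == 0.0:
        raise ValueError("equal slopes: the two categories never cross")
    return (intercepts[j] - intercepts[j + 1]) / slope_diff


# Three ordinal categories; the first one is the reference category.
intercepts = [0.0, -4.0, -10.0]
slopes = [0.0, 0.4, 0.8]

cutoffs = [cutoff_between(j, intercepts, slopes) for j in range(2)]
for j, score in enumerate(cutoffs):
    probs = category_probabilities(score, intercepts, slopes)
    print(j, score, probs[j], probs[j + 1])  # the two probabilities coincide
```

    With k ordinal categories the same expression yields k − 1 cut-offs, which matches the two cut-off scores for three categories in the example described above.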

  12. Sulfur Speciation of Crude Oils by Partial Least Squares Regression Modeling of Their Infrared Spectra

    NARCIS (Netherlands)

    de Peinder, P.; Visser, T.; Wagemans, R.W.P.; Blomberg, J.; Chaabani, H.; Soulimani, F.; Weckhuysen, B.M.

    2013-01-01

    Research has been carried out to determine the feasibility of partial least-squares regression (PLS) modeling of infrared (IR) spectra of crude oils as a tool for fast sulfur speciation. The study is a continuation of a previously developed method to predict long and short residue properties of

  13. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    Science.gov (United States)

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most common models for analyzing such complex longitudinal data are based on mean regression, which fails to provide efficient estimates in the presence of outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying the benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish Bayesian joint models that account for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  14. Augmented chaos-multiple linear regression approach for prediction of wave parameters

    Directory of Open Access Journals (Sweden)

    M.A. Ghorbani

    2017-06-01

    The inter-comparisons demonstrated that the Chaos-MLR and pure MLR models yield almost the same accuracy in predicting the significant wave heights and the zero-up-crossing wave periods, whereas the augmented Chaos-MLR model yields better prediction accuracy than previous prediction applications of the same case study.

  15. 77 FR 28788 - Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Under the Patient Protection and...

    Science.gov (United States)

    2012-05-16

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES 45 CFR Part 158 [CMS-9998-IFC3] Health Insurance Issuers..., entitled ``Health Insurance Issuers Implementing Medical Loss Ratio (MLR) Requirements Under the Patient...) requirements for health insurance issuers under section 2718 of the Public Health Service Act, as added by the...

  16. Application of principal component regression and partial least squares regression in ultraviolet spectrum water quality detection

    Science.gov (United States)

    Li, Jiangtong; Luo, Yongdao; Dai, Honglin

    2018-01-01

    Water is the source of life and the essential foundation of all life. With the development of industrialization, water pollution is becoming more and more frequent, which directly affects human survival and development. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which partial least squares regression (PLSR) is becoming the predominant technique; however, in some special cases, PLSR produces considerable errors. To solve this problem, the traditional principal component regression (PCR) method was improved in this paper by using the principle of PLSR. The experimental results show that, for some special experimental data sets, the improved PCR method performs better than PLSR. PCR and PLSR are the focus of this paper. First, principal component analysis (PCA) is performed in MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted by using the principle of PLSR. Second, linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components are obtained. Finally, the same water spectral data set is calculated by PLSR and by improved PCR, and the two results are analyzed and compared: improved PCR and PLSR are similar for most data, but improved PCR is better than PLSR for data near the detection limit. Both PLSR and improved PCR can be used in UV spectral analysis of water, but for data near the detection limit, improved PCR gives better results than PLSR.

  17. Taking into account latency, amplitude, and morphology: improved estimation of single-trial ERPs by wavelet filtering and multiple linear regression.

    Science.gov (United States)

    Hu, L; Liang, M; Mouraux, A; Wise, R G; Hu, Y; Iannetti, G D

    2011-12-01

    Across-trial averaging is a widely used approach to enhance the signal-to-noise ratio (SNR) of event-related potentials (ERPs). However, across-trial variability of ERP latency and amplitude may contain physiologically relevant information that is lost by across-trial averaging. Hence, we aimed to develop a novel method that uses 1) wavelet filtering (WF) to enhance the SNR of ERPs and 2) a multiple linear regression with a dispersion term (MLR(d)) that takes into account shape distortions to estimate the single-trial latency and amplitude of ERP peaks. Using simulated ERP data sets containing different levels of noise, we provide evidence that, compared with other approaches, the proposed WF+MLR(d) method yields the most accurate estimate of single-trial ERP features. When applied to a real laser-evoked potential data set, the WF+MLR(d) approach provides reliable estimation of single-trial latency, amplitude, and morphology of ERPs and thereby allows performing meaningful correlations at single-trial level. We obtained three main findings. First, WF significantly enhances the SNR of single-trial ERPs. Second, MLR(d) effectively captures and measures the variability in the morphology of single-trial ERPs, thus providing an accurate and unbiased estimate of their peak latency and amplitude. Third, intensity of pain perception significantly correlates with the single-trial estimates of N2 and P2 amplitude. These results indicate that WF+MLR(d) can be used to explore the dynamics between different ERP features, behavioral variables, and other neuroimaging measures of brain activity, thus providing new insights into the functional significance of the different brain processes underlying the brain responses to sensory stimuli.

  18. Hourly predictive Levenberg-Marquardt ANN and multi linear regression models for predicting of dew point temperature

    Science.gov (United States)

    Zounemat-Kermani, Mohammad

    2012-08-01

    In this study, the ability of two models, multi linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, to estimate the hourly dew point temperature was examined. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration and in investigating agronomical issues such as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the practice of modeling. Additionally, the wind vector (wind speed magnitude and direction) and a conceptual input of weather condition were employed as other input variables. Three quantitative standard statistical performance evaluation measures, i.e. the root mean squared error, mean absolute error, and absolute logarithmic Nash-Sutcliffe efficiency coefficient (|Log(NS)|), were employed to evaluate the performances of the developed models. The results showed that applying wind vector and weather condition as input vectors along with meteorological variables could slightly increase the ANN and MLR predictive accuracy. The results also revealed that the LM-NN was superior to the MLR model and that the best performance was obtained by considering all potential input variables in terms of different evaluation criteria.
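
    As an illustration of the evaluation measures named in this abstract, the sketch below computes the root mean squared error, the mean absolute error and the standard Nash-Sutcliffe efficiency in plain Python. The function names and the four sample values are assumptions made for this example; the abstract does not define its logarithmic variant |Log(NS)|, so only the ordinary coefficient is shown.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def nash_sutcliffe(observed, predicted):
    """Standard Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up hourly dew point temperatures in degrees Celsius (illustrative only).
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(nash_sutcliffe(observed, predicted))
```

    A Nash-Sutcliffe value of 1 indicates a perfect match, while values at or below 0 indicate that the model is no better than predicting the observed mean.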

  19. Multivariate linear regression of high-dimensional fMRI data with multiple target variables.

    Science.gov (United States)

    Valente, Giancarlo; Castellanos, Agustin Lage; Vanacore, Gianluca; Formisano, Elia

    2014-05-01

    Multivariate regression is increasingly used to study the relation between fMRI spatial activation patterns and experimental stimuli or behavioral ratings. With linear models, informative brain locations are identified by mapping the model coefficients. This is a central aspect in neuroimaging, as it provides the sought-after link between the activity of neuronal populations and subject's perception, cognition or behavior. Here, we show that mapping of informative brain locations using multivariate linear regression (MLR) may lead to incorrect conclusions and interpretations. MLR algorithms for high dimensional data are designed to deal with targets (stimuli or behavioral ratings, in fMRI) separately, and the predictive map of a model integrates information deriving from both neural activity patterns and experimental design. Not accounting explicitly for the presence of other targets whose associated activity spatially overlaps with the one of interest may lead to predictive maps of troublesome interpretation. We propose a new model that can correctly identify the spatial patterns associated with a target while achieving good generalization. For each target, the training is based on an augmented dataset, which includes all remaining targets. The estimation on such datasets produces both maps and interaction coefficients, which are then used to generalize. The proposed formulation is independent of the regression algorithm employed. We validate this model on simulated fMRI data and on a publicly available dataset. Results indicate that our method achieves high spatial sensitivity and good generalization and that it helps disentangle specific neural effects from interaction with predictive maps associated with other targets. Copyright © 2013 Wiley Periodicals, Inc.

  20. Genetic variability, partial regression, Co-heritability studies and their implication in selection of high yielding potato gen

    International Nuclear Information System (INIS)

    Iqbal, Z.M.; Khan, S.A.

    2003-01-01

    Partial regression coefficients, genotypic and phenotypic variabilities, heritability, co-heritability and genetic advance were studied in 15 potato varieties of exotic and local origin. Both genotypic and phenotypic coefficients of variation were high for scab and rhizoctonia incidence percentage. A significant partial regression coefficient for emergence percentage indicated its relative importance in tuber yield. High heritability (broad-sense) estimates coupled with high genetic advance for plant height, number of stems per plant and scab percentage revealed a substantial contribution of additive genetic variance to the expression of these traits. Hence, selection based on these characters could play a significant role in their improvement. The dominance and epistatic variance was more important for the expression of yield ha^-1, emergence and rhizoctonia percentage. This phenomenon is mainly due to the accumulative effects of low heritability and low to moderate genetic advance. The high co-heritability coupled with negative genotypic and phenotypic covariance revealed that selection of varieties having low scab and rhizoctonia percentage resulted in higher potato yield. (author)

  1. Comparison of Multiple Linear Regressions and Neural Networks based QSAR models for the design of new antitubercular compounds.

    Science.gov (United States)

    Ventura, Cristina; Latino, Diogo A R S; Martins, Filomena

    2013-01-01

    The performance of two QSAR methodologies, namely Multiple Linear Regressions (MLR) and Neural Networks (NN), towards the modeling and prediction of antitubercular activity was evaluated and compared. A data set of 173 potentially active compounds belonging to the hydrazide family and represented by 96 descriptors was analyzed. Models were built with Multiple Linear Regressions (MLR), single Feed-Forward Neural Networks (FFNNs), ensembles of FFNNs and Associative Neural Networks (AsNNs) using four different data sets and different types of descriptors. The predictive ability of the different techniques was assessed and discussed on the basis of different validation criteria, and the results show in general a better performance of AsNNs in terms of learning ability and prediction of antitubercular behaviors when compared with all other methods. MLRs have, however, the advantage of pinpointing the most relevant molecular characteristics responsible for the behavior of these compounds against Mycobacterium tuberculosis. The best results for the larger data set (94 compounds in training set and 18 in test set) were obtained with AsNNs using seven descriptors (R(2) of 0.874 and RMSE of 0.437 against R(2) of 0.845 and RMSE of 0.472 in MLRs, for test set). Counter-Propagation Neural Networks (CPNNs) were trained with the same data sets and descriptors. From the scrutiny of the weight levels in each CPNN and the information retrieved from MLRs, a rational design of potentially active compounds was attempted. Two new compounds were synthesized and tested against M. tuberculosis, showing an activity close to that predicted by the majority of the models. Copyright © 2013 Elsevier Masson SAS. All rights reserved.

  2. Multiple Linear Regression and Artificial Neural Network to Predict Blood Glucose in Overweight Patients.

    Science.gov (United States)

    Wang, J; Wang, F; Liu, Y; Xu, J; Lin, H; Jia, B; Zuo, W; Jiang, Y; Hu, L; Lin, F

    2016-01-01

    Overweight individuals are at higher risk for developing type II diabetes than the general population. We conducted this study to analyze the correlation between blood glucose and biochemical parameters, and developed a blood glucose prediction model tailored to overweight patients. A total of 346 overweight Chinese patients aged 18-81 years were involved in this study. Their levels of fasting glucose (fs-GLU), blood lipids, and hepatic and renal functions were measured and analyzed by multiple linear regression (MLR). Based on the MLR results, we developed a back propagation artificial neural network (BP-ANN) model by selecting tansig as the transfer function of the hidden-layer nodes, and purelin for the output-layer nodes, with a training goal of 0.5×10(-5). There was a significant correlation between fs-GLU and age, BMI, and blood biochemical indexes (P<0.05). The results of MLR analysis indicated that age, fasting alanine transaminase (fs-ALT), blood urea nitrogen (fs-BUN), total protein (fs-TP), uric acid (fs-UA), and BMI are 6 independent variables related to fs-GLU. Based on these parameters, the BP-ANN model performed well and reached high prediction accuracy after training for 1 000 epochs (R=0.9987). The level of fs-GLU was predictable using the proposed BP-ANN model based on 6 related parameters (age, fs-ALT, fs-BUN, fs-TP, fs-UA and BMI) in overweight patients. © Georg Thieme Verlag KG Stuttgart · New York.

  3. Measuring decision weights in recognition experiments with multiple response alternatives: comparing the correlation and multinomial-logistic-regression methods.

    Science.gov (United States)

    Dai, Huanping; Micheyl, Christophe

    2012-11-01

    Psychophysical "reverse-correlation" methods allow researchers to gain insight into the perceptual representations and decision weighting strategies of individual subjects in perceptual tasks. Although these methods have gained momentum, until recently their development was limited to experiments involving only two response categories. Recently, two approaches for estimating decision weights in m-alternative experiments have been put forward. One approach extends the two-category correlation method to m > 2 alternatives; the second uses multinomial logistic regression (MLR). In this article, the relative merits of the two methods are discussed, and the issues of convergence and statistical efficiency of the methods are evaluated quantitatively using Monte Carlo simulations. The results indicate that, for a range of values of the number of trials, the estimated weighting patterns are closer to their asymptotic values for the correlation method than for the MLR method. Moreover, for the MLR method, weight estimates for different stimulus components can exhibit strong correlations, making the analysis and interpretation of measured weighting patterns less straightforward than for the correlation method. These and other advantages of the correlation method, which include computational simplicity and a close relationship to other well-established psychophysical reverse-correlation methods, make it an attractive tool to uncover decision strategies in m-alternative experiments.

  4. [Partial regression of Barrett esophagus with high grade dysplasia and adenocarcinoma after photocoagulation and endocurietherapy under antisecretory treatment].

    Science.gov (United States)

    Fremond, L; Bouché, O; Diébold, M D; Demange, L; Zeitoun, P; Thiefin, G

    1995-01-01

    Barrett's oesophagus is a premalignant condition. The possibility of eradicating at least partially the metaplastic epithelium has been reported recently. In this case report, a patient with Barrett's oesophagus complicated by high grade dysplasia and focal adenocarcinoma was treated by Nd:Yag laser then high dose rate intraluminal irradiation while on omeprazole 40 mg/day. A partial eradication of Barrett's oesophagus and a transient tumoural regression were obtained. Histologically, residual specialized-type glandular tissue was observed beneath regenerative squamous epithelium. Four months after intraluminal irradiation, a local tumoural recurrence was detected while the area of restored squamous epithelium was unchanged on omeprazole 40 mg/day. This indicates that physical destruction of Barrett's oesophagus associated with potent antisecretory treatment can induce a regression of the metaplastic epithelium, even in presence of high grade dysplasia. The persistence of specialized-type glands beneath the squamous epithelium raises important issues about its potential malignant degeneration.

  5. The use of artificial neural networks and multiple linear regression to predict rate of medical waste generation

    International Nuclear Information System (INIS)

    Jahandideh, Sepideh; Jahandideh, Samad; Asadabadi, Ebrahim Barzegari; Askarian, Mehrdad; Movahedi, Mohammad Mehdi; Hosseini, Somayyeh; Jahandideh, Mina

    2009-01-01

    Predicting the amount of hospital waste produced is helpful for the storage, transportation and disposal stages of hospital waste management. On this basis, two predictor models, artificial neural networks (ANNs) and multiple linear regression (MLR), were applied to predict the rate of medical waste generation, both in total and for the sharp, infectious and general waste types. In this study, a 5-fold cross-validation procedure on a database containing a total of 50 hospitals of Fars province (Iran) was used to verify the performance of the models. Three performance measures, MAR, RMSE and R2, were used to evaluate the performance of the models. The MLR, as a conventional model, obtained poor prediction performance measure values. However, MLR distinguished hospital capacity and bed occupancy as the more significant parameters. On the other hand, ANNs, as a more powerful model that has not previously been introduced for predicting the rate of medical waste generation, showed high performance measure values, especially an R2 value of 0.99, confirming the good fit of the data. Such satisfactory results could be attributed to the non-linear nature of ANNs in problem solving, which provides the opportunity for relating independent variables to dependent ones non-linearly. In conclusion, the obtained results showed that our ANN-based model approach is very promising and may play a useful role in developing a better cost-effective strategy for waste management in future.
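
    The 5-fold cross-validation over 50 hospitals mentioned in this abstract can be sketched as an index-splitting routine. This is a minimal illustration in plain Python, not the authors' procedure: the function name `k_fold_indices`, the fixed random seed and the round-robin fold assignment are all assumptions made for the example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering all samples once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle reproducibly
    # Deal the shuffled indices into k folds in round-robin fashion.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set: every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# 50 hospitals split into 5 folds: each round trains on 40 and tests on 10.
splits = k_fold_indices(n_samples=50, k=5)
for train, test in splits:
    print(len(train), len(test))
```

    Each hospital appears in exactly one test fold, so every record contributes once to the reported performance measures.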

  6. Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

    International Nuclear Information System (INIS)

    Balabin, Roman M.; Smirnov, Sergey V.

    2011-01-01

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm-1) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

  7. Kinetic microplate bioassays for relative potency of antibiotics improved by partial Least Square (PLS) regression.

    Science.gov (United States)

    Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello

    2016-05-01

    Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays when compared to other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to solve these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there was significant deviation from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from agar diffusion official methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression

    OpenAIRE

    Plata, Maria R.; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

    2013-01-01

    A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference anal...

  9. Application of least squares support vector regression and linear multiple regression for modeling removal of methyl orange onto tin oxide nanoparticles loaded on activated carbon and activated carbon prepared from Pistacia atlantica wood.

    Science.gov (United States)

    Ghaedi, M; Rahimi, Mahmoud Reza; Ghaedi, A M; Tyagi, Inderjeet; Agarwal, Shilpi; Gupta, Vinod Kumar

    2016-01-01

    Two novel and eco-friendly adsorbents, namely tin oxide nanoparticles loaded on activated carbon (SnO2-NP-AC) and activated carbon prepared from the wood of the tree Pistacia atlantica (AC-PAW), were used for the rapid removal and fast adsorption of methyl orange (MO) from the aqueous phase. The dependency of MO removal on various influential adsorption parameters was well modeled and optimized using multiple linear regressions (MLR) and least squares support vector regression (LSSVR). The optimal parameters for the LSSVR model were found to be a γ value of 0.76 and a σ(2) of 0.15. For the testing data set, a mean square error (MSE) value of 0.0010 and a coefficient of determination (R(2)) value of 0.976 were obtained for the LSSVR model, and an MSE value of 0.0037 and an R(2) value of 0.897 were obtained for the MLR model. The adsorption equilibrium and kinetic data were found to be well fitted by, and in good agreement with, the Langmuir isotherm model and the second-order equation and intra-particle diffusion models, respectively. A small amount of the proposed SnO2-NP-AC and AC-PAW (0.015 g and 0.08 g) is sufficient for successful rapid removal of methyl orange (>95%). The maximum adsorption capacity for SnO2-NP-AC and AC-PAW was 250 mg g(-1) and 125 mg g(-1), respectively. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Multiple linear regression models for predicting chronic aluminum toxicity to freshwater aquatic organisms and developing water quality guidelines.

    Science.gov (United States)

    DeForest, David K; Brix, Kevin V; Tear, Lucinda M; Adams, William J

    2018-01-01

    The bioavailability of aluminum (Al) to freshwater aquatic organisms varies as a function of several water chemistry parameters, including pH, dissolved organic carbon (DOC), and water hardness. We evaluated the ability of multiple linear regression (MLR) models to predict chronic Al toxicity to a green alga (Pseudokirchneriella subcapitata), a cladoceran (Ceriodaphnia dubia), and a fish (Pimephales promelas) as a function of varying DOC, pH, and hardness conditions. The MLR models predicted toxicity values that were within a factor of 2 of observed values in 100% of the cases for P. subcapitata (10 and 20% effective concentrations [EC10s and EC20s]), 91% of the cases for C. dubia (EC10s and EC20s), and 95% (EC10s) and 91% (EC20s) of the cases for P. promelas. The MLR models were then applied to all species with Al toxicity data to derive species and genus sensitivity distributions that could be adjusted as a function of varying DOC, pH, and hardness conditions (the P. subcapitata model was applied to algae and macrophytes, the C. dubia model was applied to invertebrates, and the P. promelas model was applied to fish). Hazardous concentrations to 5% of the species or genera were then derived in 2 ways: 1) fitting a log-normal distribution to species-mean EC10s for all species (following the European Union methodology), and 2) fitting a triangular distribution to genus-mean EC20s for animals only (following the US Environmental Protection Agency methodology). Overall, MLR-based models provide a viable approach for deriving Al water quality guidelines that vary as a function of DOC, pH, and hardness conditions and are a significant improvement over bioavailability corrections based on single parameters. Environ Toxicol Chem 2018;37:80-90. © 2017 SETAC.
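
    The "within a factor of 2" criterion used in this abstract to judge the MLR predictions can be written as a short check. The sketch below is illustrative only: the function name and the four concentration pairs are invented for the example and are not data from the study.

```python
def fraction_within_factor(observed, predicted, factor=2.0):
    """Share of predictions whose ratio to the observed value lies in [1/factor, factor]."""
    hits = 0
    for obs, pred in zip(observed, predicted):
        ratio = pred / obs  # assumes strictly positive concentrations
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(observed)


# Made-up effect concentrations (illustrative units), not values from the paper.
observed_ec = [100.0, 250.0, 400.0, 800.0]
predicted_ec = [150.0, 120.0, 900.0, 790.0]

print(fraction_within_factor(observed_ec, predicted_ec))
```

    The check is symmetric on the ratio scale, so a prediction at half the observed value is treated the same as one at twice the observed value.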

  11. Extracting information from two-dimensional electrophoresis gels by partial least squares regression

    DEFF Research Database (Denmark)

    Jessen, Flemming; Lametsch, R.; Bendixen, E.

    2002-01-01

    Two-dimensional gel electrophoresis (2-DE) produces large amounts of data and extraction of relevant information from these data demands a cautious and time consuming process of spot pattern matching between gels. The classical approach of data analysis is to detect protein markers that appear or disappear depending on the experimental conditions. Such biomarkers are found by comparing the relative volumes of individual spots in the individual gels. Multivariate statistical analysis and modelling of 2-DE data for comparison and classification is an alternative approach utilising the combination of all proteins/spots in the gels. In the present study it is demonstrated how information can be extracted by multivariate data analysis. The strategy is based on partial least squares regression followed by variable selection to find proteins that individually or in combination with other proteins vary...

  12. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    Science.gov (United States)

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    A multivariate regression statistics strategy was developed to clarify the multi-component content-effect correlation of panax ginseng saponins extract and to predict the pharmacological effect from the component content. In example 1, firstly, we compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples 1 and 2, the relationship between the component content and the pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that the multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.

  13. Prediction of Caffeine Content in Java Preanger Coffee Beans by NIR Spectroscopy Using PLS and MLR Method

    Science.gov (United States)

    Budiastra, I. W.; Sutrisno; Widyotomo, S.; Ayu, P. C.

    2018-05-01

    Caffeine is one of the important components in coffee that contributes to the flavor of coffee beverages. Caffeine concentration in coffee beans is usually determined by chemical methods, which are time consuming and destructive. A nondestructive method using NIR spectroscopy was successfully applied to determine the caffeine concentration of Arabica gayo coffee beans. In this study, NIR spectroscopy was assessed for determining the caffeine concentration of java preanger coffee beans. A hundred samples, each consisting of 96 g of coffee beans, were prepared for reflectance and chemical measurement. Reflectance of the samples was measured by an FT-NIR spectrometer in the wavelength range of 1000-2500 nm (10000-4000 cm-1), followed by determination of caffeine content using the LCMS method. Calibration of the NIR spectra against the caffeine content was carried out using PLS and MLR methods. Several spectra data processing methods were applied to increase the accuracy of prediction. The results of the study showed that caffeine content could be determined by a PLS model using 7 factors and spectra data processing combining the first derivative and MSC of the spectra absorbance (r = 0.946; CV = 1.54 %; RPD = 2.28). A lower accuracy was obtained by an MLR model consisting of three caffeine and four other absorption wavelengths (r = 0.683; CV = 3.31%; RPD = 1.18).
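
    The RPD statistic quoted in this abstract is commonly defined in NIR calibration work as the standard deviation of the reference values divided by the prediction error. The sketch below assumes that definition with the root mean squared error of prediction as the denominator; the abstract does not state which variant the authors used, and the caffeine values are invented for illustration.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean squared error of prediction."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSEP (assumed definition)."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Made-up caffeine contents in percent (illustrative only).
reference = [1.0, 1.2, 1.4, 1.6, 1.8]
predicted = [1.1, 1.1, 1.4, 1.7, 1.7]

print(round(rpd(reference, predicted), 3))
```

    Under this definition a larger RPD means the prediction error is small relative to the natural spread of the reference values, which is consistent with the PLS model (RPD = 2.28) being reported as more accurate than the MLR model (RPD = 1.18).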

  14. Predicting Fuel Ignition Quality Using 1H NMR Spectroscopy and Multiple Linear Regression

    KAUST Repository

    Abdul Jameel, Abdul Gani

    2016-09-14

    An improved model for the prediction of ignition quality of hydrocarbon fuels has been developed using 1H nuclear magnetic resonance (NMR) spectroscopy and multiple linear regression (MLR) modeling. Cetane number (CN) and derived cetane number (DCN) of 71 pure hydrocarbons and 54 hydrocarbon blends were utilized as a data set to study the relationship between ignition quality and molecular structure. CN and DCN are functional equivalents and are collectively referred to herein as D/CN. The effect of molecular weight and weight percent of structural parameters such as paraffinic CH3 groups, paraffinic CH2 groups, paraffinic CH groups, olefinic CH–CH2 groups, naphthenic CH–CH2 groups, and aromatic C–CH groups on D/CN was studied. Particular emphasis was placed on the effect of branching (i.e., methyl substitution) on the D/CN, and a new parameter denoted as the branching index (BI) was introduced to quantify this effect. A new formula was developed to calculate the BI of hydrocarbon fuels using 1H NMR spectroscopy. MLR modeling was used to develop an empirical relationship between D/CN and the eight structural parameters. This was then used to predict the DCN of many hydrocarbon fuels. The developed model has a high correlation coefficient (R2 = 0.97) and was validated with experimentally measured DCN of twenty-two real fuel mixtures (e.g., gasolines and diesels) and fifty-nine blends of known composition, and the predicted values matched well with the experimental data.
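
    As a rough illustration of the MLR step described above, the following sketch fits an intercept plus one coefficient per predictor by ordinary least squares and reports R2. It is not the authors' model: the two synthetic predictors, the coefficient values and the helper names are all assumptions made for the example, and NumPy is assumed to be available.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares fit; returns [intercept, b1, b2, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def r_squared(X, y, coef):
    """Coefficient of determination of the fitted model on (X, y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    residuals = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


# Synthetic, noise-free data: two hypothetical structural parameters.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

coef = fit_mlr(X, y)
r2 = r_squared(X, y, coef)
```

    With real spectra-derived parameters the fit would of course not be exact; the noise-free data are used here only so that the recovered coefficients can be checked against the values that generated them.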

  15. Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

    Directory of Open Access Journals (Sweden)

    Chi-Cheng Huang

    2013-01-01

    Full Text Available Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least square (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies with Han Chinese origin were used for independent validation (n=535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to many more samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.

  16. Prediction of beef marbling using Hyperspectral Imaging (HSI) and Partial Least Squares Regression (PLSR)

    Directory of Open Access Journals (Sweden)

    Victor Aredo

    2017-01-01

    Full Text Available The aim of this study was to build a model to predict beef marbling using HSI and Partial Least Squares Regression (PLSR). A total of 58 samples of longissimus dorsi muscle were scanned by an HSI system (400-1000 nm) in reflectance mode, using 44 samples to build the PLSR model and 14 samples for model validation. The Japanese Beef Marbling Standard (BMS) was used as reference by 15 middle-trained judges for the evaluation of the samples. The scores were assigned as continuous values and varied from 1.2 to 5.3 BMS. The PLSR model showed a high correlation coefficient in the prediction (r = 0.95), a low Standard Error of Calibration (SEC) of 0.2 BMS score, and a low Standard Error of Prediction (SEP) of 0.3 BMS score.

  17. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method

    Directory of Open Access Journals (Sweden)

    Ying Peng

    2018-03-01

    Full Text Available A multivariate regression statistics strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponins extract and to predict the pharmacological effect from the component content. In example 1, firstly, we compared pharmacological effects between Panax ginseng saponins extract and individual saponin combinations. Secondly, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, firstly, 18 common peaks were identified in ten different batches of Panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different Panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples 1 and 2, the relationship between the component content and the pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that the multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.

  18. Comparison of multiple linear regression, partial least squares and artificial neural networks for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids.

    Science.gov (United States)

    Fragkaki, A G; Farmaki, E; Thomaidis, N; Tsantili-Kakoulidou, A; Angelis, Y S; Koupparis, M; Georgakopoulos, C

    2012-09-21

    The comparison among different modelling techniques, such as multiple linear regression, partial least squares and artificial neural networks, has been performed in order to construct and evaluate models for prediction of gas chromatographic relative retention times of trimethylsilylated anabolic androgenic steroids. The performance of the quantitative structure-retention relationship study, using the multiple linear regression and partial least squares techniques, has been previously conducted. In the present study, artificial neural networks models were constructed and used for the prediction of relative retention times of anabolic androgenic steroids, while their efficiency is compared with that of the models derived from the multiple linear regression and partial least squares techniques. For overall ranking of the models, a novel procedure [Trends Anal. Chem. 29 (2010) 101-109] based on sum of ranking differences was applied, which permits the best model to be selected. The suggested models are considered useful for the estimation of relative retention times of designer steroids for which no analytical data are available. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. Multinomial Logistic Regression & Bootstrapping for Bayesian Estimation of Vertical Facies Prediction in Heterogeneous Sandstone Reservoirs

    Science.gov (United States)

    Al-Mudhafar, W. J.

    2013-12-01

    Precise prediction of rock facies leads to adequate reservoir characterization by improving the porosity-permeability relationships used to estimate the properties in non-cored intervals. It also helps to accurately identify the spatial facies distribution in order to build an accurate reservoir model for optimal future reservoir performance. In this paper, the facies estimation has been done through multinomial logistic regression (MLR) with respect to the well logs and core data in a well in the upper sandstone formation of the South Rumaila oil field. The independent variables are gamma rays, formation density, water saturation, shale volume, log porosity, core porosity, and core permeability. Firstly, the Robust Sequential Imputation Algorithm has been used to impute the missing data. This algorithm starts from a complete subset of the dataset and sequentially estimates the missing values in an incomplete observation by minimizing the determinant of the covariance of the augmented data matrix. Then, the observation is added to the complete data matrix and the algorithm continues with the next observation with missing values. The MLR has been chosen to estimate the maximum likelihood and minimize the standard error for the nonlinear relationships between facies and the core and log data. The MLR is used to predict the probabilities of the different possible facies given each independent variable by constructing a linear predictor function having a set of weights that are linearly combined with the independent variables by using a dot product. A Beta distribution of facies has been considered as prior knowledge, and the resulting predicted probability (posterior) has been estimated from MLR based on Bayes' theorem, which relates the predicted probability (posterior) to the conditional probability and the prior knowledge. 
To assess the statistical accuracy of the model, the bootstrap should be carried out to estimate extra-sample prediction error by randomly
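
    The linear predictor described above (weights combined with the independent variables by a dot product) is turned into class probabilities by the softmax function in standard multinomial logistic regression. The sketch below shows only that step; the weights, biases, feature values and function name are hypothetical, and the Beta prior, imputation and bootstrap stages of the paper are not reproduced.

```python
import math


def facies_probabilities(features, weights, biases):
    """Softmax over one linear predictor (dot product + bias) per facies class."""
    scores = [
        bias + sum(w * x for w, x in zip(class_weights, features))
        for class_weights, bias in zip(weights, biases)
    ]
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical example: two log-derived features and three facies classes.
features = [0.4, 1.5]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
probs = facies_probabilities(features, weights, biases)

# With all weights and biases at zero, every class is equally likely.
uniform = facies_probabilities(features, [[0.0, 0.0]] * 3, [0.0] * 3)
```

    The predicted facies is simply the class with the largest probability; in this toy case that is the second class, whose linear predictor (1.5) is the highest.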

  20. On the Relationship Between Confidence Sets and Exchangeable Weights in Multiple Linear Regression.

    Science.gov (United States)

    Pek, Jolynn; Chalmers, R Philip; Monette, Georges

    2016-01-01

    When statistical models are employed to provide a parsimonious description of empirical relationships, the extent to which strong conclusions can be drawn rests on quantifying the uncertainty in parameter estimates. In multiple linear regression (MLR), regression weights carry two kinds of uncertainty represented by confidence sets (CSs) and exchangeable weights (EWs). Confidence sets quantify uncertainty in estimation whereas the set of EWs quantify uncertainty in the substantive interpretation of regression weights. As CSs and EWs share certain commonalities, we clarify the relationship between these two kinds of uncertainty about regression weights. We introduce a general framework describing how CSs and the set of EWs for regression weights are estimated from the likelihood-based and Wald-type approach, and establish the analytical relationship between CSs and sets of EWs. With empirical examples on posttraumatic growth of caregivers (Cadell et al., 2014; Schneider, Steele, Cadell & Hemsworth, 2011) and on graduate grade point average (Kuncel, Hezlett & Ones, 2001), we illustrate the usefulness of CSs and EWs for drawing strong scientific conclusions. We discuss the importance of considering both CSs and EWs as part of the scientific process, and provide an Online Appendix with R code for estimating Wald-type CSs and EWs for k regression weights.

  1. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

    Science.gov (United States)

    Delwiche, Stephen R; Reeves, James B

    2010-01-01

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R2) rather than a term based on residual error. 
The graphical method has application to the evaluation of other preprocess functions and various
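
    The Savitzky-Golay preprocess functions mentioned above are convolution weights obtained from a local least-squares polynomial fit. As a minimal sketch (assuming NumPy, unit spacing between spectral points, and hypothetical helper names), the weights for a centrally symmetric window can be derived from the pseudoinverse of a Vandermonde matrix:

```python
import math

import numpy as np


def savgol_coefficients(window, polyorder, deriv=0):
    """Weights that give the fitted value (or derivative) at the window centre."""
    half = window // 2
    positions = np.arange(-half, half + 1)
    # Column j of the design matrix holds positions**j.
    design = np.vander(positions, polyorder + 1, increasing=True)
    # Row j of the pseudoinverse maps the window samples to the fitted
    # polynomial coefficient of x**j; the deriv-th derivative at x = 0
    # equals deriv! times that coefficient.
    return math.factorial(deriv) * np.linalg.pinv(design)[deriv]


def apply_filter(signal, coeffs):
    """Slide the weights over the signal, keeping fully covered points only."""
    return np.correlate(np.asarray(signal, dtype=float), coeffs, mode="valid")


smooth5 = savgol_coefficients(5, 2)            # 5-point quadratic smoothing
deriv5 = savgol_coefficients(5, 2, deriv=1)    # 5-point first derivative

x = np.arange(10, dtype=float)
smoothed = apply_filter(x ** 2, smooth5)       # a quadratic is reproduced exactly
slope = apply_filter(3.0 * x + 1.0, deriv5)    # the slope of a straight line
```

    For the 5-point quadratic case this reproduces the classic tabulated weights (-3, 12, 17, 12, -3)/35. Wider windows smooth more aggressively, which is the kind of choice the abstract warns can flatter a calibration built on few samples.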

  2. Combining Off-the-Job Productivity Regression Model with EPA’s NONROAD Model in Estimating CO2 Emissions from Bulldozer

    Directory of Open Access Journals (Sweden)

    Apif M. Hajji

    2017-09-01

    Full Text Available Heavy duty diesel (HDD) construction equipment, which includes bulldozers, is important in infrastructure development. This equipment consumes large amounts of diesel fuel and emits high levels of carbon dioxide (CO2). The total emissions depend on the fuel use, and the fuel use depends on the productivity of the equipment. This paper proposes a methodology and tool for estimating CO2 emissions from bulldozers based on the productivity rate. The methodology is formulated by using the result of multiple linear regressions (MLR) of CAT's data to obtain the productivity model, combined with the EPA's NONROAD model. The emission factors from the NONROAD model were used to quantify the CO2 emissions. To demonstrate the function of the model, a case study and sensitivity analysis for a bulldozer's activity are also presented. MLR results indicate that the productivity model generated from CAT's data can be used as the basis for quantifying the total CO2 emissions for an earthwork activity.

  3. A non-parametric test for partial monotonicity in multiple regression

    NARCIS (Netherlands)

    van Beek, M.; Daniëls, H.A.M.

    Partial positive (negative) monotonicity in a dataset is the property that an increase in an independent variable, ceteris paribus, generates an increase (decrease) in the dependent variable. A test for partial monotonicity in datasets could (1) increase model performance if monotonicity may be

  4. Environmetrics. Part 1. Modeling of water salinity and air quality data

    International Nuclear Information System (INIS)

    Braibanti, A.; Gollapalli, N. R.; Jonnalagaddaj, S. B.; Duvvuru, S.; Rupenaguntla, S. R.

    2001-01-01

    Environmetrics utilizes advanced mathematical, statistical and information tools to extract information. Two typical environmental data sets are analysed using MVATOB (Multi Variate Tool Box). The first data set corresponds to the variable river salinity. Least median squares (LMS) detected the outliers whereas linear least squares (LLS) could not detect and remove the outliers. The second data set consists of daily readings of air quality values. Outliers are detected by LMS and unbiased regression coefficients are estimated by multi-linear regression (MLR). As explanatory variables are not independent, principal component regression (PCR) and partial least squares regression (PLSR) are used. Both examples demonstrate the superiority of LMS over LLS

  5. Generalized Partially Linear Regression with Misclassified Data and an Application to Labour Market Transitions

    DEFF Research Database (Denmark)

    Dlugosz, Stephan; Mammen, Enno; Wilke, Ralf

    2017-01-01

    Large data sets that originate from administrative or operational activity are increasingly used for statistical analysis as they often contain very precise information and a large number of observations. But there is evidence that some variables can be subject to severe misclassification...... or contain missing values. Given the size of the data, a flexible semiparametric misclassification model would be a good choice, but their use in practice is scarce. To close this gap a semiparametric model for the probability of observing labour market transitions is estimated using a sample of 20 m...... observations from Germany. It is shown that estimated marginal effects of a number of covariates are sizeably affected by misclassification and missing values in the analysis data. The proposed generalized partially linear regression extends existing models by allowing a misclassified discrete covariate...

  6. Automatic Craniomaxillofacial Landmark Digitization via Segmentation-guided Partially-joint Regression Forest Model and Multi-scale Statistical Features

    Science.gov (United States)

    Zhang, Jun; Gao, Yaozong; Wang, Li; Tang, Zhen; Xia, James J.; Shen, Dinggang

    2016-01-01

    Objective: The goal of this paper is to automatically digitize craniomaxillofacial (CMF) landmarks efficiently and accurately from cone-beam computed tomography (CBCT) images, by addressing the challenge caused by large morphological variations across patients and image artifacts of CBCT images. Methods: We propose a Segmentation-guided Partially-joint Regression Forest (S-PRF) model to automatically digitize CMF landmarks. In this model, a regression voting strategy is first adopted to localize each landmark by aggregating evidence from context locations, thus potentially relieving the problem caused by image artifacts near the landmark. Second, CBCT image segmentation is utilized to remove uninformative voxels caused by morphological variations across patients. Third, a partially-joint model is further proposed to separately localize landmarks based on the coherence of landmark positions to improve the digitization reliability. In addition, we propose a fast vector quantization (VQ) method to extract high-level multi-scale statistical features to describe a voxel's appearance, which has low dimensionality, high efficiency, and is also invariant to the local inhomogeneity caused by artifacts. Results: Mean digitization errors for 15 landmarks, in comparison to the ground truth, are all less than 2 mm. Conclusion: Our model has addressed challenges of both inter-patient morphological variations and imaging artifacts. Experiments on a CBCT dataset show that our approach achieves clinically acceptable accuracy for landmark digitization. Significance: Our automatic landmark digitization method can be used clinically to reduce the labor cost and also improve digitization consistency. PMID:26625402

  7. Multiple linear regression and artificial neural networks for delta-endotoxin and protease yields modelling of Bacillus thuringiensis.

    Science.gov (United States)

    Ennouri, Karim; Ben Ayed, Rayda; Triki, Mohamed Ali; Ottaviani, Ennio; Mazzarello, Maura; Hertelli, Fathi; Zouari, Nabil

    2017-07-01

    The aim of the present work was to develop a model that supplies accurate predictions of the yields of delta-endotoxins and proteases produced by B. thuringiensis var. kurstaki HD-1. Using available medium ingredients as variables, a mathematical method, based on Plackett-Burman design (PB), was employed to analyze and compare data generated by the Bootstrap method and processed by multiple linear regression (MLR) and artificial neural networks (ANN), including multilayer perceptron (MLP) and radial basis function (RBF) models. The predictive ability of these models was evaluated by comparing output data through the coefficient of determination (R2) and mean square error (MSE) values. The results demonstrate that the prediction of the yields of delta-endotoxin and protease was more accurate with the ANN technique (87 and 89% for delta-endotoxin and protease determination coefficients, respectively) than with the MLR method (73.1 and 77.2% for delta-endotoxin and protease determination coefficients, respectively), suggesting that the proposed ANNs, especially MLP, are a suitable new approach for determining yields of bacterial products, allowing more appropriate predictions in a shorter time and with less engineering effort.

  8. Reconstruction of Local Sea Levels at South West Pacific Islands—A Multiple Linear Regression Approach (1988-2014)

    Science.gov (United States)

    Kumar, V.; Melet, A.; Meyssignac, B.; Ganachaud, A.; Kessler, W. S.; Singh, A.; Aucan, J.

    2018-02-01

    Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years has been up to 3 times the global average. In this study, we aim at reconstructing sea levels at selected sites in the region (Suva and Lautoka in Fiji, and Nouméa in New Caledonia) as a multilinear regression (MLR) of atmospheric and oceanic variables. We focus on sea level variability at interannual-to-interdecadal time scales, and on the trend over the 1988-2014 period. Local sea levels are first expressed as a sum of steric and mass changes. Then a dynamical approach is used based on wind stress curl as a proxy for the thermosteric component, as wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. Statistically significant predictors among wind stress curl, halosteric sea level, zonal/meridional wind stress components, and sea surface temperature are used to construct an MLR model simulating local sea levels. Although we are focusing on the local scale, the global mean sea level needs to be adjusted for. Our reconstructions provide insights on key drivers of sea level variability at the selected sites, showing that while local dynamics and the global signal modulate sea level to a given extent, most of the variance is driven by regional factors. On average, the MLR model is able to reproduce 82% of the variance in island sea level, and could be used to derive local sea level projections via downscaling of climate models.

  9. Quantile Regression Methods

    DEFF Research Database (Denmark)

    Fitzenberger, Bernd; Wilke, Ralf Andreas

    2015-01-01

    if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...
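
    Quantile regression estimates conditional quantiles by minimizing the asymmetric "check" (pinball) loss rather than squared error. The sketch below illustrates this in the simplest possible setting, a model with no regressors, where the minimizer is a sample quantile. The data and function names are hypothetical, and the search over observed values is a shortcut for illustration, not how production quantile regression solvers work.

```python
def pinball_loss(values, estimate, tau):
    """Mean check loss of a constant estimate at quantile level tau (0 < tau < 1)."""
    total = 0.0
    for v in values:
        if v >= estimate:
            total += tau * (v - estimate)          # under-prediction penalty
        else:
            total += (1.0 - tau) * (estimate - v)  # over-prediction penalty
    return total / len(values)


def best_constant(values, tau):
    """Observed value that minimizes the pinball loss at level tau."""
    return min(sorted(values), key=lambda c: pinball_loss(values, c, tau))


# One extreme observation pulls the mean (22.0) far from the bulk of the data.
wages = [1, 2, 3, 4, 100]
median_fit = best_constant(wages, 0.5)
upper_fit = best_constant(wages, 0.9)
```

    Here the median fit is 3 while the 0.9 fit is 100, which shows in miniature how different quantile levels describe different parts of the distribution that a single mean model would blur together.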

  10. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    Directory of Open Access Journals (Sweden)

    Ashok K. Sharma

    2017-11-01

    Full Text Available The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. The random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas the partial least squares based regression model for the prediction of permeability (Caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity

  11. Generalized regression neural network (GRNN)-based approach for colored dissolved organic matter (CDOM) retrieval: case study of Connecticut River at Middle Haddam Station, USA.

    Science.gov (United States)

    Heddam, Salim

    2014-11-01

    The prediction of colored dissolved organic matter (CDOM) using artificial neural network approaches has received little attention in the past few decades. In this study, CDOM was modeled using generalized regression neural network (GRNN) and multiple linear regression (MLR) models as a function of water temperature (TE), pH, specific conductance (SC), and turbidity (TU). Evaluation of the prediction accuracy of the models is based on the root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (CC), and Willmott's index of agreement (d). The results indicated that GRNN can be applied successfully for prediction of CDOM.
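
    Three of the evaluation measures named above can be written in a few lines. The sketch below uses the usual textbook definitions, including Willmott's original index of agreement, d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2); the abstract does not state which variant of d was used, so that choice, the function names and the sample values are assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def willmott_d(observed, predicted):
    """Willmott's index of agreement (original 1981 form), between 0 and 1."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical CDOM readings: every prediction is one unit too high.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
scores = (rmse(observed, predicted), mae(observed, predicted), willmott_d(observed, predicted))
```

    A perfect model gives RMSE = MAE = 0 and d = 1; the constant one-unit bias in this toy data gives RMSE = MAE = 1 and d = 8/11.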

  12. Use of Multiple Linear Regression Models for Setting Water Quality Criteria for Copper: A Complementary Approach to the Biotic Ligand Model.

    Science.gov (United States)

    Brix, Kevin V; DeForest, David K; Tear, Lucinda; Grosell, Martin; Adams, William J

    2017-05-02

    Biotic Ligand Models (BLMs) for metals are widely applied in ecological risk assessments and in the development of regulatory water quality guidelines in Europe, and in 2007 the United States Environmental Protection Agency (USEPA) recommended BLM-based water quality criteria (WQC) for Cu in freshwater. However, to date, few states have adopted BLM-based Cu criteria into their water quality standards on a state-wide basis, which appears to be due to the perception that the BLM is too complicated or requires too many input variables. Using the mechanistic BLM framework to first identify key water chemistry parameters that influence Cu bioavailability, namely dissolved organic carbon (DOC), pH, and hardness, we developed Cu criteria using the same basic methodology used by the USEPA to derive hardness-based criteria but with the addition of DOC and pH. As an initial proof of concept, we developed stepwise multiple linear regression (MLR) models for species that have been tested over wide ranges of DOC, pH, and hardness conditions. These models predicted acute Cu toxicity values that were within a factor of ±2 in 77% to 97% of tests (5 species had adequate data) and chronic Cu toxicity values that were within a factor of ±2 in 92% of tests (1 species had adequate data). This level of accuracy is comparable to the BLM. Following USEPA guidelines for WQC development, the species data were then combined to develop a linear model with pooled slopes for each independent parameter (i.e., DOC, pH, and hardness) and species-specific intercepts using Analysis of Covariance. The pooled MLR and BLM models predicted species-specific toxicity with similar precision; adjusted R2 and R2 values ranged from 0.56 to 0.86 and 0.66-0.85, respectively. Graphical exploration of relationships between predicted and observed toxicity, residuals and observed toxicity, and residuals and concentrations of key input parameters revealed many similarities and a few key distinctions between the
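
    The "within a factor of ±2" accuracy measure quoted above can be sketched as the share of tests whose predicted toxicity value lies between half and twice the observed value. That reading of the criterion is an assumption, as are the function name and the example concentrations.

```python
def fraction_within_factor(observed, predicted, factor=2.0):
    """Share of predictions whose ratio to the observed value is in [1/factor, factor]."""
    hits = sum(
        1
        for obs, pred in zip(observed, predicted)
        if 1.0 / factor <= pred / obs <= factor
    )
    return hits / len(observed)


# Hypothetical effect concentrations (same units for observed and predicted).
observed = [10.0, 10.0, 10.0, 10.0]
predicted = [5.0, 20.0, 21.0, 4.9]
share = fraction_within_factor(observed, predicted)
```

    In this toy data two of the four predictions fall inside the band, so the share is 0.5; the abstract reports 77% to 97% for acute tests.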

  13. A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents.

    Science.gov (United States)

    Soares, Sófacles Figueredo Carreiro; Galvão, Roberto Kawakami Harrop; Araújo, Mário César Ugulino; da Silva, Edvan Cirino; Pereira, Claudete Fernandes; de Andrade, Stéfani Iury Evangelista; Leite, Flaviano Carvalho

    2011-03-09

    This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analyzed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet-visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial-least-squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense. Copyright © 2011 Elsevier B.V. All rights reserved.

  14. Combined computational-experimental approach to predict blood-brain barrier (BBB) permeation based on "green" salting-out thin layer chromatography supported by simple molecular descriptors.

    Science.gov (United States)

    Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna

    2017-09-05

    The objective of this paper is to build a QSRR/QSAR model for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods SOTLC was chosen, since the mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact compared to classical reversed-phase liquid chromatography (RPLC). During the study three stationary phases (silica gel, cellulose plates and neutral aluminum oxide) were examined. The model set of solutes presents a wide range of log BB values, containing compounds which cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, a comparison of three regression models, multiple linear regression (MLR), partial least-squares (PLS) and orthogonal partial least squares (OPLS), was performed. The designed QSRR/QSAR models could be useful for predicting the BBB permeability of systematically synthesized new compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that significant differences in model performance, measured by R2 and Q2, can be obtained among regression techniques; hence it is strongly suggested to evaluate all available options such as MLR, PLS and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Prediction of long-residue properties of potential blends from mathematically mixed infrared spectra of pure crude oils by partial least-squares regression models

    NARCIS (Netherlands)

    de Peinder, P.; Visser, T.; Petrauskas, D.D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B.M.

    2009-01-01

    Research has been carried out to determine the feasibility of partial least-squares (PLS) regression models to predict the long-residue (LR) properties of potential blends from infrared (IR) spectra that have been created by linearly co-adding the IR spectra of crude oils. The study is the follow-up

  16. Modeling of Soil Aggregate Stability using Support Vector Machines and Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Ali Asghar Besalatpour

    2016-02-01

    by 20-m digital elevation model (DEM). The data set was divided into two subsets of training and testing. The training subset was randomly chosen from 70% of the total set of the data and the remaining samples (30% of the data) were used as the testing set. The correlation coefficient (r), mean square error (MSE), and error percentage (ERROR%) between the measured and the predicted GMD values were used to evaluate the performance of the models. Results and Discussion: The descriptive statistics showed that there was little variability in the sample distributions of the variables used in this study to develop the GMD prediction models, indicating that their values were all normally distributed. The constructed SVM model had better performance in predicting GMD compared to the traditional multiple linear regression model. The obtained MSE and r values for the developed SVM model for soil aggregate stability prediction were 0.005 and 0.86, respectively. The obtained ERROR% value for soil aggregate stability prediction using the SVM model was 10.7% while it was 15.7% for the regression model. The scatter plot figures also showed that the SVM model was more accurate in GMD estimation than the MLR model, since the predicted GMD values were in closer agreement with the measured values for most of the samples. The worse performance of the MLR model might be due to the larger amount of data that is required for developing a sustainable regression model compared to intelligent systems. Furthermore, only the linear effects of the predictors on the dependent variable can be extracted by linear models while in many cases the effects may not be linear in nature. Meanwhile, the SVM model is suitable for modelling nonlinear relationships and its major advantage is that the method can be developed without knowing the exact form of the analytical function on which the model should be built. All these indicate that the SVM approach would be a better choice for predicting soil aggregate

  17. Verifying the performance of artificial neural network and multiple linear regression in predicting the mean seasonal municipal solid waste generation rate: A case study of Fars province, Iran.

    Science.gov (United States)

    Azadi, Sama; Karimi-Jashni, Ayoub

    2016-02-01

    Predicting the mass of solid waste generation plays an important role in integrated solid waste management plans. In this study, the performance of two predictive models, Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) was verified to predict mean Seasonal Municipal Solid Waste Generation (SMSWG) rate. The accuracy of the proposed models is illustrated through a case study of 20 cities located in Fars Province, Iran. Four performance measures, MAE, MAPE, RMSE and R were used to evaluate the performance of these models. The MLR, as a conventional model, showed poor prediction performance. On the other hand, the results indicated that the ANN model, as a non-linear model, has a higher predictive accuracy when it comes to prediction of the mean SMSWG rate. As a result, in order to develop a more cost-effective strategy for waste management in the future, the ANN model could be used to predict the mean SMSWG rate. Copyright © 2015 Elsevier Ltd. All rights reserved.
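    The four performance measures named in this record (MAE, MAPE, RMSE and R) can be illustrated with a minimal sketch. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study; R is assumed here to be the Pearson correlation coefficient between observed and predicted values.

```python
import math

def regression_metrics(observed, predicted):
    """Return MAE, MAPE (in percent), RMSE and Pearson R for paired values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so observed values must be non-zero.
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R": r}

# Illustrative values only (not data from the study).
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

    Reporting several measures together is a common choice because MAE and RMSE are scale-dependent, MAPE is relative, and R captures only linear association.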

  18. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

    Directory of Open Access Journals (Sweden)

    Omholt Stig W

    2011-06-01

    Full Text Available Abstract. Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases.
The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback

  19. Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

    Science.gov (United States)

    Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

    2011-06-01

    Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. 
HC-PLSR is a promising approach for

  20. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

    Science.gov (United States)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. Dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for the 99th and 90th percentiles and performed better in autumn and winter for the 10th percentile. QR models also performed better in suburban and rural areas for the 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. We also found that the effect of solar radiation would be enhanced during extreme ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
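    As background to the record above, quantile regression is usually formulated as minimizing the pinball (check) loss at a chosen quantile level rather than the squared error. The sketch below is a generic illustration, not the authors' implementation: the helper names `pinball_loss` and `best_constant` and the sample values are hypothetical. It shows that the constant predictor minimizing the pinball loss at level tau is a sample quantile, which is why different percentiles of a response can be modelled separately.

```python
def pinball_loss(observed, predicted, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(observed, predicted):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(observed)

def best_constant(observed, tau):
    """Return the observed value that minimizes pinball loss as a constant predictor."""
    n = len(observed)
    return min(observed, key=lambda c: pinball_loss(observed, [c] * n, tau))

# Illustrative values only (not data from the study).
ozone = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0, 120.0]
high = best_constant(ozone, 0.85)  # lands in the upper tail of the sample
low = best_constant(ozone, 0.15)   # lands in the lower tail of the sample
print(high, low)
```

    A full quantile regression replaces the constant with a linear function of the predictors and minimizes the same loss.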

  1. Prediction of the antimicrobial activity of walnut (Juglans regia L.) kernel aqueous extracts using artificial neural network and multiple linear regression.

    Science.gov (United States)

    Kavuncuoglu, Hatice; Kavuncuoglu, Erhan; Karatas, Seyda Merve; Benli, Büsra; Sagdic, Osman; Yalcin, Hasan

    2018-04-09

    A mathematical model was established to determine the diameter of the inhibition zone of walnut extract on twelve bacterial species. Type of extraction, concentration, and pathogens were taken as input variables. Two models were used with the aim of designing this system. One of them was developed with artificial neural networks (ANN), and the other was formed with multiple linear regression (MLR). Four common training algorithms, Levenberg-Marquardt (LM), Bayesian regularization (BR), scaled conjugate gradient (SCG) and resilient back propagation (RP), were investigated and compared. Root mean squared error and correlation coefficient were evaluated as performance criteria. When these criteria were analyzed, ANN showed high prediction performance, while MLR showed low prediction performance. As a result, when different input values were provided to the system developed with ANN, the most accurate inhibition zone (IZ) estimates were obtained. The results of this study could offer new perspectives, particularly in the field of microbiology, because they could be applied to other types of extraction, concentrations, and pathogens, without resorting to experiments. Copyright © 2018 Elsevier B.V. All rights reserved.

  2. Predictive occurrence models for coastal wetland plant communities: Delineating hydrologic response surfaces with multinomial logistic regression

    Science.gov (United States)

    Snedden, Gregg A.; Steyer, Gregory D.

    2013-02-01

    Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007-Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.
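    The weighted kappa statistic mentioned in this record is computed from a confusion matrix of predicted versus actual classes. The sketch below is a generic illustration under stated assumptions: the abstract does not say which weighting scheme was used, so linear disagreement weights |i - j| / (k - 1) and an ordered set of classes are assumed here, and the function name `weighted_kappa` and the example matrices are hypothetical.

```python
def weighted_kappa(confusion):
    """Weighted kappa from a square confusion matrix (rows: actual, columns: predicted).

    Assumes linear disagreement weights and at least two ordered classes.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed += w * confusion[i][j] / total
            expected += w * row_tot[i] * col_tot[j] / (total * total)
    return 1.0 - observed / expected

# Illustrative matrices only (not data from the study).
perfect = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
chance = [[1, 1], [1, 1]]
print(weighted_kappa(perfect), weighted_kappa(chance))
```

    A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, so the reported 0.7 sits toward the upper part of that range.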

  3. Vanadium NMR Chemical Shifts of (Imido)vanadium(V) Dichloride Complexes with Imidazolin-2-iminato and Imidazolidin-2-iminato Ligands: Cooperation with Quantum-Chemical Calculations and Multiple Linear Regression Analyses.

    Science.gov (United States)

    Yi, Jun; Yang, Wenhong; Sun, Wen-Hua; Nomura, Kotohiro; Hada, Masahiko

    2017-11-30

    The NMR chemical shifts of vanadium (⁵¹V) in (imido)vanadium(V) dichloride complexes with imidazolin-2-iminato and imidazolidin-2-iminato ligands were calculated by the density functional theory (DFT) method with GIAO. The calculated ⁵¹V NMR chemical shifts were analyzed by the multiple linear regression (MLR) analysis (MLRA) method with a series of calculated molecular properties. Some of the calculated NMR chemical shifts were incorrect using the optimized molecular geometries of the X-ray structures. After the global minimum geometries of all of the molecules were determined, the trend of the observed chemical shifts was well reproduced by the present DFT method. The MLRA method was performed to investigate the correlation between the ⁵¹V NMR chemical shift and the natural charge, band energy gap, and Wiberg bond index of the V═N bond. The ⁵¹V NMR chemical shifts obtained with the present MLR model were well reproduced with a correlation coefficient of 0.97.

  4. Use of regression-based models to map sensitivity of aquatic resources to atmospheric deposition in Yosemite National Park, USA

    Science.gov (United States)

    Clow, D. W.; Nanus, L.; Huggett, B. W.

    2010-12-01

    An abundance of exposed bedrock, sparse soil and vegetation, and fast hydrologic flushing rates make aquatic ecosystems in Yosemite National Park susceptible to nutrient enrichment and episodic acidification due to atmospheric deposition of nitrogen (N) and sulfur (S). In this study, multiple-linear regression (MLR) models were created to estimate fall-season nitrate and acid neutralizing capacity (ANC) in surface water in Yosemite wilderness. Input data included estimated winter N deposition, fall-season surface-water chemistry measurements at 52 sites, and basin characteristics derived from geographic information system layers of topography, geology, and vegetation. The MLR models accounted for 84% and 70% of the variance in surface-water nitrate and ANC, respectively. Explanatory variables (and the sign of their coefficients) for nitrate included elevation (positive) and the abundance of neoglacial and talus deposits (positive), unvegetated terrain (positive), alluvium (negative), and riparian (negative) areas in the basins. Explanatory variables for ANC included basin area (positive) and the abundance of metamorphic rocks (positive), unvegetated terrain (negative), water (negative), and winter N deposition (negative) in the basins. The MLR equations were applied to 1407 stream reaches delineated in the National Hydrography Dataset for Yosemite, and maps of predicted surface-water nitrate and ANC concentrations were created. Predicted surface-water nitrate concentrations were highest in small, high-elevation cirques, and concentrations declined downstream. Predicted ANC concentrations showed the opposite pattern, except in high-elevation areas underlain by metamorphic rocks along the Sierran Crest, which had relatively high predicted ANC (>200 µeq L⁻¹). Maps were created to show where basin characteristics predispose aquatic resources to nutrient enrichment and acidification effects from N and S deposition. The maps can be used to help guide development of

  5. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    Science.gov (United States)

    Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael

    2017-01-01

    Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km² agricultural watershed in south-western Burkina Faso. Four statistical prediction models (multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB)) were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices of redness

  6. Structure-based predictions of 13C-NMR chemical shifts for a series of 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indoles derivatives using GA-based MLR method

    Science.gov (United States)

    Ghavami, Raouf; Sadeghi, Faridoon; Rasouli, Zolikha; Djannati, Farhad

    2012-12-01

    Experimental values for the ¹³C NMR chemical shifts (ppm, TMS = 0) at 300 K ranging from 96.28 ppm (C4' of indole derivative 17) to 159.93 ppm (C4' of indole derivative 23) relative to deuterated chloroform (CDCl3, 77.0 ppm) or dimethylsulfoxide (DMSO, 39.50 ppm) as internal reference in CDCl3 or DMSO-d6 solutions have been collected from the literature for thirty 2-functionalized 5-(methylsulfonyl)-1-phenyl-1H-indole derivatives containing different substituted groups. Effective quantitative structure-property relationship (QSPR) models were built using a hybrid method combining a genetic algorithm (GA) based on stepwise selection multiple linear regression (SWS-MLR) as the feature-selection tool with correlation models between each carbon atom of the indole derivatives and calculated descriptors. Each compound was depicted by molecular structural descriptors that encode constitutional, topological, geometrical, electrostatic, and quantum chemical features. The accuracy of all developed models was confirmed using different types of internal and external procedures and various statistical tests. Furthermore, the domain of applicability for each model, which indicates the area of reliable predictions, was defined.

  7. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems.

    Science.gov (United States)

    Salleh, Faridah Hani Mohamed; Zainudin, Suhaila; Arif, Shereena M

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods had been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite the active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid the occurrence of cascade errors. Hence, this research aims to propose Multiple Linear Regression (MLR) to infer GRN from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experiment datasets was far less than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors had been very minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
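    The cascade-error idea in this record can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' procedure: the data are synthetic, the cascade A → B → C is generated with Gaussian noise, and the helper names `simple_slope` and `fit_two_predictors` are hypothetical. Regressing C on A alone suggests a strong A → C link, whereas a multiple linear regression of C on both A and B assigns the effect to B and leaves the coefficient of A close to zero.

```python
import random

def simple_slope(x, y):
    """Slope of the least-squares line y ~ b0 + b1*x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fit_two_predictors(x1, x2, y):
    """Coefficients (b1, b2) of y ~ b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # non-zero unless x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Synthetic cascade A -> B -> C (illustrative only, not gene expression data).
rng = random.Random(0)
gene_a = [rng.gauss(0, 1) for _ in range(500)]
gene_b = [a + rng.gauss(0, 0.5) for a in gene_a]
gene_c = [b + rng.gauss(0, 0.5) for b in gene_b]

marginal = simple_slope(gene_a, gene_c)                      # looks like a direct A -> C link
coef_a, coef_b = fit_two_predictors(gene_a, gene_b, gene_c)  # effect is attributed to B
print(marginal, coef_a, coef_b)
```

    The sketch also hints at why collinearity matters here: as A and B become more strongly correlated, `det` approaches zero and the individual coefficients become unstable.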

  8. Predictive occurrence models for coastal wetland plant communities: delineating hydrologic response surfaces with multinomial logistic regression

    Science.gov (United States)

    Snedden, Gregg A.; Steyer, Gregory D.

    2013-01-01

    Understanding plant community zonation along estuarine stress gradients is critical for effective conservation and restoration of coastal wetland ecosystems. We related the presence of plant community types to estuarine hydrology at 173 sites across coastal Louisiana. Percent relative cover by species was assessed at each site near the end of the growing season in 2008, and hourly water level and salinity were recorded at each site Oct 2007–Sep 2008. Nine plant community types were delineated with k-means clustering, and indicator species were identified for each of the community types with indicator species analysis. An inverse relation between salinity and species diversity was observed. Canonical correspondence analysis (CCA) effectively segregated the sites across ordination space by community type, and indicated that salinity and tidal amplitude were both important drivers of vegetation composition. Multinomial logistic regression (MLR) and Akaike's Information Criterion (AIC) were used to predict the probability of occurrence of the nine vegetation communities as a function of salinity and tidal amplitude, and probability surfaces obtained from the MLR model corroborated the CCA results. The weighted kappa statistic, calculated from the confusion matrix of predicted versus actual community types, was 0.7 and indicated good agreement between observed community types and model predictions. Our results suggest that models based on a few key hydrologic variables can be valuable tools for predicting vegetation community development when restoring and managing coastal wetlands.

  9. Current Mathematical Methods Used in QSAR/QSPR Studies

    Directory of Open Access Journals (Sweden)

    Peixun Liu

    2009-04-01

    Full Text Available This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. Recently, the mathematical methods applied to the regression of QSAR/QSPR models have been developing very fast, and new methods, such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR), have appeared on the QSAR/QSPR stage. At the same time, the earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN), Support Vector Machine (SVM) and so on, are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in QSAR/QSPR studies in the future.

  10. Non-destructive and rapid prediction of moisture content in red pepper (Capsicum annuum L.) powder using near-infrared spectroscopy and a partial least squares regression model

    Science.gov (United States)

    Purpose: The aim of this study was to develop a technique for the non-destructive and rapid prediction of the moisture content in red pepper powder using near-infrared (NIR) spectroscopy and a partial least squares regression (PLSR) model. Methods: Three red pepper powder products were separated in...

  11. Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression.

    Science.gov (United States)

    Plata, Maria R; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

    2013-10-01

    A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference analyses quantify the amount of trehalose, glucose, glycogen, and mannan in S. cerevisiae. The selection and optimization of pretreatment steps of samples such as the disruption of the yeast cells and the hydrolysis of mannan and glycogen to obtain monosaccharides were carried out. Trehalose, glucose, and mannose were determined using high-performance liquid chromatography coupled with a refractive index detector and total carbohydrates were measured using the phenol-sulfuric method. Linear concentration range, accuracy, precision, LOD and LOQ were examined to check the reliability of the chromatographic method for each analyte.

  12. A quantitative structure-property relationship of gas chromatographic/mass spectrometric retention data of 85 volatile organic compounds as air pollutant materials by multivariate methods

    Directory of Open Access Journals (Sweden)

    Sarkhosh Maryam

    2012-05-01

    Full Text Available Abstract. A quantitative structure-property relationship (QSPR) study is suggested for the prediction of retention times of volatile organic compounds. Various kinds of molecular descriptors were calculated to represent the molecular structure of the compounds. Modeling of the retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and artificial neural network (ANN). Stepwise regression was used for the selection of the variables that give the best-fitted models. After variable selection, the ANN and MLR methods were used with leave-one-out cross validation for building the regression models. The prediction results are in very good agreement with the experimental values. MLR as the linear regression method shows good ability in the prediction of the retention times of the prediction set. This provided a new and effective method for predicting the chromatography retention index for the volatile organic compounds.

  13. High Resolution Mapping of Soil Properties Using Remote Sensing Variables in South-Western Burkina Faso: A Comparison of Machine Learning and Multiple Linear Regression Models.

    Directory of Open Access Journals (Sweden)

    Gerald Forkuor

    Full Text Available Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties (sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen) in a 580 km² agricultural watershed in south-western Burkina Faso. Four statistical prediction models (multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM) and stochastic gradient boosting (SGB)) were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties.
The results further showed that shortwave infrared and near infrared channels of Landsat 8 as well as soil specific indices

  14. Preoperative Monocyte-to-Lymphocyte Ratio in Peripheral Blood Predicts Stages, Metastasis, and Histological Grades in Patients with Ovarian Cancer

    Directory of Open Access Journals (Sweden)

    Jiangdong Xiang

    2017-02-01

    PURPOSE: The monocyte-to-lymphocyte ratio (MLR) has been shown to be associated with the prognosis of various solid tumors. This study sought to evaluate the value of the MLR in ovarian cancer patients. METHODS: A total of 133 ovarian cancer patients and 43 normal controls were retrospectively reviewed. The patients' demographics were analyzed along with clinical and pathologic data. The counts of peripheral neutrophils, lymphocytes, monocytes, and platelets were collected and used to calculate the MLR, neutrophil-to-lymphocyte ratio (NLR), and platelet-to-lymphocyte ratio (PLR). The optimal cutoff value of the MLR was determined by using receiver operating characteristic curve analysis. We compared the MLR, NLR, and PLR between ovarian cancer and normal control patients and among patients with different stages and different grades, as well as between patients with lymph node metastasis and non-lymph node metastasis. We then investigated the value of the MLR in predicting the stage, grade, and lymph node positivity by using logistic regression. The impact of the MLR on overall survival (OS) was calculated by the Kaplan-Meier method and compared by log-rank test. RESULTS: Statistically significant differences in the MLR were observed between ovarian cancer patients and normal controls. However, no difference was found for the NLR and PLR. Highly significant differences in the MLR were found among patients with different stages (stage I-II and stage III-IV), grades (G1 and >G1), and lymph node metastasis status. The MLR was a significant and independent risk factor for lymph node metastasis, as determined by logistic regression. The optimal cutoff value of the MLR was 0.23. We also classified the data according to tumor markers (CA125, CA199, HE4, AFP, and CEA) and conventional coagulation parameters (International Normalized Ratio [INR] and fibrinogen). Highly significant differences in CA125, CA199, HE4, INR, fibrinogen levels, and lactate

  15. Statistically extrapolated nowcasting of summertime precipitation over the Eastern Alps

    Science.gov (United States)

    Chen, Min; Bica, Benedikt; Tüchler, Lukas; Kann, Alexander; Wang, Yong

    2017-07-01

    This paper presents a new multiple linear regression (MLR) approach to updating the hourly, extrapolated precipitation forecasts generated by the INCA (Integrated Nowcasting through Comprehensive Analysis) system for the Eastern Alps. The generalized form of the model approximates the updated precipitation forecast as a linear response to combinations of predictors selected through a backward elimination algorithm from a pool of predictors. The predictors comprise the raw output of the extrapolated precipitation forecast, the latest radar observations, the convective analysis, and the precipitation analysis. For every MLR model, bias and distribution correction procedures are designed to further correct the systematic regression errors. Applications of the MLR models to a verification dataset containing two months of qualified samples, and to one-month gridded data, are performed and evaluated. Generally, MLR yields slight, but definite, improvements in the intensity accuracy of forecasts during the late evening to morning period, and significantly improves the forecasts for large thresholds. The structure-amplitude-location scores, used to evaluate the performance of the MLR approach, based on its simulation of morphological features, indicate that MLR typically reduces the overestimation of amplitudes and generates similar horizontal structures in precipitation patterns and slightly degraded location forecasts, when compared with the extrapolated nowcasting.
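    The backward elimination step mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the INCA implementation: the predictor labels, the synthetic data and the stopping rule (stop once dropping a predictor raises the residual sum of squares by more than the fraction `tol`) are all assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    return coef, rss

def backward_eliminate(X, y, names, tol=0.05):
    """Drop one predictor at a time while the relative RSS increase stays below tol."""
    keep = list(range(X.shape[1]))
    _, rss = fit_ols(X[:, keep], y)
    while len(keep) > 1:
        # RSS obtained after removing each remaining predictor in turn
        trials = [(fit_ols(X[:, [k for k in keep if k != j]], y)[1], j) for j in keep]
        best_rss, drop = min(trials)
        if (best_rss - rss) / rss > tol:
            break  # every remaining predictor matters; stop eliminating
        keep.remove(drop)
        rss = best_rss
    return [names[k] for k in keep]

# Synthetic pool of four predictors; only the first two drive the response.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
names = ["extrapolated", "radar", "convective", "analysis"]  # hypothetical labels
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

selected = backward_eliminate(X, y, names)
print(selected)
```

    Operational schemes more often stop on a significance test or an information criterion; the RSS rule is used here only to keep the sketch short.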

  16. Development of Building Thermal Load and Discomfort Degree Hour Prediction Models Using Data Mining Approaches

    Directory of Open Access Journals (Sweden)

    Yaolin Lin

    2018-06-01

    Thermal load and indoor comfort level are two important building performance indicators, rapid predictions of which can help significantly reduce the computation time during design optimization. In this paper, a three-step approach is used to develop and evaluate prediction models. Firstly, the Latin Hypercube Sampling Method (LHSM) is used to generate a representative 19-dimensional design database and DesignBuilder is then used to obtain the thermal load and discomfort degree hours through simulation. Secondly, samples from the database are used to develop and validate seven prediction models, using data mining approaches including multilinear regression (MLR), chi-square automatic interaction detector (CHAID), exhaustive CHAID (ECHAID), back-propagation neural network (BPNN), radial basis function network (RBFN), classification and regression trees (CART), and support vector machines (SVM). It is found that the MLR and BPNN models outperform the others in the prediction of thermal load, with an average absolute error of less than 1.19%, and the BPNN model is the best at predicting discomfort degree hours, with 0.62% average absolute error. Finally, two hybrid models, MLR (MLR + BPNN) and MLR-BPNN, are developed. The MLR-BPNN models are found to be the best prediction models, with an average absolute error of 0.82% in thermal load and 0.59% in discomfort degree hours.

  17. Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

    Science.gov (United States)

    Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

    2017-02-01

    Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of whom were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both the derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.

  18. Hyperspectral analysis of soil organic matter in coal mining regions using wavelets, correlations, and partial least squares regression.

    Science.gov (United States)

    Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen

    2016-02-01

    Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R² = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.

  19. Multiple Linear Regression for Reconstruction of Gene Regulatory Networks in Solving Cascade Error Problems

    Directory of Open Access Journals (Sweden)

    Faridah Hani Mohamed Salleh

    2017-01-01

    Gene regulatory network (GRN) reconstruction is the process of identifying regulatory gene interactions from experimental data through computational analysis. One of the main reasons for the reduced performance of previous GRN methods has been inaccurate prediction of cascade motifs. Cascade error is defined as the wrong prediction of cascade motifs, where an indirect interaction is misinterpreted as a direct interaction. Despite active research on various GRN prediction methods, discussion of specific methods to solve problems related to cascade errors is still lacking. In fact, the experiments conducted in past studies were not specifically geared towards proving the ability of GRN prediction methods to avoid cascade errors. Hence, this research proposes Multiple Linear Regression (MLR) to infer GRNs from gene expression data and to avoid wrongly inferring an indirect interaction (A → B → C) as a direct interaction (A → C). Since the number of observations in the real experiment datasets was far smaller than the number of predictors, some predictors were eliminated by extracting random subnetworks from global interaction networks via an established extraction method. In addition, the experiment was extended to assess the effectiveness of MLR in dealing with cascade error by using a novel experimental procedure proposed in this work. The experiment revealed that the number of cascade errors was minimal. Apart from that, the Belsley collinearity test proved that multicollinearity did affect the datasets used in this experiment greatly. All the tested subnetworks obtained satisfactory results, with AUROC values above 0.5.
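    The cascade-error idea in this abstract can be illustrated with a small synthetic example. It is only a sketch under assumed data (a linear chain A → B → C with Gaussian noise), not the authors' procedure: regressing C on A alone suggests a strong link, whereas including B in the regression pushes the coefficient of A towards zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Synthetic cascade A -> B -> C; there is no direct A -> C link.
a = rng.normal(size=n)
b = 0.9 * a + rng.normal(scale=0.3, size=n)
c = 0.8 * b + rng.normal(scale=0.3, size=n)

def ols(predictors, target):
    """Return least-squares slopes (intercept dropped) for target ~ predictors."""
    A = np.column_stack([np.ones(len(target))] + list(predictors))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef[1:]

marginal_a = ols([a], c)[0]          # C regressed on A alone: looks like a direct link
direct_a, direct_b = ols([a, b], c)  # C regressed on A and B: A's share collapses

print(f"A alone: {marginal_a:.2f}; with B included: A={direct_a:.2f}, B={direct_b:.2f}")
```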

  20. Variable Selection via Partial Correlation.

    Science.gov (United States)

    Li, Runze; Liu, Jingyuan; Lou, Lejia

    2017-07-01

    A partial correlation based variable selection method was proposed for normal linear regression models by Bühlmann, Kalisch and Maathuis (2010) as a comparable alternative to regularization methods for variable selection. This paper addresses two important issues related to the partial correlation based variable selection method: (a) whether this method is sensitive to the normality assumption, and (b) whether this method is valid when the dimension of the predictor increases at an exponential rate of the sample size. To address issue (a), we systematically study this method for elliptical linear regression models. Our finding indicates that the original proposal may lead to inferior performance when the marginal kurtosis of the predictor is not close to that of the normal distribution. Our simulation results further confirm this finding. To ensure the superior performance of the partial correlation based variable selection procedure, we propose a thresholded partial correlation (TPC) approach to select significant variables in linear regression models. We establish the selection consistency of the TPC in the presence of ultrahigh dimensional predictors. Since the TPC procedure includes the original proposal as a special case, our theoretical results address issue (b) directly. As a by-product, the sure screening property of the first step of TPC was obtained. The numerical examples also illustrate that the TPC is competitively comparable to the commonly used regularization methods for variable selection.
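    The quantity at the heart of this abstract, a sample partial correlation compared against a threshold, can be sketched as below. This is not the TPC procedure of the paper: the fixed cutoff of 0.1, the variable names and the synthetic data are assumptions chosen only to show why a predictor with a large marginal correlation can have a negligible partial correlation.

```python
import numpy as np

def residualize(v, Z):
    """Residual of v after a least-squares fit on Z plus an intercept."""
    A = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def partial_corr(y, x, Z):
    """Sample partial correlation of y and x given the conditioning variable(s) Z."""
    ry, rx = residualize(y, Z), residualize(x, Z)
    return float(ry @ rx / np.sqrt((ry @ ry) * (rx @ rx)))

rng = np.random.default_rng(2)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.6 * rng.normal(size=n)      # correlated with x0, but not in the model
y = 2.0 * x0 + rng.normal(scale=0.5, size=n)  # the response depends on x0 only

pc_x0 = partial_corr(y, x0, x1)
pc_x1 = partial_corr(y, x1, x0)
threshold = 0.1  # hypothetical cutoff, not the data-driven threshold of the paper
selected = [name for name, pc in (("x0", pc_x0), ("x1", pc_x1)) if abs(pc) > threshold]
print(selected)
```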

  1. New applications of partial residual methodology

    International Nuclear Information System (INIS)

    Uslu, V.R.

    1999-12-01

    The formulation of a problem of interest in the framework of a statistical analysis starts with collecting the data, choosing a model and making certain assumptions, as described in the basic paradigm by Box (1980). This stage is called model building. The estimation stage follows, in which the formulation of the problem is treated as if it were true in order to obtain estimates and to make tests and inferences. In the final stage, called diagnostic checking, disagreements between the data and the fitted model are checked by using diagnostic measures and diagnostic plots. It is well known that statistical methods perform best when all assumptions related to the methods are satisfied. In practice, however, the ideal case is very difficult to achieve. Diagnostics are therefore becoming important, and so are diagnostic plots, because they provide an immediate assessment. Partial residual plots, the main interest of the present study, play the major role among the diagnostic plots in multiple regression analysis. The statistical literature acknowledges that partial residual plots are more useful than ordinary residual plots in detecting outliers and nonconstant variance, and especially in discovering curvature. In this study we consider the partial residual methodology in statistical methods other than multiple regression. We have shown that partial residual plots can be used for the same purpose as in multiple regression, particularly in autoregressive time series models, transfer function models, linear mixed models and ridge regression. (author)
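    For the ordinary multiple regression case, the partial residual of predictor j is the fitted residual plus that predictor's own fitted contribution, e_i + b_j x_ij. The sketch below uses assumed synthetic data with a deliberately curved dependence on `x1`; it computes the values that a partial residual plot would display rather than drawing the plot.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.normal(size=n)
y = 1.0 + x1**2 + 0.5 * x2 + rng.normal(scale=0.2, size=n)  # curved in x1

# Fit the (misspecified) linear model y ~ 1 + x1 + x2.
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef

# Partial residuals for x1: ordinary residuals plus the fitted x1 term.
partial_x1 = resid + coef[1] * x1

# A straight line through (x1, partial_x1) reproduces the fitted slope,
# while a quadratic fit exposes the curvature the linear model missed.
slope = np.polyfit(x1, partial_x1, 1)[0]
quad = np.polyfit(x1, partial_x1, 2)[0]
print(f"slope={slope:.3f}, fitted coefficient={coef[1]:.3f}, curvature={quad:.3f}")
```

    Plotting `partial_x1` against `x1` would show the parabola directly, which is the curvature-detection use described in the abstract.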

  2. Quantitative structure-property relationship study of n-octanol-water partition coefficients of some of diverse drugs using multiple linear regression

    International Nuclear Information System (INIS)

    Ghasemi, Jahanbakhsh; Saaidpour, Saadi

    2007-01-01

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 drug organic compounds to their n-octanol-water partition coefficients (log P(o/w)). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of log P(o/w) of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of log P(o/w) for molecules not yet synthesized. Application of the developed model to a testing set of 40 drug organic compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R²) for the MLR model were 0.22 and 0.99 for the prediction set log P(o/w)

  3. Journal of Earth System Science | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    In this study, multi-linear regression (MLR) approach is used to construct intermittent reservoir daily inflow forecasting system. To illustrate the applicability and effect of using lumped and distributed input data in MLR approach, Koyna river watershed in Maharashtra, India is chosen as a case study. The results are also ...

  4. Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression.

    Directory of Open Access Journals (Sweden)

    Kosuke Yoshida

    In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include high dimensionality and collinearity of the data, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area.

  5. Simplified fate, exposure and effect modelling of chemical compounds in the case of lacking complete assessment data sets

    DEFF Research Database (Denmark)

    Birkved, Morten; Heijungs, R; Olsen, Stig Irving

    2004-01-01

    availability limitations to select key parameters that explain much of the variance and at the same time are relatively easily available. Further, PLSR was used to derive linear SBM models. In further investigations multiple linear regression (MLR) will be used to derive predictive equations for SBM...... characterisation factors. The result of this will be tested on common sense and environmental knowledge and a mechanistically understandable SBM will be developed by rounding off the coefficients of the regression equations. Preliminary results including PLSR derived linear SBM’s of this work is presented........g. in terms of how the input parameters enter the regression equation. In the absence of a final OMNIITOX BM a model of similar complexity USES-LCA, has been used as surrogate BM. We have applied partial least square of latent structure regression (PLSR) and combined insights from this with knowledge on data...

  6. Forecasting the daily power output of a grid-connected photovoltaic system based on multivariate adaptive regression splines

    International Nuclear Information System (INIS)

    Li, Yanting; He, Yong; Su, Yan; Shu, Lianjie

    2016-01-01

    Highlights: • Suggests a nonparametric model based on MARS for output power prediction. • Compare the MARS model with a wide variety of prediction models. • Show that the MARS model is able to provide an overall good performance in both the training and testing stages. - Abstract: Both linear and nonlinear models have been proposed for forecasting the power output of photovoltaic systems. Linear models are simple to implement but less flexible. Due to the stochastic nature of the power output of PV systems, nonlinear models tend to provide better forecast than linear models. Motivated by this, this paper suggests a fairly simple nonlinear regression model known as multivariate adaptive regression splines (MARS), as an alternative to forecasting of solar power output. The MARS model is a data-driven modeling approach without any assumption about the relationship between the power output and predictors. It maintains simplicity of the classical multiple linear regression (MLR) model while possessing the capability of handling nonlinearity. It is simpler in format than other nonlinear models such as ANN, k-nearest neighbors (KNN), classification and regression tree (CART), and support vector machine (SVM). The MARS model was applied on the daily output of a grid-connected 2.1 kW PV system to provide the 1-day-ahead mean daily forecast of the power output. The comparisons with a wide variety of forecast models show that the MARS model is able to provide reliable forecast performance.
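    The way MARS keeps a linear-regression form while handling nonlinearity is through hinge basis functions max(0, x - t) and max(0, t - x). The sketch below fits such a basis by ordinary least squares at a single, hand-picked knot; the adaptive knot search and pruning of a real MARS fit are omitted, and the data are synthetic rather than taken from the PV system in the paper.

```python
import numpy as np

def hinge_basis(x, knot):
    """Design matrix with an intercept and the mirrored pair of hinge functions at `knot`."""
    return np.column_stack([
        np.ones_like(x),
        np.maximum(0.0, x - knot),
        np.maximum(0.0, knot - x),
    ])

def rss(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

# Synthetic response that is flat up to 0.4 and then rises linearly.
x = np.linspace(0.0, 1.0, 101)
y = 2.0 * np.maximum(0.0, x - 0.4)

rss_linear = rss(np.column_stack([np.ones_like(x), x]), y)  # plain straight-line fit
rss_hinge = rss(hinge_basis(x, 0.4), y)                     # hinge basis at the true knot

print(f"linear RSS={rss_linear:.3f}, hinge RSS={rss_hinge:.2e}")
```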

  7. Discrimination of Transgenic Rice Based on Near Infrared Reflectance Spectroscopy and Partial Least Squares Regression Discriminant Analysis

    Directory of Open Access Journals (Sweden)

    ZHANG Long

    2015-09-01

    Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in the spectral ranges of 4 000–8 000 cm⁻¹ and 4 000–10 000 cm⁻¹ were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm⁻¹, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.

  8. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan [Dept. of Biosystems Machinery Engineering, Chungnam National University, Daejeon (Korea, Republic of); Kim, Moon Sung [United States Department of Agriculture Agricultural Research Service, Washington (United States); Lee, Soo Hee [Life and Technology Co.,Ltd., Hwasung (Korea, Republic of)

    2014-08-15

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performances evaluated using cross validation. The calibration R² (R²c) and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection based on the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on a reduced number of wavelengths selected by the VIP scores technique, gave good predictions for the adulterated garlic and ginger powder samples.

  9. Development of nondestructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression

    International Nuclear Information System (INIS)

    Lee, Sang Dae; Lohumi, Santosh; Cho, Byoung Kwan; Kim, Moon Sung; Lee, Soo Hee

    2014-01-01

    This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performances evaluated using cross validation. The calibration R² (R²c) and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection based on the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on a reduced number of wavelengths selected by the VIP scores technique, gave good predictions for the adulterated garlic and ginger powder samples.

  10. Prediction of air-to-blood partition coefficients of volatile organic compounds using genetic algorithm and artificial neural network

    International Nuclear Information System (INIS)

    Konoz, Elahe; Golmohammadi, Hassan

    2008-01-01

    An artificial neural network (ANN) was constructed and trained for the prediction of air-to-blood partition coefficients of volatile organic compounds. The inputs of this neural network are theoretically derived descriptors that were chosen by genetic algorithm (GA) and multiple linear regression (MLR) feature selection techniques. These descriptors are: R maximal autocorrelation of lag 1 weighted by atomic Sanderson electronegativities (R1E+), electron density on the most negative atom in the molecule (EDNA), maximum partial charge for a C atom (MXPCC), surface weighted charge partial surface area (WNSA1), fractional charge partial surface area (FNSA2) and atomic charge weighted partial positive surface area (PPSA3). The standard errors of the training, test and validation sets for the ANN model are 0.095, 0.148 and 0.120, respectively. The results showed that the nonlinear model can accurately simulate the relationship between the structural descriptors and the partition coefficients of the molecules in the data set.

  11. Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

    Science.gov (United States)

    Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

    2013-10-01

    Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.

  12. Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation

    Directory of Open Access Journals (Sweden)

    Ahmad Bilfarsah

    2005-04-01

    The Additive Spline Partial Least Squares (ASPLS) method is a generalization of the Partial Least Squares (PLS) method. ASPLS can accommodate nonlinearity and multicollinearity among the predictor variables. In principle, the ASPLS approach is characterized by two ideas: the first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, to preserve the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.

  13. Quantitative structure-retention relationship studies with immobilized artificial membrane chromatography II: partial least squares regression.

    Science.gov (United States)

    Li, Jie; Sun, Jin; He, Zhonggui

    2007-01-26

    We aimed to establish quantitative structure-retention relationship (QSRR) with immobilized artificial membrane (IAM) chromatography using easily understood and obtained physicochemical molecular descriptors and to elucidate which descriptors are critical to affect the interaction process between solutes and immobilized phospholipid membranes. The retention indices (logk(IAM)) of 55 structurally diverse drugs were determined on an immobilized artificial membrane column (IAM.PC.DD2) directly or obtained by extrapolation method for highly hydrophobic compounds. Ten simple physicochemical property descriptors (clogP, rings, rotatory bond, hydro-bond counting, etc.) of these drugs were collected and used to establish QSRR and predict the retention data by partial least squares regression (PLSR). Five descriptors, clogP, rotatory bond (RotB), rings, molecular weight (MW) and total surface area (TSA), were reserved by using the Variable Importance for Projection (VIP) values as criterion to build the final PLSR model. An external test set was employed to verify the QSRR based on the training set with the five variables, and QSRR by PLSR exhibited a satisfying predictive ability with R(p)=0.902 and RMSE(p)=0.400. Comparison of coefficients of centered and scaled variables by PLSR demonstrated that, for the descriptors studied, clogP and TSA have the most significant positive effect but the rotatable bond has significant negative effect on drug IAM chromatographic retention.

  14. The density, the refractive index and the adjustment of the excess thermodynamic properties by means of the multiple linear regression method for the ternary system ethylbenzene–octane–propylbenzene

    International Nuclear Information System (INIS)

    Lisa, C.; Ungureanu, M.; Cosmaţchi, P.C.; Bolat, G.

    2015-01-01

    Graphical abstract: - Highlights: • Thermodynamic properties of the ethylbenzene–octane–propylbenzene system. • Equations with much lower standard deviations in comparison with other models. • The prediction of V^E based on the refractive index by means of the MLR method. - Abstract: The density (ρ) and the refractive index (n) have been experimentally determined for the ethylbenzene (1)–octane (2)–propylbenzene (3) ternary system over the entire composition range, at three temperatures (298.15, 308.15 and 318.15 K) and a pressure of 0.1 MPa. The excess thermodynamic properties calculated from the experimental determinations have been used to build empirical models which, despite the disadvantage of having a greater number of coefficients, result in much lower standard deviations in comparison with the Redlich–Kister type models. The statistical processing of experimental data by means of the multiple linear regression (MLR) method was used to model the excess thermodynamic properties. Lower standard deviations than with the Redlich–Kister type models were also obtained. The adjustment of the excess molar volume (V^E) based on the refractive index was made with the multiple linear regression routine of the SigmaPlot 11.2 program for the ethylbenzene (1)–octane (2)–propylbenzene (3) ternary system, giving a simple mathematical model which correlates the excess molar volume with the refractive index, the normalized temperature and the composition of the ternary mixture: $V^{E} = A_0 + A_1 X_1 + A_2 X_2 + A_3 (T/298.15) + A_4 n$, for which the standard deviation is 0.03.
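    A model of the form V^E = A_0 + A_1 X_1 + A_2 X_2 + A_3 (T/298.15) + A_4 n is linear in its coefficients, so it can be fitted by ordinary least squares. The sketch below shows the mechanics on synthetic, noise-free data; the coefficient values, the composition and refractive index ranges, and the definition of the standard deviation (residual standard deviation with m - 5 degrees of freedom) are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 60
x1 = rng.uniform(0.0, 0.5, size=m)                 # mole fraction of component 1
x2 = rng.uniform(0.0, 0.5, size=m)                 # mole fraction of component 2
T = rng.choice([298.15, 308.15, 318.15], size=m)   # temperatures in K
n_ref = rng.uniform(1.39, 1.50, size=m)            # refractive index (synthetic range)

A_true = np.array([0.5, -0.2, 0.1, 0.3, -0.4])     # made-up coefficients A_0..A_4
design = np.column_stack([np.ones(m), x1, x2, T / 298.15, n_ref])
v_excess = design @ A_true                         # synthetic V^E values

# Least-squares estimate of A_0..A_4 and the residual standard deviation.
A_hat, *_ = np.linalg.lstsq(design, v_excess, rcond=None)
resid = v_excess - design @ A_hat
std_dev = float(np.sqrt(resid @ resid / (m - len(A_hat))))

print(np.round(A_hat, 4), std_dev)
```

    With measured data the residuals would not vanish, and `std_dev` would play the role of the standard deviation quoted in the abstract.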

  15. Further Analysis of Boiling Points of Small Molecules, CH[subscript w]F[subscript x]Cl[subscript y]Br[subscript z]

    Science.gov (United States)

    Beauchamp, Guy

    2005-01-01

    A study that presents specific hypotheses that satisfactorily explain the boiling points of a number of structurally similar molecules, CH[subscript w]F[subscript x]Cl[subscript y]Br[subscript z], and then analyzes the model with the help of multiple linear regression (MLR), a data analysis tool. The MLR analysis was useful in selecting the…

  16. QSPR study of the retention/release property of odorant molecules in pectin gels using statistical methods

    Directory of Open Access Journals (Sweden)

    Assia Belhassan

    2017-11-01

    The ACD/ChemSketch, MarvinSketch, and ChemOffice programmes were used to calculate several molecular descriptors of 51 odorant molecules (15 alcohols, 11 aldehydes, 9 ketones and 16 esters). The best descriptors were selected to establish the Quantitative Structure-Property Relationship (QSPR) of the retention/release property of odorant molecules in pectin gels using Principal Components Analysis (PCA), Multiple Linear Regression (MLR), Multiple Non-linear Regression (MNLR) and Artificial Neural Network (ANN) methods. We propose a quantitative model based on these analyses. PCA was used to select descriptors that exhibit high correlation with the retention/release property. The MLR method yielded correlation coefficients of 0.960 and 0.958 for PG-0.4 (pectin concentration: 0.4% w/w) and PG-0.8 (pectin concentration: 0.8% w/w) media, respectively. Internal and external validations were used to determine the statistical quality of the QSPR of the two MLR models. The MNLR method, considering the relevant descriptors obtained from the MLR, yielded correlation coefficients of 0.978 and 0.975 for PG-0.4 and PG-0.8 media, respectively. The applicability domain of the MLR models was investigated using simple and leverage approaches to detect outliers and outside compounds. The effects of different descriptors on the retention/release property are described, and these descriptors were used to study and design new compounds with higher and lower values of the property than the existing ones. Keywords: Odorant Molecules, Retention/Release, Pectin Gels, Quantitative Structure Property Relationship, Multiple Linear Regression, Artificial Neural Network

  17. EPMLR: sequence-based linear B-cell epitope prediction method using multiple linear regression.

    Science.gov (United States)

    Lian, Yao; Ge, Meng; Pan, Xian-Ming

    2014-12-19

    B-cell epitopes have been studied extensively due to their immunological applications, such as peptide-based vaccine development, antibody production, and disease diagnosis and therapy. Despite several decades of research, the accurate prediction of linear B-cell epitopes has remained a challenging task. In this work, based on the antigen's primary sequence information, a novel linear B-cell epitope prediction model was developed using multiple linear regression (MLR). A 10-fold cross-validation test on a large non-redundant dataset was performed to evaluate the performance of our model. To alleviate the problem caused by noise in the negative dataset, 300 experiments utilizing 300 sub-datasets were performed. We achieved an overall sensitivity of 81.8%, precision of 64.1% and area under the receiver operating characteristic curve (AUC) of 0.728. We have presented a reliable method for the identification of linear B-cell epitopes using the antigen's primary sequence information. Moreover, a web server EPMLR has been developed for linear B-cell epitope prediction: http://www.bioinfo.tsinghua.edu.cn/epitope/EPMLR/ .
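
    The evaluation protocol mentioned in this record (10-fold cross-validation, then sensitivity and precision) can be sketched as follows. This is not the EPMLR implementation: the features are synthetic, the 0.5 decision threshold on the regression output is an assumption, and the helper names are hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels coded as 0/1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn


def cross_validated_predictions(X, y, n_folds=10, seed=0):
    """Fit a least-squares linear model on k-1 folds and predict the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    Xb = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    y_pred = np.empty(len(y), dtype=int)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, *_ = np.linalg.lstsq(Xb[train_mask], y[train_mask], rcond=None)
        # Assumed decision rule: call a sample positive if the fitted value is >= 0.5.
        y_pred[test_idx] = (Xb[test_idx] @ coef >= 0.5).astype(int)
    return y_pred


# Synthetic two-class data standing in for sequence-derived features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.0, 1.0, (100, 4))])
y = np.repeat([0, 1], 100)

y_pred = cross_validated_predictions(X, y)
tp, fp, fn, tn = confusion_counts(y, y_pred)
sensitivity = tp / (tp + fn)  # fraction of true positives that are recovered
precision = tp / (tp + fp)    # fraction of predicted positives that are correct
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f}")
```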

  18. Rapid Quantitative Analysis of Forest Biomass Using Fourier Transform Infrared Spectroscopy and Partial Least Squares Regression

    Directory of Open Access Journals (Sweden)

    Gifty E. Acquah

    2016-01-01

    Fourier transform infrared reflectance (FTIR) spectroscopy has been used to predict properties of forest logging residue, a very heterogeneous feedstock material. Properties studied included the chemical composition, thermal reactivity, and energy content. The ability to rapidly determine these properties is vital in the optimization of conversion technologies for the successful commercialization of biobased products. Partial least squares regression of first derivative treated FTIR spectra had good correlations with the conventionally measured properties. For the chemical composition, constructed models generally did a better job of predicting the extractives and lignin content than the carbohydrates. In predicting the thermochemical properties, models for volatile matter and fixed carbon performed very well (i.e., R2 > 0.80, RPD > 2.0). The effect of reducing the wavenumber range to the fingerprint region for PLS modeling and the relationship between the chemical composition and higher heating value of logging residue were also explored. This study is new and different in that it is the first to use FTIR spectroscopy to quantitatively analyze forest logging residue, an abundant resource that can be used as a feedstock in the emerging low carbon economy. Furthermore, it provides a complete and systematic characterization of this heterogeneous raw material.

  19. PPARγ partial agonist GQ-16 strongly represses a subset of genes in 3T3-L1 adipocytes

    Energy Technology Data Exchange (ETDEWEB)

    Milton, Flora Aparecida [Faculdade de Ciências da Saúde, Laboratório de Farmacologia Molecular, Universidade de Brasília (Brazil); Genomic Medicine, Houston Methodist Research Institute, Houston, TX (United States)]; Cvoro, Aleksandra [Genomic Medicine, Houston Methodist Research Institute, Houston, TX (United States)]; Amato, Angelica A. [Faculdade de Ciências da Saúde, Laboratório de Farmacologia Molecular, Universidade de Brasília (Brazil)]; Sieglaff, Douglas H.; Filgueira, Carly S.; Arumanayagam, Anithachristy Sigamani [Genomic Medicine, Houston Methodist Research Institute, Houston, TX (United States)]; Caro Alves de Lima, Maria do; Rocha Pitta, Ivan [Laboratório de Planejamento e Síntese de Fármacos – LPSF, Universidade Federal de Pernambuco (Brazil)]; Assis Rocha Neves, Francisco de [Faculdade de Ciências da Saúde, Laboratório de Farmacologia Molecular, Universidade de Brasília (Brazil)]; Webb, Paul, E-mail: pwebb@HoustonMethodist.org [Genomic Medicine, Houston Methodist Research Institute, Houston, TX (United States)]

    2015-08-28

    Thiazolidinediones (TZDs) are peroxisome proliferator-activated receptor gamma (PPARγ) agonists that improve insulin resistance but trigger side effects such as weight gain, edema, congestive heart failure and bone loss. GQ-16 is a PPARγ partial agonist that improves glucose tolerance and insulin sensitivity in mouse models of obesity and diabetes without inducing weight gain or edema. It is not clear whether GQ-16 acts as a partial agonist at all PPARγ target genes, or whether it displays gene-selective actions. To determine how GQ-16 influences PPARγ activity on a gene by gene basis, we compared effects of rosiglitazone (Rosi) and GQ-16 in mature 3T3-L1 adipocytes using microarray and qRT-PCR. Rosi changed expression of 1156 genes in 3T3-L1, but GQ-16 only changed 89 genes. GQ-16 generally showed weak effects upon Rosi induced genes, consistent with partial agonist actions, but a subset of modestly Rosi induced and strongly repressed genes displayed disproportionately strong GQ-16 responses. PPARγ partial agonists MLR24 and SR1664 also exhibit disproportionately strong effects on transcriptional repression. We conclude that GQ-16 displays a continuum of weak partial agonist effects but efficiently represses some negatively regulated PPARγ responsive genes. Strong repressive effects could contribute to physiologic actions of GQ-16. - Highlights: • GQ-16 is an insulin sensitizing PPARγ ligand with reduced harmful side effects. • GQ-16 displays a continuum of weak partial agonist activities at PPARγ-induced genes. • GQ-16 exerts strong repressive effects at a subset of genes. • These inhibitor actions should be evaluated in models of adipose tissue inflammation.

  20. PPARγ partial agonist GQ-16 strongly represses a subset of genes in 3T3-L1 adipocytes

    International Nuclear Information System (INIS)

    Milton, Flora Aparecida; Cvoro, Aleksandra; Amato, Angelica A.; Sieglaff, Douglas H.; Filgueira, Carly S.; Arumanayagam, Anithachristy Sigamani; Caro Alves de Lima, Maria do; Rocha Pitta, Ivan; Assis Rocha Neves, Francisco de; Webb, Paul

    2015-01-01

    Thiazolidinediones (TZDs) are peroxisome proliferator-activated receptor gamma (PPARγ) agonists that improve insulin resistance but trigger side effects such as weight gain, edema, congestive heart failure and bone loss. GQ-16 is a PPARγ partial agonist that improves glucose tolerance and insulin sensitivity in mouse models of obesity and diabetes without inducing weight gain or edema. It is not clear whether GQ-16 acts as a partial agonist at all PPARγ target genes, or whether it displays gene-selective actions. To determine how GQ-16 influences PPARγ activity on a gene by gene basis, we compared effects of rosiglitazone (Rosi) and GQ-16 in mature 3T3-L1 adipocytes using microarray and qRT-PCR. Rosi changed expression of 1156 genes in 3T3-L1, but GQ-16 only changed 89 genes. GQ-16 generally showed weak effects upon Rosi induced genes, consistent with partial agonist actions, but a subset of modestly Rosi induced and strongly repressed genes displayed disproportionately strong GQ-16 responses. PPARγ partial agonists MLR24 and SR1664 also exhibit disproportionately strong effects on transcriptional repression. We conclude that GQ-16 displays a continuum of weak partial agonist effects but efficiently represses some negatively regulated PPARγ responsive genes. Strong repressive effects could contribute to physiologic actions of GQ-16. - Highlights: • GQ-16 is an insulin sensitizing PPARγ ligand with reduced harmful side effects. • GQ-16 displays a continuum of weak partial agonist activities at PPARγ-induced genes. • GQ-16 exerts strong repressive effects at a subset of genes. • These inhibitor actions should be evaluated in models of adipose tissue inflammation

  1. Prediction of retention indices for frequently reported compounds of plant essential oils using multiple linear regression, partial least squares, and support vector machine.

    Science.gov (United States)

    Yan, Jun; Huang, Jian-Hua; He, Min; Lu, Hong-Bing; Yang, Rui; Kong, Bo; Xu, Qing-Song; Liang, Yi-Zeng

    2013-08-01

    Retention indices for frequently reported compounds of plant essential oils on three different stationary phases were investigated. Multivariate linear regression, partial least squares, and support vector machine, combined with a new variable selection approach called random-frog, recently proposed by our group, were employed to model quantitative structure-retention relationships. Internal and external validations were performed to ensure the stability and predictive ability. All three methods could obtain an acceptable model, and the optimal results were obtained by support vector machine based on a small number of informative descriptors, with squared correlation coefficients for cross validation of 0.9726, 0.9759, and 0.9331 on the dimethylsilicone stationary phase, the dimethylsilicone phase with 5% phenyl groups, and the PEG stationary phase, respectively. The performances of two variable selection approaches, random-frog and genetic algorithm, are compared. The importance of the variables was found to be consistent when estimated from correlation coefficients in multivariate linear regression equations and selection probability in model spaces. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. Supremum Norm Posterior Contraction and Credible Sets for Nonparametric Multivariate Regression

    NARCIS (Netherlands)

    Yoo, W.W.; Ghosal, S

    2016-01-01

    In the setting of nonparametric multivariate regression with unknown error variance, we study asymptotic properties of a Bayesian method for estimating a regression function f and its mixed partial derivatives. We use a random series of tensor product of B-splines with normal basis coefficients as a

  3. Spontaneous regression of metastases from malignant melanoma: a case report

    DEFF Research Database (Denmark)

    Kalialis, Louise V; Drzewiecki, Krzysztof T; Mohammadi, Mahin

    2008-01-01

    A case of a 61-year-old male with widespread metastatic melanoma is presented 5 years after complete spontaneous cure. Spontaneous regression occurred in cutaneous, pulmonary, hepatic and cerebral metastases. A review of the literature reveals seven cases of regression of cerebral metastases......; this report is the first to document complete spontaneous regression of cerebral metastases from malignant melanoma by means of computed tomography scans. Spontaneous regression is defined as the partial or complete disappearance of a malignant tumour in the absence of all treatment or in the presence...

  4. QSRR modeling for the chromatographic retention behavior of some β-lactam antibiotics using forward and firefly variable selection algorithms coupled with multiple linear regression.

    Science.gov (United States)

    Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M

    2018-05-11

    The continuing, justified emergence of new β-lactam antibiotics creates a need for suitable analytical methods that accelerate and facilitate their analysis. A face-centered central composite experimental design was adopted using different levels of phosphate buffer pH and of acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation, indicating their robustness and predictive ability. The Williams-Hotelling test and Student's t-test showed that there is no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases a simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to those generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.
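
    The forward-selection step coupled with MLR that this record refers to can be sketched as a greedy search. The example below is a generic illustration, not the authors' procedure: the stopping rule (a fixed number of descriptors) and the synthetic descriptor matrix are assumptions made for the demonstration.

```python
import numpy as np


def rss(X, y, columns):
    """Residual sum of squares of an MLR fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_selection(X, y, n_select):
    """Greedily add the descriptor that lowers the RSS the most at each step."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic descriptors: only columns 2 and 5 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + rng.normal(0.0, 0.1, 100)

chosen = forward_selection(X, y, n_select=2)
print("selected descriptor columns:", chosen)
```

    In practice the number of descriptors would be chosen by a validation criterion rather than fixed in advance.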

  5. [On-site evaluation of raw milk qualities by portable Vis/NIR transmittance technique].

    Science.gov (United States)

    Wang, Jia-Hua; Zhang, Xiao-Wei; Wang, Jun; Han, Dong-Hai

    2014-10-01

    To ensure the material safety of dairy products, visible (Vis)/near infrared (NIR) spectroscopy combined with chemometrics methods was used to develop models for on-site evaluation of fat, protein, dry matter (DM) and lactose. A total of 88 raw milk samples were collected from individual livestock in different years. The spectra of raw milk were measured by a portable Vis/NIR spectrometer with a diffuse transmittance accessory. To remove the scatter effect and baseline drift, the diffuse transmittance spectra were preprocessed by a 2nd order derivative with Savitzky-Golay smoothing (polynomial order 2, 25 data points). Changeable size moving window partial least squares (CSMWPLS) and genetic algorithms partial least squares (GAPLS) methods were suggested to select informative regions for PLS calibration. The PLS and multiple linear regression (MLR) methods were used to develop models for predicting the quality indexes of raw milk. The prediction performance of the CSMWPLS models was similar to that of the GAPLS models for fat, protein, DM and lactose evaluation: the root mean square errors of prediction (RMSEP) were 0.1156/0.1033, 0.0962/0.1137, 0.2013/0.1237 and 0.0774/0.0668, and the relative standard deviations of prediction (RPD) were 8.99/10.06, 3.53/2.99, 5.76/9.38 and 1.81/2.10, respectively. Meanwhile, the MLR models were also calibrated with 8, 10, 9 and 7 variables for fat, protein, DM and lactose, respectively. The prediction performance of the MLR models was better than or close to that of the PLS models. The MLR models to predict fat, protein, DM and lactose yielded RMSEP values of 0.1070, 0.0930, 0.1360 and 0.0658, and RPD values of 9.72, 3.66, 8.53 and 2.13, respectively. The results demonstrated the usefulness of Vis/NIR spectra combined with multivariate calibration methods as an objective and rapid method for the quality evaluation of complicated raw milks. And the results obtained also highlight the potential of portable Vis/NIR instruments for on-site assessing quality indexes of
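
    The preprocessing step named in this record, a Savitzky-Golay second derivative with polynomial order 2 and a 25-point window, can be sketched with plain NumPy as below. The spectrum is synthetic and the function names are hypothetical; SciPy's `scipy.signal.savgol_filter` offers the same filter and would normally be preferred.

```python
import numpy as np


def savgol_coefficients(window_length, polyorder, deriv):
    """Weights that return the deriv-th derivative of a local polynomial fit."""
    if window_length % 2 != 1 or polyorder >= window_length or deriv > polyorder:
        raise ValueError("need an odd window, polyorder < window, deriv <= polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Vandermonde matrix of the local fit: columns 1, k, k^2, ...
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient c_deriv;
    # the derivative at the window centre is deriv! * c_deriv.
    factorial = float(np.prod(np.arange(1, deriv + 1)))
    return factorial * np.linalg.pinv(A)[deriv]


def savgol_derivative(y, window_length=25, polyorder=2, deriv=2, delta=1.0):
    """Apply the filter; the (window_length - 1) / 2 edge points on each side are dropped."""
    weights = savgol_coefficients(window_length, polyorder, deriv)
    # np.convolve flips its second argument, so flip the weights to get a correlation.
    return np.convolve(y, weights[::-1], mode="valid") / delta**deriv


# Synthetic spectrum: one Gaussian band on top of a sloping linear baseline.
channel = np.arange(300.0)
band = np.exp(-0.5 * ((channel - 150.0) / 20.0) ** 2)
baseline = 0.1 + 0.002 * channel

d2_with_baseline = savgol_derivative(band + baseline)
d2_band_only = savgol_derivative(band)
print("largest change caused by the baseline:",
      float(np.max(np.abs(d2_with_baseline - d2_band_only))))
```

    A second derivative removes both constant offsets and linear slopes, which is why it is a common choice against baseline drift.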

  6. Factors influencing chloride deposition in a coastal hilly area and application to chloride deposition mapping

    Directory of Open Access Journals (Sweden)

    H. Guan

    2010-05-01

    Chloride is commonly used as an environmental tracer for studying water flow and solute transport in the environment. It is especially useful for estimating groundwater recharge based on the commonly used chloride mass balance (CMB) method. Strong spatial variability in chloride deposition in coastal areas is one difficulty encountered in appropriately applying the method. A high-resolution bulk chloride deposition map in the coastal region is thus needed. The aim of this study is to construct a chloride deposition map for the Mount Lofty Ranges (MLR), a coastal hilly area of approximately 9000 km2 in South Australia. We examined geographic (related to coastal distance), orographic, and atmospheric factors that may influence chloride deposition, using partial correlation and regression analyses. The results indicate that coastal distance, elevation, and terrain aspect and slope appear to be significant factors controlling chloride deposition in the study area. Coastal distance accounts for 70% of the spatial variability in bulk chloride deposition, with elevation, terrain aspect and slope accounting for an additional 15%. The results are incorporated into a de-trended residual kriging model (ASOADeK) to produce 1 km × 1 km resolution bulk chloride deposition and concentration maps. The average uncertainty of the deposition map is about 20–30% in the western MLR, and 40–50% in the eastern MLR. The maps will form a useful basis for examining catchment chloride balance for the CMB application in the study area.
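
    Partial correlation, one of the analyses named in this record, measures the association between two variables after the linear effect of a third has been removed. The sketch below uses invented data in which elevation and deposition are related only through coastal distance; the variable names and effect sizes are assumptions and do not reproduce the study.

```python
import numpy as np


def residualize(v, controls):
    """Residuals of v after least-squares regression on the controls plus an intercept."""
    A = np.column_stack([np.ones(len(v)), controls])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


def partial_correlation(x, y, controls):
    """Correlation between x and y once the linear effect of the controls is removed."""
    rx = residualize(x, controls)
    ry = residualize(y, controls)
    return float(np.corrcoef(rx, ry)[0, 1])


# Synthetic, standardized variables (hypothetical relationships):
# elevation rises with coastal distance, deposition falls with it,
# and elevation has no direct effect on deposition.
rng = np.random.default_rng(0)
n_sites = 5000
coastal_distance = rng.normal(size=n_sites)
elevation = 0.8 * coastal_distance + rng.normal(size=n_sites)
deposition = -1.0 * coastal_distance + rng.normal(size=n_sites)

raw_r = float(np.corrcoef(elevation, deposition)[0, 1])
partial_r = partial_correlation(elevation, deposition, coastal_distance)
print(f"raw correlation: {raw_r:.3f}, partial correlation: {partial_r:.3f}")
```

    In this construction the raw correlation is clearly negative while the partial correlation is close to zero, showing how a shared driver can create an apparent relationship.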

  7. Transfer from blue light or green light to white light partially reverses changes in ocular refraction and anatomy of developing guinea pigs.

    Science.gov (United States)

    Qian, Yi-Feng; Liu, Rui; Dai, Jin-Hui; Chen, Min-Jie; Zhou, Xing-Tao; Chu, Ren-Yuan

    2013-09-26

    Relative to the broadband white light (BL), postnatal guinea pigs develop myopia in a monochromic middle-wavelength light (ML, 530 nm) environment and develop hyperopia in a monochromic short-wavelength light (SL, 430 nm) environment. We investigated whether transfer from SL or ML to BL leads to recuperation of ocular refraction and anatomy of developing guinea pigs. Two-week-old guinea pigs were given (a) SL for 20 weeks, (b) SL recuperation (SLR, SL for 10 weeks then BL for 10 weeks), (c) ML for 20 weeks, (d) ML recuperation (MLR, ML for 10 weeks then BL for 10 weeks), or (e) BL for 20 weeks. Two weeks after transfer from ML to BL (MLR group), ocular refraction increased from 1.95 ± 0.35 D to 2.58 ± 0.24 D, and vitreous length decreased from 3.48 ± 0.06 mm to 3.41 ± 0.06 mm. Two weeks after transfer from SL to BL (SLR group), ocular refraction decreased from 5.65 ± 0.61 D to 4.33 ± 0.49 D, and vitreous length increased from 3.18 ± 0.07 mm to 3.26 ± 0.11 mm. The MLR and SLR groups had final ocular refractions that were significantly different from those of the ML and SL groups at 20 weeks (ML vs. MLR: p < 0.0001; SL vs. SLR: p < 0.0001) but were still significantly different from the BL group (BL vs. MLR: p = 0.0120; BL vs. SLR: p = 0.0010). These results suggest that recuperation was not complete after return to BL for 10 weeks.

  8. New approach to breast cancer CAD using partial least squares and kernel-partial least squares

    Science.gov (United States)

    Land, Walker H., Jr.; Heine, John; Embrechts, Mark; Smith, Tom; Choma, Robert; Wong, Lut

    2005-04-01

    Breast cancer is second only to lung cancer as a tumor-related cause of death in women. Currently, the method of choice for the early detection of breast cancer is mammography. While sensitive to the detection of breast cancer, its positive predictive value (PPV) is low, resulting in biopsies that are only 15-34% likely to reveal malignancy. This paper explores the use of two novel approaches called Partial Least Squares (PLS) and Kernel-PLS (K-PLS) to the diagnosis of breast cancer. The approach is based on optimization for the partial least squares (PLS) algorithm for linear regression and the K-PLS algorithm for non-linear regression. Preliminary results show that both the PLS and K-PLS paradigms achieved comparable results with three separate support vector learning machines (SVLMs), where these SVLMs were known to have been trained to a global minimum. That is, the average performance of the three separate SVLMs was Az = 0.9167927, with an average partial Az (Az90) = 0.5684283. These results compare favorably with the K-PLS paradigm, which obtained an Az = 0.907 and partial Az = 0.6123. The PLS paradigm provided comparable results. Secondly, both the K-PLS and PLS paradigms outperformed the ANN in that the Az index improved by about 14% (Az ~ 0.907 compared to the ANN Az of ~ 0.8). The "Press R squared" values for the PLS and K-PLS machine learning algorithms were 0.89 and 0.9, respectively, which is in good agreement with the other MOP values.

  9. Cox regression with missing covariate data using a modified partial likelihood method

    DEFF Research Database (Denmark)

    Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.

    2016-01-01

    Missing covariate values is a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM-algorithm. It exploits that the observed hazard function is multiplicative in the baseline hazard...

  10. Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.

    Science.gov (United States)

    Kong, Shengchun; Nan, Bin

    2014-01-01

    We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz. We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.

  11. Chemical composition of the essential oils of Citrus sinensis cv. valencia and a quantitative structure-retention relationship study for the prediction of retention indices by multiple linear regression

    Directory of Open Access Journals (Sweden)

    Larijani Kambiz

    2011-01-01

    The chemical composition of the volatile fraction obtained by head-space solid phase microextraction (HS-SPME) and single drop microextraction (SDME), and of the essential oil obtained by cold-press from the peels of C. sinensis cv. valencia, was analyzed employing gas chromatography-flame ionization detector (GC-FID) and gas chromatography-mass spectrometry (GC-MS). The main components were limonene (61.34 %, 68.27 %, 90.50 %), myrcene (17.55 %, 12.35 %, 2.50 %), sabinene (6.50 %, 7.62 %, 0.5 %) and α-pinene (0 %, 6.65 %, 1.4 %), obtained by HS-SPME, SDME and cold-press, respectively. Then a quantitative structure-retention relationship (QSRR) study for the prediction of retention indices (RI) of the compounds was developed by application of structural descriptors and the multiple linear regression (MLR) method. Principal components analysis was used to select the training set. A simple model with low standard errors and high correlation coefficients was obtained. The results illustrated that linear techniques such as MLR combined with a successful variable selection procedure are capable of generating an efficient QSRR model for prediction of the retention indices of different compounds. This model, with high statistical significance (R2 train = 0.983, R2 test = 0.970, Q2 LOO = 0.962, Q2 LGO = 0.936, REP(%) = 3.00), could be used adequately for the prediction and description of the retention indices of the volatile compounds.
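
    The leave-one-out statistic Q2 LOO quoted in this record can be computed for an MLR model without refitting n times, because the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The sketch below illustrates this with synthetic data; it assumes the common definition Q2 = 1 - PRESS/SS_tot with the overall mean in SS_tot, which may differ from the convention used by the authors.

```python
import numpy as np


def r2_and_q2_loo(X, y):
    """Training R^2 and leave-one-out Q^2 of an MLR model with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    # Hat matrix H = A (A^T A)^-1 A^T; its diagonal holds the leverages h_ii.
    H = A @ np.linalg.pinv(A)
    resid = y - H @ y
    # Leave-one-out residuals via the leverage shortcut e_i / (1 - h_ii).
    loo_resid = resid / (1.0 - np.diag(H))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - float(resid @ resid) / ss_tot
    q2 = 1.0 - float(loo_resid @ loo_resid) / ss_tot  # 1 - PRESS / SS_tot
    return r2, q2


# Synthetic descriptors and response (not retention data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(0.0, 0.3, 40)

r2, q2 = r2_and_q2_loo(X, y)
print(f"R2 train = {r2:.3f}, Q2 LOO = {q2:.3f}")
```

    Q2 LOO can never exceed the training R2, so a large gap between the two is a simple warning sign of overfitting.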

  12. Reference evapotranspiration forecasting based on local meteorological and global climate information screened by partial mutual information

    Science.gov (United States)

    Fang, Wei; Huang, Shengzhi; Huang, Qiang; Huang, Guohe; Meng, Erhao; Luan, Jinkai

    2018-06-01

    In this study, reference evapotranspiration (ET0) forecasting models are developed for the least economically developed regions subject to meteorological data scarcity. Firstly, partial mutual information (PMI), which can capture both linear and nonlinear dependence, is investigated regarding its utility for identifying relevant predictors and excluding redundant ones, through comparison with partial linear correlation. An efficient input selection technique is crucial for decreasing model data requirements. Then, the interconnection between global climate indices and regional ET0 is identified. Relevant climatic indices are introduced as additional predictors to supply information regarding ET0 that would otherwise come from the unavailable meteorological data. The case study in the Jing River and Beiluo River basins, China, reveals that PMI outperforms the partial linear correlation in excluding redundant information, favouring the yield of smaller predictor sets. The teleconnection analysis identifies the correlation between Nino 1 + 2 and regional ET0, indicating influences of ENSO events on the evapotranspiration process in the study area. Furthermore, introducing Nino 1 + 2 as a predictor helps to yield more accurate ET0 forecasts. A model performance comparison also shows that non-linear stochastic models (SVR or RF with input selection through PMI) do not always outperform linear models (MLR with inputs screened by linear correlation). However, the former can offer quite comparable performance while relying on smaller predictor sets. Therefore, efforts such as screening model inputs through PMI and incorporating global climatic indices interconnected with ET0 can benefit the development of ET0 forecasting models suitable for data-scarce regions.

  13. Testing of a simplified LED based vis/NIR system for rapid ripeness evaluation of white grape (Vitis vinifera L.) for Franciacorta wine.

    Science.gov (United States)

    Giovenzana, Valentina; Civelli, Raffaele; Beghi, Roberto; Oberti, Roberto; Guidetti, Riccardo

    2015-11-01

    The aim of this work was to test a simplified optical prototype for rapid estimation of the ripening parameters of white grape for Franciacorta wine directly in the field. Spectral acquisition based on reflectance at four wavelengths (630, 690, 750 and 850 nm) was proposed. The integration of a simple processing algorithm in the microcontroller software would allow real-time visualization of spectral reflectance values. Non-destructive analyses were carried out on 95 grape bunches for a total of 475 berries. Samplings were performed weekly during the last ripening stages. Optical measurements were carried out using both the simplified system and a portable commercial vis/NIR spectrophotometer, the latter as a reference instrument for performance comparison. Chemometric analyses were performed in order to extract the maximum useful information from the optical data. Principal component analysis (PCA) was performed for a preliminary evaluation of the data. Correlations between the optical data matrix and ripening parameters (total soluble solids content, SSC; titratable acidity, TA) were carried out using partial least squares (PLS) regression for spectra and using multiple linear regression (MLR) for data from the simplified device. Classification analyses were also performed with the aim of discriminating ripe from unripe samples. PCA, MLR and classification analyses show the effectiveness of the simplified system in separating samples among different sampling dates and in discriminating ripe from unripe samples. Finally, simple equations for SSC and TA prediction were calculated. Copyright © 2015 Elsevier B.V. All rights reserved.

  14. Use of partial least squares regression (PLSR) to overcome multicollinearity in the estimation of rice leaf chlorophyll with hyperspectral imagery

    Directory of Open Access Journals (Sweden)

    Abdi Sukmono

    2015-02-01

    Chlorophyll is the most important pigment in photosynthesis. Healthy plants that are able to reach maximum growth generally contain more chlorophyll than unhealthy plants. Estimating the chlorophyll content of rice plants from airborne hyperspectral data requires a dedicated model to achieve good accuracy. Hyperspectral imagery has hundreds of bands, each covering a narrow range, and is therefore well suited to chlorophyll estimation. However, these narrow ranges give rise to multicollinearity. The objective of this study is to develop in situ reflectance into a model for estimating the chlorophyll content of rice plants from airborne hyperspectral imagery, using partial least squares regression to remove the effect of multicollinearity. Bands that are related to chlorophyll and effective for its estimation were selected using the relationship between reflectance and chlorophyll. This selection yielded 44 bands that are effective for estimating the chlorophyll content of rice leaves. The results show that the PLSR method can produce a reasonably good model for estimating rice chlorophyll content, with a coefficient of determination (R2) reaching 0.75 at PC no. 11 and an RMSE of 1.44 SPAD units. Validation using airborne hyperspectral image data gave an RMSE of 1.07 SPAD units.

  15. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    International Nuclear Information System (INIS)

    Dyar, M.D.; Carmosino, M.L.; Breves, E.A.; Ozanne, M.V.; Clegg, S.M.; Wiens, R.C.

    2012-01-01

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. 
PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  16. Prediction of Biomass Production and Nutrient Uptake in Land Application Using Partial Least Squares Regression Analysis

    Directory of Open Access Journals (Sweden)

    Vasileios A. Tzanakakis

    2014-12-01

    Partial Least Squares Regression (PLSR) can integrate a great number of variables and overcome collinearity problems, a fact that makes it suitable for intensive agronomical practices such as land application. In the present study a PLSR model was developed to predict important management goals, including biomass production and nutrient recovery (i.e., nitrogen and phosphorus), associated with treatment potential, environmental impacts, and economic benefits. Effluent loading and a considerable number of soil parameters commonly monitored in effluent-irrigated lands were considered as potential predictor variables during the model development. All data were derived from a three-year field trial including plantations of four different plant species (Acacia cyanophylla, Eucalyptus camaldulensis, Populus nigra, and Arundo donax), irrigated with pre-treated domestic effluent. The PLSR method was very effective despite the small sample size and the wide nature of the data set (with many highly correlated inputs and several highly correlated responses). Through the PLSR method the number of initial predictor variables was reduced, and only a few variables remained and were included in the final PLSR model. The important input variables retained were: effluent loading, electrical conductivity (EC), available phosphorus (Olsen-P), Na+, Ca2+, Mg2+, K+, SAR, and NO3−-N. Among these variables, effluent loading, EC, and nitrates made the greatest contribution to the final PLSR model. PLSR is highly compatible with intensive agronomical practices such as land application, in which a large number of highly collinear and noisy input variables is monitored to assess plant species performance and to detect impacts on the environment.

  17. 8th International Conference on Partial Least Squares and Related Methods

    CERN Document Server

    Vinzi, Vincenzo; Russolillo, Giorgio; Saporta, Gilbert; Trinchera, Laura

    2016-01-01

    This volume presents state of the art theories, new developments, and important applications of Partial Least Square (PLS) methods. The text begins with the invited communications of current leaders in the field who cover the history of PLS, an overview of methodological issues, and recent advances in regression and multi-block approaches. The rest of the volume comprises selected, reviewed contributions from the 8th International Conference on Partial Least Squares and Related Methods held in Paris, France, on 26-28 May, 2014. They are organized in four coherent sections: 1) new developments in genomics and brain imaging, 2) new and alternative methods for multi-table and path analysis, 3) advances in partial least square regression (PLSR), and 4) partial least square path modeling (PLS-PM) breakthroughs and applications. PLS methods are very versatile methods that are now used in areas as diverse as engineering, life science, sociology, psychology, brain imaging, genomics, and business among both academics ...

  18. Modeling of methane emissions using artificial neural network approach

    Directory of Open Access Journals (Sweden)

    Stamenković Lidija J.

    2015-01-01

    The aim of this study was to develop a model for forecasting CH4 emissions at the national level, using Artificial Neural Networks (ANN) with broadly available sustainability, economical and industrial indicators as their inputs. ANN modeling was performed using two different types of architecture: a Backpropagation Neural Network (BPNN) and a General Regression Neural Network (GRNN). A conventional multiple linear regression (MLR) model was also developed in order to compare model performance and assess which model provides the best results. ANN and MLR models were developed and tested using the same annual data for 20 European countries. The ANN model demonstrated very good performance, significantly better than the MLR model. It was shown that a forecast of CH4 emissions at the national level using the ANN model can be made successfully and accurately for a future period of up to two years, thereby opening the possibility of applying such a modeling technique to support the implementation of sustainable development strategies and environmental management policies. [Project of the Ministry of Science of the Republic of Serbia, No. 172007]

  19. Particle swarm optimization and genetic algorithm as feature selection techniques for the QSAR modeling of imidazo[1,5-a]pyrido[3,2-e]pyrazines, inhibitors of phosphodiesterase 10A.

    Science.gov (United States)

    Goodarzi, Mohammad; Saeys, Wouter; Deeb, Omar; Pieters, Sigrid; Vander Heyden, Yvan

    2013-12-01

    Quantitative structure-activity relationship (QSAR) modeling was performed for imidazo[1,5-a]pyrido[3,2-e]pyrazines, which constitute a class of phosphodiesterase 10A inhibitors. Particle swarm optimization (PSO) and genetic algorithm (GA) were used as feature selection techniques to find the most reliable molecular descriptors from a large pool. Modeling of the relationship between the selected descriptors and the pIC50 activity data was achieved by linear [multiple linear regression (MLR)] and non-linear [locally weighted regression (LWR) based on both Euclidean (E) and Mahalanobis (M) distances] methods. In addition, a stepwise MLR model was built using only a limited number of quantum chemical descriptors, selected because of their correlation with the pIC50. The model was not found interesting. It was concluded that the LWR model, based on the Euclidean distance, applied on the descriptors selected by PSO has the best prediction ability. However, some other models behaved similarly. The root-mean-squared errors of prediction (RMSEP) for the test sets obtained by PSO/MLR, GA/MLR, PSO/LWRE, PSO/LWRM, GA/LWRE, and GA/LWRM models were 0.333, 0.394, 0.313, 0.333, 0.421, and 0.424, respectively. The PSO-selected descriptors resulted in the best prediction models, both linear and non-linear. © 2013 John Wiley & Sons A/S.
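
    The RMSEP values quoted in this record are test-set errors. As a minimal sketch, assuming the usual definition (square root of the mean squared difference between observed and predicted test-set values), the quantity can be computed as follows. The function name and the numbers are illustrative assumptions and are not taken from the study.

```python
import math


def rmsep(observed, predicted):
    """Root-mean-squared error of prediction over a test set.

    Assumes the usual definition: sqrt(mean((observed - predicted)^2)).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Made-up activity values, for illustration only (not data from the record).
y_obs = [6.1, 7.0, 5.4, 6.8]
y_hat = [6.0, 7.3, 5.6, 6.4]
value = rmsep(y_obs, y_hat)
print(round(value, 3))
```

    A lower RMSEP on the same test set indicates better predictive ability, which is how the six models in the record are ranked.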

  20. Recent predictors of Indian summer monsoon based on Indian and Pacific Ocean SST

    Science.gov (United States)

    Shahi, Namendra Kumar; Rai, Shailendra; Mishra, Nishant

    2018-02-01

    This study investigates the relationship between sea surface temperature (SST) at various locations in the Indian and Pacific Oceans and Indian summer monsoon rainfall (ISMR) in order to identify possible predictors of ISMR. We identified eight SST predictors based on spatial patterns of correlation coefficients between ISMR and SST of these regions during 1982-2013. Five multiple linear regression (MLR) models were developed from these predictors in various combinations. The stability and performance of these MLR models are verified using cross-validation and other statistical methods. The skill of these MLR models in predicting observed ISMR is found to be substantially better based on various statistical verification measures. It is observed that the MLR models constructed using the combination of SST indices in the tropical and extratropical Indian and Pacific Oceans are able to predict ISMR accurately for almost all the years of the study period. We attempt to propose a physical mechanism for the teleconnection through regression analysis of wind over the Indian subcontinent against the eight predictors, and the results are in conformity with the correlation coefficient analysis. The robustness of these models is tested by predicting ISMR during the recent independent years 2014-2017, and model 5 is found to predict ISMR accurately in these years as well.

  1. Prediction of aged red wine aroma properties from aroma chemical composition. Partial least squares regression models.

    Science.gov (United States)

    Aznar, Margarita; López, Ricardo; Cacho, Juan; Ferreira, Vicente

    2003-04-23

    Partial least squares regression (PLSR) models able to predict some of the wine aroma nuances from its chemical composition have been developed. The aromatic sensory characteristics of 57 Spanish aged red wines were determined by 51 experts from the wine industry. The individual descriptions given by the experts were recorded, and the frequency with which a sensory term was used to define a given wine was taken as a measurement of its intensity. The aromatic chemical composition of the wines was determined by previously published gas chromatography (GC)-flame ionization detector and GC-mass spectrometry methods. In total, 69 odorants were analyzed. Both matrices, the sensory and chemical data, were simplified by grouping and rearranging correlated sensory terms or chemical compounds and by the exclusion of secondary aroma terms or of weak aroma chemicals. Finally, models were developed for 18 sensory terms and 27 chemicals or groups of chemicals. Satisfactory models, explaining more than 45% of the original variance, could be found for nine of the most important sensory terms (wood-vanillin-cinnamon, animal-leather-phenolic, toasted-coffee, old wood-reduction, vegetal-pepper, raisin-flowery, sweet-candy-cacao, fruity, and berry fruit). For this set of terms, the correlation coefficients between the measured and predicted Y (determined by cross-validation) ranged from 0.62 to 0.81. Models confirmed the existence of complex multivariate relationships between chemicals and odors. In general, pleasant descriptors were positively correlated to chemicals with pleasant aroma, such as vanillin, beta damascenone, or (E)-beta-methyl-gamma-octalactone, and negatively correlated to compounds showing less favorable odor properties, such as 4-ethyl and vinyl phenols, 3-(methylthio)-1-propanol, or phenylacetaldehyde.

  2. Development of multiple linear regression models as predictive tools for fecal indicator concentrations in a stretch of the lower Lahn River, Germany.

    Science.gov (United States)

    Herrig, Ilona M; Böer, Simone I; Brennholt, Nicole; Manz, Werner

    2015-11-15

    Since rivers are typically subject to rapid changes in microbiological water quality, tools are needed to allow timely water quality assessment. A promising approach is the application of predictive models. In our study, we developed multiple linear regression (MLR) models in order to predict the abundance of the fecal indicator organisms Escherichia coli (EC), intestinal enterococci (IE) and somatic coliphages (SC) in the Lahn River, Germany. The models were developed on the basis of an extensive set of environmental parameters collected during a 12-month monitoring period. Two models were developed for each type of indicator: 1) an extended model including the maximum number of variables significantly explaining variations in indicator abundance and 2) a simplified model reduced to the three most influential explanatory variables, thus obtaining a model which is less resource-intensive with regard to required data. Both approaches have the ability to model multiple sites within one river stretch. The three most important predictive variables in the optimized models for the bacterial indicators were NH4-N, turbidity and global solar irradiance, whereas chlorophyll a content, discharge and NH4-N were reliable model variables for somatic coliphages. Depending on indicator type, the extended mode models also included the additional variables rainfall, O2 content, pH and chlorophyll a. The extended mode models could explain 69% (EC), 74% (IE) and 72% (SC) of the observed variance in fecal indicator concentrations. The optimized models explained 65% (EC), 70% (IE) and 68% (SC) of the observed variance in fecal indicator concentrations. Site-specific efficiencies ranged up to 82% (EC) and 81% (IE, SC). Our results suggest that MLR models are a promising tool for a timely water quality assessment in the Lahn area. Copyright © 2015 Elsevier Ltd. All rights reserved.

  3. Linear and non-linear quantitative structure-activity relationship models on indole substitution patterns as inhibitors of HIV-1 attachment.

    Science.gov (United States)

    Nirouei, Mahyar; Ghasemi, Ghasem; Abdolmaleki, Parviz; Tavakoli, Abdolreza; Shariati, Shahab

    2012-06-01

    The antiviral drugs that inhibit human immunodeficiency virus (HIV) entry to the target cells are already in different phases of clinical trials. They prevent viral entry and have a highly specific mechanism of action with a low toxicity profile. Few QSAR studies have been performed on this group of inhibitors. This study was performed to develop a quantitative structure-activity relationship (QSAR) model of the biological activity of indole glyoxamide derivatives as inhibitors of the interaction between HIV glycoprotein gp120 and host cell CD4 receptors. Forty different indole glyoxamide derivatives were selected as a sample set and geometrically optimized using Gaussian 98W. Different combinations of multiple linear regression (MLR), genetic algorithms (GA) and artificial neural networks (ANN) were then utilized to construct the QSAR models. These models were also utilized to select the most efficient subsets of descriptors in a cross-validation procedure for non-linear log (1/EC50) prediction. The results that were obtained using GA-ANN were compared with MLR-MLR and MLR-ANN models. A high predictive ability was observed for the MLR, MLR-ANN and GA-ANN models, with root mean sum square errors (RMSE) of 0.99, 0.91 and 0.67, respectively (N = 40). In summary, machine learning methods were highly effective in designing QSAR models when compared to statistical methods.

  4. Potable NIR spectroscopy predicting soluble solids content of pears based on LEDs

    Energy Technology Data Exchange (ETDEWEB)

    Liu Yande; Liu Wei; Sun Xudong; Gao Rongjie; Pan Yuanyuan; Ouyang Aiguo, E-mail: jxliuyd@163.com [School of Mechatronics Engineering, East China Jiaotong University, Changbei Open and Developing District, Nanchang, 330013 (China)

    2011-01-01

    A portable near-infrared (NIR) instrument equipped with light emitting diodes (LEDs) was developed for predicting the soluble solids content (SSC) of pears. NIR spectra were collected on the calibration and prediction sets (145:45). Relationships between spectra and SSC were developed by multivariate linear regression (MLR), partial least squares (PLS) and artificial neural networks (ANNs) in the calibration set. The 45 pears of the prediction set were used to evaluate the performance of the models in terms of root mean square errors of prediction (RMSEP) and correlation coefficients (r). The best result was obtained by PLS with RMSEP of 0.62 °Brix and r of 0.82. The results showed that the SSC of pears could be predicted by the portable NIR instrument.

  5. Potable NIR spectroscopy predicting soluble solids content of pears based on LEDs

    International Nuclear Information System (INIS)

    Liu Yande; Liu Wei; Sun Xudong; Gao Rongjie; Pan Yuanyuan; Ouyang Aiguo

    2011-01-01

    A portable near-infrared (NIR) instrument equipped with light emitting diodes (LEDs) was developed for predicting the soluble solids content (SSC) of pears. NIR spectra were collected on the calibration and prediction sets (145:45). Relationships between spectra and SSC were developed by multivariate linear regression (MLR), partial least squares (PLS) and artificial neural networks (ANNs) in the calibration set. The 45 pears of the prediction set were used to evaluate the performance of the models in terms of root mean square errors of prediction (RMSEP) and correlation coefficients (r). The best result was obtained by PLS with RMSEP of 0.62 °Brix and r of 0.82. The results showed that the SSC of pears could be predicted by the portable NIR instrument.

  6. Data-driven discovery of partial differential equations.

    Science.gov (United States)

    Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2017-04-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
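
    The idea of selecting a few governing terms from a larger candidate library can be sketched in a few lines. The sketch below uses sequentially thresholded least squares as a simplified stand-in; it is not claimed to be the authors' algorithm, and it omits the Pareto analysis mentioned in the abstract. The candidate terms (u, u_x, u_xx), the threshold and the synthetic data are all assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(theta, y, active):
    """Ordinary least squares restricted to the library columns listed in `active`."""
    ata = [[sum(row[i] * row[j] for row in theta) for j in active] for i in active]
    aty = [sum(row[i] * yi for row, yi in zip(theta, y)) for i in active]
    return solve(ata, aty)


def stlsq(theta, y, threshold, n_iter=10):
    """Refit repeatedly, dropping terms whose coefficient magnitude is below `threshold`."""
    n_terms = len(theta[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(n_iter):
        if not active:
            break
        coef = [0.0] * n_terms
        for idx, val in zip(active, least_squares(theta, y, active)):
            coef[idx] = val
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break
        active = kept
    return [c if i in active else 0.0 for i, c in enumerate(coef)]


# Synthetic library: columns stand for hypothetical samples of u, u_x, u_xx.
theta = [[math.sin(k), math.cos(2 * k), math.sin(3 * k + 1)] for k in range(20)]
# Synthetic "time derivative": u_t = -1.0 * u_x + 0.5 * u_xx plus a small perturbation.
y = [-1.0 * r[1] + 0.5 * r[2] + 0.001 * math.cos(5 * k + 2) for k, r in enumerate(theta)]

coef = stlsq(theta, y, threshold=0.1)
print(coef)
```

    In this toy setting the coefficient of the unused term u is driven to exactly zero, while the two active terms are refitted without it. That is the sense in which sparsity-promoting regression avoids searching over every possible subset of candidate terms.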

  7. Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

    Energy Technology Data Exchange (ETDEWEB)

    Dyar, M.D., E-mail: mdyar@mtholyoke.edu [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Carmosino, M.L.; Breves, E.A.; Ozanne, M.V. [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Clegg, S.M.; Wiens, R.C. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

    2012-04-15

    A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. 
PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

  8. Incorporating wind availability into land use regression modelling of air quality in mountainous high-density urban environment.

    Science.gov (United States)

    Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward

    2017-08-01

    Urban air quality serves as an important function of the quality of urban life. Land use regression (LUR) modelling of air quality is essential for conducting health impacts assessment but more challenging in mountainous high-density urban scenario due to the complexities of the urban environment. In this study, a total of 21 LUR models are developed for seven kinds of air pollutants (gaseous air pollutants CO, NO2, NOx, O3, SO2 and particulate air pollutants PM2.5, PM10) with reference to three different time periods (summertime, wintertime and annual average of 5-year long-term hourly monitoring data from local air quality monitoring network) in Hong Kong. Under the mountainous high-density urban scenario, we improved the traditional LUR modelling method by incorporating wind availability information into LUR modelling based on surface geomorphometrical analysis. As a result, 269 independent variables were examined to develop the LUR models by using the "ADDRESS" independent variable selection method and stepwise multiple linear regression (MLR). Cross validation has been performed for each resultant model. The results show that wind-related variables are included in most of the resultant models as statistically significant independent variables. Compared with the traditional method, a maximum increase of 20% was achieved in the prediction performance of annual averaged NO2 concentration level by incorporating wind-related variables into LUR model development. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. Variance Function Partially Linear Single-Index Models.

    Science.gov (United States)

    Lian, Heng; Liang, Hua; Carroll, Raymond J

    2015-01-01

    We consider heteroscedastic regression models where the mean function is a partially linear single index model and the variance function depends upon a generalized partially linear single index model. We do not insist that the variance function depend only upon the mean function, as happens in the classical generalized partially linear single index model. We develop efficient and practical estimation methods for the variance function and for the mean function. Asymptotic theory for the parametric and nonparametric parts of the model is developed. Simulations illustrate the results. An empirical example involving ozone levels is used to further illustrate the results, and is shown to be a case where the variance function does not depend upon the mean function.

  10. Focused information criterion and model averaging based on weighted composite quantile regression

    KAUST Repository

    Xu, Ganggang; Wang, Suojin; Huang, Jianhua Z.

    2013-01-01

    We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of the additive partial linear models. With the non

  11. Forecasting on the total volumes of Malaysia's imports and exports by multiple linear regression

    Science.gov (United States)

    Beh, W. L.; Yong, M. K. Au

    2017-04-01

    This study examines the importance of the macroeconomic variables that affect the total volumes of Malaysia's imports and exports, using multiple linear regression (MLR) analysis. The study uses quarterly data on the total volumes of Malaysia's imports and exports covering the period 2000-2015. The macroeconomic variables are limited to eleven: the exchange rate of the US Dollar against the Malaysian Ringgit (USD-MYR), the exchange rate of the Chinese Yuan against the Malaysian Ringgit (RMB-MYR), the exchange rate of the Euro against the Malaysian Ringgit (EUR-MYR), the exchange rate of the Singapore Dollar against the Malaysian Ringgit (SGD-MYR), crude oil prices, gold prices, producer price index (PPI), interest rate, consumer price index (CPI), industrial production index (IPI) and gross domestic product (GDP). This study applied the Johansen co-integration test to investigate the relationship among the total volumes of Malaysia's imports and exports. The results show that crude oil prices, RMB-MYR, EUR-MYR and IPI play important roles in the total volumes of Malaysia's imports, while crude oil prices, USD-MYR and GDP play important roles in the total volumes of Malaysia's exports.

  12. Application of Fourier transform infrared spectroscopy and orthogonal projections to latent structures/partial least squares regression for estimation of procyanidins average degree of polymerisation.

    Science.gov (United States)

    Passos, Cláudia P; Cardoso, Susana M; Barros, António S; Silva, Carlos M; Coimbra, Manuel A

    2010-02-28

    Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the quick assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. A fractionation by graded methanol/chloroform precipitations yielded 26 samples that were characterised using thiolysis as pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy within the wavenumber region of 1800-700 cm⁻¹ allowed a partial least squares (PLS1) regression model with 8 latent variables (LVs) to be built for the estimation of the DPn, giving an RMSECV of 11.7%, with an R² of 0.91 and an RMSEP of 2.58. The application of orthogonal projection to latent structures (O-PLS1) clarifies the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm⁻¹ to the increase of the DPn due to the higher proportion of substitutions in the aromatic ring of the polymerised procyanidin molecules. Copyright 2009 Elsevier B.V. All rights reserved.

  13. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    Directory of Open Access Journals (Sweden)

    T. Soares dos Santos

    2016-01-01

    model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970–1999) and two scenarios (RCP 2.6 and 8.5; 2070–2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  14. ANYOLS, Least Square Fit by Stepwise Regression

    International Nuclear Information System (INIS)

    Atwoods, C.L.; Mathews, S.

    1986-01-01

    Description of program or function: ANYOLS is a stepwise program which fits data using ordinary or weighted least squares. Variables are selected for the model in a stepwise way based on a user-specified input criterion or a user-written subroutine. The order in which variables are entered can be influenced by user-defined forcing priorities. Instead of stepwise selection, ANYOLS can try all possible combinations of any desired subset of the variables. Automatic output for the final model in a stepwise search includes plots of the residuals, 'studentized' residuals, and leverages; if the model is not too large, the output also includes partial regression and partial leverage plots. A data set may be re-used so that several selection criteria can be tried. Flexibility is increased by allowing the substitution of user-written subroutines for several default subroutines.

  15. Dendroclimatic transfer functions revisited: Little Ice Age and Medieval Warm Period summer temperatures reconstructed using artificial neural networks and linear algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Helama, S.; Holopainen, J.; Eronen, M. [Department of Geology, University of Helsinki, (Finland); Makarenko, N.G. [Russian Academy of Sciences, St. Petersburg (Russian Federation). Pulkovo Astronomical Observatory; Karimova, L.M.; Kruglun, O.A. [Institute of Mathematics, Almaty (Kazakhstan); Timonen, M. [Finnish Forest Research Institute, Rovaniemi Research Unit (Finland); Merilaeinen, J. [SAIMA Unit of the Savonlinna Department of Teacher Education, University of Joensuu (Finland)

    2009-07-01

    Tree-rings tell of past climates. To do so, tree-ring chronologies comprising numerous climate-sensitive living-tree and subfossil time-series need to be 'transferred' into palaeoclimate estimates using transfer functions. The purpose of this study is to compare different types of transfer functions, especially linear and nonlinear algorithms. Accordingly, multiple linear regression (MLR), linear scaling (LSC) and artificial neural networks (ANN, nonlinear algorithm) were compared. Transfer functions were built using a regional tree-ring chronology and instrumental temperature observations from Lapland (northern Finland and Sweden). In addition, conventional MLR was compared with a hybrid model whereby climate was reconstructed separately for short- and long-period timescales prior to combining the bands of timescales into a single hybrid model. The fidelity of the different reconstructions was validated against instrumental climate data. The reconstructions by MLR and ANN showed reliable reconstruction capabilities over the instrumental period (AD 1802-1998). LSC failed to reach reasonable verification statistics and did not qualify as a reliable reconstruction: this was due mainly to exaggeration of the low-frequency climatic variance. Over this instrumental period, the reconstructed low-frequency amplitudes of climate variability were rather similar by MLR and ANN. Notably greater differences between the models were found over the actual reconstruction period (AD 802-1801). A marked temperature decline, as reconstructed by MLR, from the Medieval Warm Period (AD 931-1180) to the Little Ice Age (AD 1601-1850), was evident in all the models. This decline was approx. 0.5 °C as reconstructed by MLR. Different ANN based palaeotemperatures showed simultaneous cooling of 0.2 to 0.5 °C, depending on algorithm. The hybrid MLR did not seem to provide further benefit above conventional MLR in our sample. The robustness of the conventional MLR over the calibration

  16. Estimating the exceedance probability of rain rate by logistic regression

    Science.gov (United States)

    Chiu, Long S.; Kedem, Benjamin

    1990-01-01

    Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
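
    The core quantity in this record is a conditional probability that rain rate exceeds a threshold, given covariates. A minimal sketch of that idea follows, under simplifying assumptions: a single covariate, independent observations and plain maximum likelihood fitted by gradient ascent. The record itself handles dependent data by partial likelihood, which this sketch does not attempt. The covariate values and exceedance labels are made up for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=0.5, n_iter=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            residual = yi - sigmoid(b0 + b1 * xi)
            g0 += residual
            g1 += residual * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical standardized covariate per pixel (illustrative values only).
covariate = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
# 1 if the rain rate at that pixel exceeded the chosen threshold, else 0 (made up).
exceeds = [0, 0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(covariate, exceeds)
prob_low = sigmoid(b0 + b1 * -2.0)   # estimated exceedance probability at a low covariate value
prob_high = sigmoid(b0 + b1 * 2.0)   # estimated exceedance probability at a high covariate value
print(prob_low, prob_high)
```

    Classifying a pixel as exceeding the threshold when the fitted probability is above 0.5 gives the kind of pixel classification the abstract compares against multiple regression.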

  17. Evaluation and prediction of shrub cover in coastal Oregon forests (USA)

    Science.gov (United States)

    Becky K. Kerns; Janet L. Ohmann

    2004-01-01

    We used data from regional forest inventories and research programs, coupled with mapped climatic and topographic information, to explore relationships and develop multiple linear regression (MLR) and regression tree models for total and deciduous shrub cover in the Oregon coastal province. Results from both types of models indicate that forest structure variables were...

  18. Differentiating regressed melanoma from regressed lichenoid keratosis.

    Science.gov (United States)

    Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

    2017-04-01

    Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  19. Support vector machine regression (LS-SVM)--an alternative to artificial neural networks (ANNs) for the analysis of quantum chemistry data?

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-06-28

    A multilayer feed-forward artificial neural network (MLP-ANN) with a single, hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and the multiple linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used: least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. Thus, the QC + SVM methodology is an alternative to the QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for the building of the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. The extrapolation and interpolation results show that LS-SVM is

  20. Research Article Special Issue

    African Journals Online (AJOL)

    2018-01-15

    Jan 15, 2018 ... The leaves contain essential oils and serve for their production, as .... Linear Regressions (MLR) is used in modeling data [21]. ..... Predicting hourly ozone concentration in Dali area of Taichung County based on multiple.

  1. Application of sequential and orthogonalised-partial least squares (SO-PLS) regression to predict sensory properties of Cabernet Sauvignon wines from grape chemical composition.

    Science.gov (United States)

    Niimi, Jun; Tomic, Oliver; Næs, Tormod; Jeffery, David W; Bastian, Susan E P; Boss, Paul K

    2018-08-01

    The current study determined the applicability of sequential and orthogonalised-partial least squares (SO-PLS) regression to relate Cabernet Sauvignon grape chemical composition to the sensory perception of the corresponding wines. Grape samples (n = 25) were harvested at a similar maturity and vinified identically in 2013. Twelve measures using various (bio)chemical methods were made on grapes. Wines were evaluated using descriptive analysis with a trained panel (n = 10) for sensory profiling. Data was analysed globally using SO-PLS for the entire sensory profiles (SO-PLS2), as well as for single sensory attributes (SO-PLS1). SO-PLS1 models were superior in validated explained variances than SO-PLS2. SO-PLS provided a structured approach in the selection of predictor chemical data sets that best contributed to the correlation of important sensory attributes. This new approach presents great potential for application in other explorative metabolomics studies of food and beverages to address factors such as quality and regional influences. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Comparison of several measure-correlate-predict models using support vector regression techniques to estimate wind power densities. A case study

    International Nuclear Information System (INIS)

    Díaz, Santiago; Carta, José A.; Matías, José M.

    2017-01-01

    Highlights: • Eight measure-correlate-predict (MCP) models used to estimate the wind power densities (WPDs) at a target site are compared. • Support vector regressions are used as the main prediction techniques in the proposed MCPs. • The most precise MCP uses two sub-models which predict wind speed and air density in an unlinked manner. • The most precise model allows to construct a bivariable (wind speed and air density) WPD probability density function. • MCP models trained to minimise wind speed prediction error do not minimise WPD prediction error. - Abstract: The long-term annual mean wind power density (WPD) is an important indicator of wind as a power source which is usually included in regional wind resource maps as useful prior information to identify potentially attractive sites for the installation of wind projects. In this paper, a comparison is made of eight proposed Measure-Correlate-Predict (MCP) models to estimate the WPDs at a target site. Seven of these models use the Support Vector Regression (SVR) and the eighth the Multiple Linear Regression (MLR) technique, which serves as a basis to compare the performance of the other models. In addition, a wrapper technique with 10-fold cross-validation has been used to select the optimal set of input features for the SVR and MLR models. Some of the eight models were trained to directly estimate the mean hourly WPDs at a target site. Others, however, were firstly trained to estimate the parameters on which the WPD depends (i.e. wind speed and air density) and then, using these parameters, the target site mean hourly WPDs. The explanatory features considered are different combinations of the mean hourly wind speeds, wind directions and air densities recorded in 2014 at ten weather stations in the Canary Archipelago (Spain). The conclusions that can be drawn from the study undertaken include the argument that the most accurate method for the long-term estimation of WPDs requires the execution of a

  3. Modeling of Temperature Effect on Modal Frequency of Concrete Beam Based on Field Monitoring Data

    Directory of Open Access Journals (Sweden)

    Wenchen Shan

    2018-01-01

    Temperature variation has been widely demonstrated to produce significant effects on modal frequencies that can even exceed the effect of actual damage. In order to eliminate the temperature effect on modal frequency, an effective method is to construct quantitative models which accurately predict the modal frequency corresponding to temperature variation. In this paper, principal component analysis (PCA) is conducted on the temperatures taken from all embedded thermocouples for extracting input parameters of regression models. Three regression-based numerical models using multiple linear regression (MLR), back-propagation neural network (BPNN), and support vector regression (SVR) techniques are constructed to capture the relationships between modal frequencies and temperature distributions from measurements of a concrete beam during a period of forty days of monitoring. A comparison with respect to the performance of various optimally configured regression models has been performed on measurement data. Results indicate that the SVR exhibits a better reproduction and prediction capability than BPNN and MLR models for predicting the modal frequencies with respect to nonuniformly distributed temperatures. The results show that temperature effects on modal frequencies can be effectively eliminated based on the optimally formulated SVR model.

  4. Prediction of Outcome in Acute Lower Gastrointestinal Bleeding Using Gradient Boosting.

    Directory of Open Access Journals (Sweden)

    Lakshmana Ayaru

    There are no widely used models in clinical care to predict outcome in acute lower gastro-intestinal bleeding (ALGIB). If available these could help triage patients at presentation to appropriate levels of care/intervention and improve medical resource utilisation. We aimed to apply a state-of-the-art machine learning classifier, gradient boosting (GB), to predict outcome in ALGIB using non-endoscopic measurements as predictors. Non-endoscopic variables from patients with ALGIB attending the emergency departments of two teaching hospitals were analysed retrospectively for training/internal validation (n=170) and external validation (n=130) of the GB model. The performance of the GB algorithm in predicting recurrent bleeding, clinical intervention and severe bleeding was compared to a multiple logistic regression (MLR) model and two published MLR-based prediction algorithms (BLEED and Strate prediction rule). The GB algorithm had the best negative predictive values for the chosen outcomes (>88%). On internal validation the accuracy of the GB algorithm for predicting recurrent bleeding, therapeutic intervention and severe bleeding was 88%, 88% and 78%, respectively, and superior to the BLEED classification (64%, 68% and 63%), Strate prediction rule (78%, 78% and 67%) and conventional MLR (74%, 74% and 62%). On external validation the accuracy was similar to conventional MLR for recurrent bleeding (88% vs. 83%) and therapeutic intervention (91% vs. 87%) but superior for severe bleeding (83% vs. 71%). The gradient boosting algorithm accurately predicts outcome in patients with acute lower gastrointestinal bleeding and outperforms multiple logistic regression based models. These may be useful for risk stratification of patients on presentation to the emergency department.

  5. Partial Least Squares Regression for Determining the Control Factors for Runoff and Suspended Sediment Yield during Rainfall Events

    Directory of Open Access Journals (Sweden)

    Nufang Fang

    2015-07-01

    Multivariate statistics are commonly used to identify the factors that control the dynamics of runoff or sediment yields during hydrological processes. However, one issue with the use of conventional statistical methods to address relationships between variables and runoff or sediment yield is multicollinearity. The main objectives of this study were to apply a method for effectively identifying runoff and sediment control factors during hydrological processes and apply that method to a case study. The method combines the clustering approach and partial least squares regression (PLSR) models. The case study was conducted in a mountainous watershed in the Three Gorges Area. A total of 29 flood events in three hydrological years in areas with different land uses were obtained. In total, fourteen related variables were separated from hydrographs using the classical hydrograph separation method. Twenty-nine rainfall events were classified into two rainfall regimes (heavy Rainfall Regime I and moderate Rainfall Regime II) based on rainfall characteristics and K-means clustering. Four separate PLSR models were constructed to identify the main variables that control runoff and sediment yield for the two rainfall regimes. For Rainfall Regime I, the dominant first-order factors affecting the changes in sediment yield in our study were all of the four rainfall-related variables, flood peak discharge, maximum flood suspended sediment concentration, runoff, and the percentages of forest and farmland. For Rainfall Regime II, antecedent condition-related variables have more effects on both runoff and sediment yield than in Rainfall Regime I. The results suggest that the different control factors of the two rainfall regimes are determined by the rainfall characteristics and thus different runoff mechanisms.

  6. Modeling of energy consumption and related GHG (greenhouse gas) intensity and emissions in Europe using general regression neural networks

    International Nuclear Information System (INIS)

    Antanasijević, Davor; Pocajt, Viktor; Ristić, Mirjana; Perić-Grujić, Aleksandra

    2015-01-01

    This paper presents a new approach for the estimation of energy-related GHG (greenhouse gas) emissions at the national level that combines the simplicity of the concept of GHG intensity and the generalization capabilities of ANNs (artificial neural networks). The main objectives of this work include the determination of the accuracy of a GRNN (general regression neural network) model applied for the prediction of EC (energy consumption) and GHG intensity of energy consumption, utilizing general country statistics as inputs, as well as analysis of the accuracy of energy-related GHG emissions obtained by multiplying the two aforementioned outputs. The models were developed using historical data from the period 2004–2012, for a set of 26 European countries (EU Members). The obtained results demonstrate that the GRNN GHG intensity model provides a more accurate prediction, with a MAPE (mean absolute percentage error) of 4.5%, than the tested MLR (multiple linear regression) and second-order and third-order non-linear MPR (multiple polynomial regression) models. Also, the GRNN EC model has high accuracy (MAPE = 3.6%), and therefore both GRNN models and the proposed approach can be considered as suitable for the calculation of GHG emissions. The energy-related predicted GHG emissions were very similar to the actual GHG emissions of EU Members (MAPE = 6.4%). - Highlights: • ANN modeling of GHG intensity of energy consumption is presented. • ANN modeling of energy consumption at the national level is presented. • GHG intensity concept was used for the estimation of energy-related GHG emissions. • The ANN models provide better results in comparison with conventional models. • Forecast of GHG emissions for 26 countries was made successfully with MAPE of 6.4%
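
    The arithmetic behind this approach can be sketched as follows: emissions are obtained as the product of predicted energy consumption and predicted GHG intensity, and accuracy is summarized with MAPE. The function names and all numbers below are hypothetical and only illustrate the calculation; they are not data from the paper.

```python
def ghg_emissions(energy_consumption, ghg_intensity):
    """Energy-related emissions as the product of the two model outputs."""
    return energy_consumption * ghg_intensity


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    relative_errors = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical values for three countries (arbitrary units).
predicted_ec = [100.0, 250.0, 40.0]
predicted_intensity = [2.1, 1.8, 3.0]
predicted_emissions = [
    ghg_emissions(ec, gi) for ec, gi in zip(predicted_ec, predicted_intensity)
]
actual_emissions = [200.0, 470.0, 125.0]
emissions_mape = mape(actual_emissions, predicted_emissions)
```

    Because the two predictions are multiplied, their relative errors can compound, which is consistent with the emissions MAPE reported above being larger than either individual model's MAPE.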

  7. PREDICTING THE BOILING POINT OF PCDD/Fs BY THE QSPR METHOD BASED ON THE MOLECULAR DISTANCE-EDGE VECTOR INDEX

    Directory of Open Access Journals (Sweden)

    Long Jiao

    2015-05-01

    The quantitative structure property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled by using multivariate linear regression (MLR) and artificial neural network (ANN), respectively. Leave-one-out cross validation and external validation were carried out to assess the prediction performance of the models developed. For the MLR method, the prediction root mean square relative error (RMSRE) of leave-one-out cross validation and external validation was 1.77 and 1.23, respectively. For the ANN method, the prediction RMSRE of leave-one-out cross validation and external validation was 1.65 and 1.16, respectively. A quantitative relationship between the MDEV index and Tb of PCDD/Fs was demonstrated. Both MLR and ANN are practicable for modeling this relationship. The MLR model and ANN model developed can be used to predict the Tb of PCDD/Fs. Thus, the Tb of each PCDD/F was predicted by the developed models.
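
    Leave-one-out cross validation with an RMSRE score can be sketched as below. For brevity the sketch fits a straight line to a single hypothetical descriptor rather than a full MLR on the MDEV index, and it assumes RMSRE is the root mean square of relative errors expressed in percent; the abstract does not spell out its exact definition. The data values are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmsre(xs, ys):
    """Leave-one-out RMSRE in percent (assumed definition)."""
    relative_errors = []
    for i in range(len(xs)):
        # Refit the model with sample i held out, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        prediction = slope * xs[i] + intercept
        relative_errors.append((prediction - ys[i]) / ys[i])
    mean_square = sum(e * e for e in relative_errors) / len(relative_errors)
    return 100.0 * math.sqrt(mean_square)


# Hypothetical descriptor values and boiling points (K).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
boiling_point = [560.0, 585.0, 598.0, 622.0, 641.0, 655.0]
score = loo_rmsre(descriptor, boiling_point)
```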

  8. Seasonal variations in the aragonite saturation state in the upper open-ocean waters of the North Pacific Ocean

    Science.gov (United States)

    Kim, Tae-Wook; Park, Geun-Ha; Kim, Dongseon; Lee, Kitack; Feely, Richard A.; Millero, Frank J.

    2015-06-01

    Seasonal variability of the aragonite saturation state (ΩAR) in the upper (50 m and 100 m depths) North Pacific Ocean (NPO) was investigated using multiple linear regression (MLR). The MLR algorithm derived from a high-quality carbon data set accurately predicted the ΩAR of evaluation data sets (three time series stations and P02 section) with acceptable uncertainty (<0.1 ΩAR). The algorithm was combined with seasonal climatology data, and the estimated ΩAR varied in the range of 0.4-0.6 in the midlatitude western NPO, with the largest variation found for the tropical eastern NPO. These marked variations were largely controlled by seasonal changes in vertical mixing and thermocline depth, both of which determine the degree of entrainment of CO2-rich corrosive waters from deeper depths. Our MLR-based subsurface ΩAR climatology is complementary to surface climatology based on pCO2 measurements.

  9. On a Robust MaxEnt Process Regression Model with Sample-Selection

    Directory of Open Access Journals (Sweden)

    Hea-Jung Kim

    2018-04-01

    In a regression analysis, a sample-selection bias arises when a dependent variable is partially observed as a result of the sample selection. This study introduces a Maximum Entropy (MaxEnt) process regression model that assumes a MaxEnt prior distribution for its nonparametric regression function and finds that the MaxEnt process regression model includes the well-known Gaussian process regression (GPR) model as a special case. Then, this special MaxEnt process regression model, i.e., the GPR model, is generalized to obtain a robust sample-selection Gaussian process regression (RSGPR) model that deals with non-normal data in the sample selection. Various properties of the RSGPR model are established, including the stochastic representation, distributional hierarchy, and magnitude of the sample-selection bias. These properties are used in the paper to develop a hierarchical Bayesian methodology to estimate the model. This involves a simple and computationally feasible Markov chain Monte Carlo algorithm that avoids analytical or numerical derivatives of the log-likelihood function of the model. The performance of the RSGPR model in terms of the sample-selection bias correction, robustness to non-normality, and prediction, is demonstrated through results in simulations that attest to its good finite-sample performance.

  10. GA/ MLR

    African Journals Online (AJOL)

    model was further illustrated using various evaluation techniques: leave- one- out ... minimum energy conformation were obtained ..... The distribution of errors for the ... are distributed on both sides of the zero line, .... of systems in solution.

  11. CT- and MRI-based volumetry of resected liver specimen: Comparison to intraoperative volume and weight measurements and calculation of conversion factors

    International Nuclear Information System (INIS)

    Karlo, C.; Reiner, C.S.; Stolzmann, P.; Breitenstein, S.; Marincek, B.; Weishaupt, D.; Frauenfelder, T.

    2010-01-01

    Objective: To compare virtual volume to intraoperative volume and weight measurements of resected liver specimen and calculate appropriate conversion factors to reach better correlation. Methods: Preoperative (CT-group, n = 30; MRI-group, n = 30) and postoperative MRI (n = 60) imaging was performed in 60 patients undergoing partial liver resection. Intraoperative volume and weight of the resected liver specimen was measured. Virtual volume measurements were performed by two readers (R1,R2) using dedicated software. Conversion factors were calculated. Results: Mean intraoperative resection weight/volume: CT: 855 g/852 mL; MRI: 872 g/860 mL. Virtual resection volume: CT: 960 mL(R1), 982 mL(R2); MRI: 1112 mL(R1), 1115 mL(R2). Strong positive correlation for both readers between intraoperative and virtual measurements, mean of both readers: CT: R = 0.88(volume), R = 0.89(weight); MRI: R = 0.95(volume), R = 0.92(weight). Conversion factors: 0.85(CT), 0.78(MRI). Conclusion: CT- or MRI-based volumetry of resected liver specimen is accurate and recommended for preoperative planning. A conversion of the result is necessary to improve intraoperative and virtual measurement correlation. We found 0.85 for CT- and 0.78 for MRI-based volumetry the most appropriate conversion factors.
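
    A short worked example of applying the reported conversion factors, assuming they act as simple multipliers on the virtual volume (the abstract does not state the exact form of the correction). The function and dictionary names are hypothetical; the input volumes are the reader R1 means quoted above.

```python
# Conversion factors reported in the abstract, keyed by imaging modality.
CONVERSION_FACTORS = {"CT": 0.85, "MRI": 0.78}


def corrected_volume_ml(virtual_volume_ml, modality):
    """Scale a virtual resection volume by the modality-specific factor
    (assumes the factor is a plain multiplier)."""
    return virtual_volume_ml * CONVERSION_FACTORS[modality]


ct_corrected = corrected_volume_ml(960.0, "CT")     # reader R1, CT group
mri_corrected = corrected_volume_ml(1112.0, "MRI")  # reader R1, MRI group
```

    Under this assumption the corrected values come to roughly 816 mL (CT) and 867 mL (MRI), compared with the mean intraoperative volumes of 852 mL and 860 mL given in the abstract.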

  12. Impact of multicollinearity on small sample hydrologic regression models

    Science.gov (United States)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
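
    Variance inflation factor screening can be illustrated for the simplest case of two explanatory variables, where the VIF of either variable reduces to 1 / (1 - r^2) with r the Pearson correlation between them. This is a minimal sketch with invented data; the screening threshold of 10 is a common rule of thumb and is an assumption here, not a value from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2).
    Undefined (division by zero) when the predictors are perfectly correlated."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical basin characteristics that are nearly collinear.
drainage_area = [1.0, 2.0, 3.0, 4.0, 5.0]
channel_length = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = vif_two_predictors(drainage_area, channel_length)
flag_for_removal = vif > 10.0  # assumed rule-of-thumb threshold
```

    With more than two predictors, each VIF would instead come from regressing one predictor on all the others.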

  13. Development and Validation of a Rule-Based Strength Scaling Method for Musculoskeletal Modelling

    DEFF Research Database (Denmark)

    Oomen, Pieter; Annegarn, Janneke; Rasmussen, John

    2015-01-01

    performed maximal isometric knee extensions. A multiple linear regression analysis (MLR) resulted in an empirical strength scaling equation, accounting for age, mass, height, gender, segment masses and segment lengths. For validation purpose, 20 newly included healthy subjects performed a maximal isometric...

  14. A default Bayesian hypothesis test for correlations and partial correlations

    NARCIS (Netherlands)

    Wetzels, R.; Wagenmakers, E.J.

    2012-01-01

    We propose a default Bayesian hypothesis test for the presence of a correlation or a partial correlation. The test is a direct application of Bayesian techniques for variable selection in regression models. The test is easy to apply and yields practical advantages that the standard frequentist tests

  15. Rapid Detection of Pesticide Residues in Chinese Herbal Medicines by Fourier Transform Infrared Spectroscopy Coupled with Partial Least Squares Regression

    Directory of Open Access Journals (Sweden)

    Tianming Yang

    2016-01-01

    This paper reports a simple, rapid, and effective method for simultaneous detection of cartap (Ca), thiocyclam (Th), and tebufenozide (Te) in Chinese herbal medicines including Radix Angelicae Dahuricae and Liquorices using Fourier transform infrared spectroscopy (FT-IR) coupled with partial least squares regression (PLSR). The proposed method can handle the intrinsic interferences of herbal samples; satisfactory average recoveries attained from near-infrared (NIR) and mid-infrared (MIR) PLSR models were 99.0±10.8 and 100.2±1.0% for Ca, 100.2±6.9 and 99.7±2.5% for Th, and 99.1±6.3 and 99.6±1.0% for Te, respectively. Furthermore, some statistical parameters and figures of merit are fully investigated to evaluate the performance of the two models. It was found that both models could give accurate results and that the performance of MIR-PLSR was only slightly better than that of NIR-PLSR in the cases suffering from herbal matrix interferences. In conclusion, FT-IR spectroscopy in combination with PLSR has been demonstrated for rapid screening and quantitative analysis of multipesticide residues in Chinese herbal medicines without any physical or chemical separation pretreatment step or spectral processing, which also implies other potential applications such as food and drug safety, herbal plants quality, and environmental evaluation, due to its advantages of nontoxic and nondestructive analysis.

  16. Retro-regression--another important multivariate regression improvement.

    Science.gov (United States)

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  17. Modified Regression Correlation Coefficient for Poisson Regression Model

    Science.gov (United States)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study focuses on indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).

  18. Identification of milk origin and process-induced changes in milk by stable isotope ratio mass spectrometry.

    Science.gov (United States)

    Scampicchio, Matteo; Mimmo, Tanja; Capici, Calogero; Huck, Christian; Innocente, Nadia; Drusch, Stephan; Cesco, Stefano

    2012-11-14

    Stable isotope values were used to develop a new analytical approach enabling the simultaneous identification of milk samples either processed with different heating regimens or from different geographical origins. The samples consisted of raw, pasteurized (HTST), and ultrapasteurized (UHT) milk from different Italian origins. The approach consisted of the analysis of the isotope ratio of δ¹³C and δ¹⁵N for the milk samples and their fractions (fat, casein, and whey). The main finding of this work is that as the heat processing affects the composition of the milk fractions, changes in δ¹³C and δ¹⁵N were also observed. These changes were used as markers to develop pattern recognition maps based on principal component analysis and supervised classification models, such as linear discriminant analysis (LDA), multivariate regression (MLR), principal component regression (PCR), and partial least-squares (PLS). The results give proof of the concept that isotope ratio mass spectroscopy can discriminate simultaneously between milk samples according to their geographical origin and type of processing.

  19. Evaluation of the prediction precision capability of partial least squares regression approach for analysis of high alloy steel by laser induced breakdown spectroscopy

    Science.gov (United States)

    Sarkar, Arnab; Karki, Vijay; Aggarwal, Suresh K.; Maurya, Gulab S.; Kumar, Rohit; Rai, Awadhesh K.; Mao, Xianglei; Russo, Richard E.

    2015-06-01

    Laser induced breakdown spectroscopy (LIBS) was applied for elemental characterization of high alloy steel using partial least squares regression (PLSR) with an objective to evaluate the analytical performance of this multivariate approach. The optimization of the number of principal components for minimizing error in PLSR algorithm was investigated. The effect of different pre-treatment procedures on the raw spectral data before PLSR analysis was evaluated based on several statistical (standard error of prediction, percentage relative error of prediction etc.) parameters. The pre-treatment with "NORM" parameter gave the optimum statistical results. The analytical performance of PLSR model improved by increasing the number of laser pulses accumulated per spectrum as well as by truncating the spectrum to appropriate wavelength region. It was found that the statistical benefit of truncating the spectrum can also be accomplished by increasing the number of laser pulses per accumulation without spectral truncation. The constituents (Co and Mo) present in hundreds of ppm were determined with relative precision of 4-9% (2σ), whereas the major constituents Cr and Ni (present at a few percent levels) were determined with a relative precision of ~ 2%(2σ).

  20. [Prediction of total nitrogen and alkali hydrolysable nitrogen content in loess using hyperspectral data based on correlation analysis and partial least squares regression].

    Science.gov (United States)

    Liu, Xiu-ying; Wang, Li; Chang, Qing-rui; Wang, Xiao-xing; Shang, Yan

    2015-07-01

    Wuqi County of Shaanxi Province, where the vegetation recovering measures have been carried out for years, was taken as the study area. A total of 100 loess samples from 24 different profiles were collected. Total nitrogen (TN) and alkali hydrolysable nitrogen (AHN) contents of the soil samples were analyzed, and the soil samples were scanned in the visible/near-infrared (VNIR) region of 350-2500 nm in the laboratory. The calibration models were developed between TN and AHN contents and VNIR values based on correlation analysis (CA) and partial least squares regression (PLS). Independent samples validated the calibration models. The results indicated that the optimum model for predicting TN of loess was established by using first derivative of reflectance. The best model for predicting AHN of loess was established by using normal derivative spectra. The optimum TN model could effectively predict TN in loess from 0 to 40 cm, but the optimum AHN model could only roughly predict AHN at the same depth. This study provided a good method for rapidly predicting TN of loess where vegetation recovering measures have been adopted, but prediction of AHN needs to be further studied.

  1. Genetic analyses of partial egg production in Japanese quail using multi-trait random regression models.

    Science.gov (United States)

    Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E

    2017-12-01

    1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from the second until the sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
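
    As a rough illustration of the Legendre polynomial covariates that random regression models of this kind rely on, the sketch below maps the week of egg production onto [-1, 1] and evaluates the polynomials up to a chosen order. The function names and the use of unnormalized polynomials are assumptions made for illustration; the abstract does not state which normalization or software the authors used.

```python
def standardize_age(t, t_min, t_max):
    """Map an age (here: week of production) linearly onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre_basis(x, order):
    """Return [P0(x), ..., P_order(x)] using Bonnet's recursion.

    Unnormalized Legendre polynomials are an assumption of this sketch.
    """
    values = [1.0]
    if order >= 1:
        values.append(x)
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


# Weeks 2 to 6 of egg production, as in the abstract; second-order basis.
weeks = [2, 3, 4, 5, 6]
design = [legendre_basis(standardize_age(w, 2, 6), 2) for w in weeks]
```

    Each row of `design` holds the covariates for one week; a higher `order` gives the additional columns needed for a third-order effect.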

  2. Linear support vector regression and partial least squares chemometric models for determination of Hydrochlorothiazide and Benazepril hydrochloride in presence of related impurities: A comparative study

    Science.gov (United States)

    Naguib, Ibrahim A.; Abdelaleem, Eglal A.; Draz, Mohammed E.; Zaazaa, Hala E.

    2014-09-01

    Partial least squares regression (PLSR) and support vector regression (SVR) are two popular chemometric models that are subjected to a comparative study in the presented work. The comparison shows their characteristics by applying them to the analysis of Hydrochlorothiazide (HCZ) and Benazepril hydrochloride (BZ) in the presence of the HCZ impurities Chlorothiazide (CT) and Salamide (DSA) as a case study. The analysis results prove to be valid for analysis of the two active ingredients in raw materials and pharmaceutical dosage form through handling UV spectral data in the range 220-350 nm. For proper analysis, a 4 factor 4 level experimental design was established, resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of 8 mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of the mentioned multivariate calibration models to analyze HCZ and BZ in the presence of the HCZ impurities CT and DSA with high selectivity and accuracy, with mean percentage recoveries of (101.01 ± 0.80) and (100.01 ± 0.87) for HCZ and BZ respectively using the PLSR model and of (99.78 ± 0.80) and (99.85 ± 1.08) for HCZ and BZ respectively using the SVR model. The analysis results of the dosage form were statistically compared to the reference HPLC method with no significant differences regarding accuracy and precision. The SVR model gives more accurate results than the PLSR model and shows high generalization ability; however, PLSR still keeps the advantage of being fast to optimize and implement.

  3. A Comparison of Advanced Regression Algorithms for Quantifying Urban Land Cover

    Directory of Open Access Journals (Sweden)

    Akpona Okujeni

    2014-07-01

    Full Text Available Quantitative methods for mapping sub-pixel land cover fractions are gaining increasing attention, particularly with regard to upcoming hyperspectral satellite missions. We evaluated five advanced regression algorithms combined with synthetically mixed training data for quantifying urban land cover from HyMap data at 3.6 and 9 m spatial resolution. Methods included support vector regression (SVR), kernel ridge regression (KRR), artificial neural networks (NN), random forest regression (RFR) and partial least squares regression (PLSR). Our experiments demonstrate that both kernel methods SVR and KRR yield high accuracies for mapping complex urban surface types, i.e., rooftops, pavements, grass- and tree-covered areas. SVR and KRR models proved to be stable with regard to the spatial and spectral differences between both images and effectively utilized the higher complexity of the synthetic training mixtures for improving estimates for coarser resolution data. Observed deficiencies mainly relate to known problems arising from spectral similarities or shadowing. The remaining regressors either revealed erratic (NN) or limited (RFR and PLSR) performances when comprehensively mapping urban land cover. Our findings suggest that the combination of kernel-based regression methods, such as SVR and KRR, with synthetically mixed training data is well suited for quantifying urban land cover from imaging spectrometer data at multiple scales.

  4. Investigation of Antileishmanial Activities of Acridines Derivatives against Promastigotes and Amastigotes Form of Parasites Using Quantitative Structure Activity Relationship Analysis

    Directory of Open Access Journals (Sweden)

    Samir Chtita

    2016-01-01

    Full Text Available In a search for newer and more potent antileishmanial (against promastigote and amastigote forms of parasites) drugs, a series of 60 variously substituted acridine derivatives were subjected to a quantitative structure activity relationship (QSAR) analysis for studying, interpreting, and predicting activities and designing new compounds by using multiple linear regression and artificial neural network (ANN) methods. The descriptors used were computed with the Gaussian 03, ACD/ChemSketch, Marvin Sketch, and ChemOffice programs. The QSAR models developed were validated according to the principles set up by the Organisation for Economic Co-operation and Development (OECD). Principal component analysis (PCA) was used to select descriptors that show a high correlation with activities. The univariate partitioning (UP) method was used to divide the dataset into training and test sets. The multiple linear regression (MLR) method showed correlation coefficients of 0.850 and 0.814 for antileishmanial activities against promastigote and amastigote forms of parasites, respectively. Internal and external validations were used to determine the statistical quality of the two MLR QSAR models. The artificial neural network (ANN) method, considering the relevant descriptors obtained from the MLR, showed correlation coefficients of 0.933 and 0.918 with 7-3-1 and 6-3-1 ANN model architectures for antileishmanial activities against promastigote and amastigote forms of parasites, respectively. The applicability domain of the MLR models was investigated using simple and leverage approaches to detect outliers and outside compounds. The effects of different descriptors on the activities were described and used to study and design new compounds with higher activities compared to the existing ones.

  5. QSAR study of benzimidazole derivatives inhibition on escherichia coli methionine Aminopeptidase

    Directory of Open Access Journals (Sweden)

    Zahra Garkani-Nejad

    2010-06-01

    Full Text Available The paper describes a quantitative structure-activity relationship (QSAR) study of IC50 values of benzimidazole derivatives on Escherichia coli methionine aminopeptidase. The activity of the 32 inhibitors has been estimated by means of multiple linear regression (MLR) and artificial neural network (ANN) techniques. The results obtained using the MLR method indicate that the activity of derivatives of benzimidazoles on CoII-loaded Escherichia coli methionine aminopeptidase depends on different parameters, including topological descriptors, Burden eigenvalues, 3D MoRSE descriptors and 2D autocorrelation descriptors. The best artificial neural network model is a fully-connected, feed forward back propagation network with a 5-4-1 architecture. The standard error for the training set using this network was 0.193 with a correlation coefficient of 0.996, and for the prediction set the standard error was 1.41 with a correlation coefficient of 0.802. Comparison of the quality of the ANN with different MLR models showed that the ANN has better predictive power.

  6. Research on NDT Technology in Inference of Steel Member Strength Based on Macro/Micro Model

    Directory of Open Access Journals (Sweden)

    Beidou Ding

    2017-01-01

    Full Text Available In consideration of correlations among hardness, chemical composition, grain size, and strength of carbon steel, a new nondestructive testing (NDT) technology for inferring carbon steel strength was explored. First, the hardness test, chemical composition analysis, and metallographic analysis of 162 low-carbon steel samples were conducted. Second, the following work was carried out: (1) the quantitative relationship between steel Leeb hardness and carbon steel strength was studied on the basis of regression analysis of experimental data; (2) the influences of chemical composition and grain size on the tension properties of carbon steel were analyzed on the basis of stepwise regression analysis, and the quantitative relationship between conventional compositions and grain size with steel strength was obtained; (3) according to macro and/or micro factors such as hardness, chemical composition, and grain size of carbon steel, a fitting formula for steel strength was established based on the MLR (multiple linear regression) method. The above relationships and the fitting formula based on the MLR method could be used to estimate steel strength with no damage to the structure in engineering practice.

  7. Multiple linear regression model for bromate formation based on the survey data of source waters from geographically different regions across China.

    Science.gov (United States)

    Yu, Jianwei; Liu, Juan; An, Wei; Wang, Yongjing; Zhang, Junzhi; Wei, Wei; Su, Ming; Yang, Min

    2015-01-01

    A total of 86 source water samples from 38 cities across major watersheds of China were collected for a bromide (Br⁻) survey, and the bromate (BrO₃⁻) formation potentials (BFPs) of 41 samples with Br⁻ concentration >20 μg L⁻¹ were evaluated using a batch ozonation reactor. Statistical analyses indicated that higher alkalinity, hardness, and pH of water samples could lead to higher BFPs, with alkalinity as the most important factor. Based on the survey data, a multiple linear regression (MLR) model including three parameters (alkalinity, ozone dose, and total organic carbon (TOC)) was established with a relatively good prediction performance (model selection criterion = 2.01, R² = 0.724), using logarithmic transformation of the variables. Furthermore, a contour plot was used to interpret the influence of alkalinity and TOC on BrO₃⁻ formation with prediction accuracy as high as 71%, suggesting that these two parameters, apart from ozone dosage, were the most important ones affecting the BFPs of source waters with Br⁻ concentration >20 μg L⁻¹. The model could be a useful tool for the prediction of the BFPs of source water.
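
    A multiple linear regression on log-transformed variables, of the general kind described above, can be sketched as follows. This is a minimal illustration under stated assumptions: the base-10 logarithm, the function names, and the synthetic data are all hypothetical, and none of the numbers come from the survey.

```python
import math


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of predictor rows (no intercept column); returns
    [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for k in range(c + 1, p):
            f = A[k][c] / A[c][c]
            for j in range(c, p + 1):
                A[k][j] -= f * A[c][j]
    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def fit_log_mlr(predictors, response):
    """Fit log10(y) = b0 + sum_k b_k * log10(x_k)."""
    X = [[math.log10(v) for v in row] for row in predictors]
    y = [math.log10(v) for v in response]
    return fit_ols(X, y)


def predict_log_mlr(coef, row):
    """Back-transform the fitted log-linear model to the original scale."""
    log_y = coef[0] + sum(c * math.log10(v) for c, v in zip(coef[1:], row))
    return 10 ** log_y


# Synthetic, noise-free data (hypothetical): columns stand in for
# alkalinity, ozone dose and TOC; y = 2 * a^0.5 * d^1.2 * t^-0.3.
data = [(50, 1, 2), (100, 2, 1), (150, 1.5, 4), (200, 3, 3), (80, 2.5, 5), (120, 0.5, 2.5)]
y = [2.0 * a ** 0.5 * d ** 1.2 * t ** -0.3 for a, d, t in data]
coef = fit_log_mlr(data, y)
```

    Because the synthetic response is an exact power law, the fit recovers the exponents as slopes; with real survey data the residual scatter would determine R².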

  8. Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

    Science.gov (United States)

    Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

    2016-08-01

    This work is devoted to the applications of multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by the inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, and nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using a test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. A regional and nonstationary model for partial duration series of extreme rainfall

    DEFF Research Database (Denmark)

    Gregersen, Ida Bülow; Madsen, Henrik; Rosbjerg, Dan

    2017-01-01

    as the explanatory variables in the regional and temporal domain, respectively. Further analysis of partial duration series with nonstationary and regional thresholds shows that the mean exceedances also exhibit a significant variation in space and time for some rainfall durations, while the shape parameter is found...... of extreme rainfall. The framework is built on a partial duration series approach with a nonstationary, regional threshold value. The model is based on generalized linear regression solved by generalized estimation equations. It allows a spatial correlation between the stations in the network and accounts...... furthermore for variable observation periods at each station and in each year. Marginal regional and temporal regression models solved by generalized least squares are used to validate and discuss the results of the full spatiotemporal model. The model is applied on data from a large Danish rain gauge network...

  10. Analysis of Factors Affecting the Number of Motor Vehicle Theft Crimes (Curanmor) Using the Geographically Weighted Poisson Regression (GWPR) Model

    OpenAIRE

    Haris, Muhammad; Yasin, Hasbi; Hoyyi, Abdul

    2015-01-01

    Theft is the act of taking someone else's property, partially or entirely, with the intention of possessing it illegally. Motor vehicle theft is one of the most prominent types of crime and disturbs communities. Regression analysis is a statistical analysis for modeling the relationship between a response variable and predictor variables. If the response variable follows a Poisson distribution or is count data, the regression model used is Poisson regression. Geographically Weighted Poi...

  11. Sparse Regression by Projection and Sparse Discriminant Analysis

    KAUST Repository

    Qi, Xin

    2015-04-03

    © 2015, © American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Recent years have seen active developments of various penalized regression methods, such as LASSO and elastic net, to analyze high-dimensional data. In these approaches, the direction and length of the regression coefficients are determined simultaneously. Due to the introduction of penalties, the length of the estimates can be far from being optimal for accurate predictions. We introduce a new framework, regression by projection, and its sparse version to analyze high-dimensional data. The unique nature of this framework is that the directions of the regression coefficients are inferred first, and the lengths and the tuning parameters are determined by a cross-validation procedure to achieve the largest prediction accuracy. We provide a theoretical result for simultaneous model selection consistency and parameter estimation consistency of our method in high dimension. This new framework is then generalized such that it can be applied to principal components analysis, partial least squares, and canonical correlation analysis. We also adapt this framework for discriminant analysis. Compared with the existing methods, where there is relatively little control of the dependency among the sparse components, our method can control the relationships among the components. We present efficient algorithms and related theory for solving the sparse regression by projection problem. Based on extensive simulations and real data analysis, we demonstrate that our method achieves good predictive performance and variable selection in the regression setting, and the ability to control relationships between the sparse components leads to more accurate classification. In supplementary materials available online, the details of the algorithms and theoretical proofs, and R codes for all simulation studies are provided.

  12. Gain modulation of the middle latency cutaneous reflex in patients with chronic joint instability after ankle sprain.

    Science.gov (United States)

    Futatsubashi, Genki; Sasada, Shusaku; Tazoe, Toshiki; Komiyama, Tomoyoshi

    2013-07-01

    To investigate the neural alteration of reflex pathways arising from cutaneous afferents in patients with chronic ankle instability. Cutaneous reflexes were elicited by applying non-noxious electrical stimulation to the sural nerve of subjects with chronic ankle instability (n=17) and control subjects (n=17) while sitting. Electromyographic (EMG) signals were recorded from each ankle and thigh muscle. The middle latency response (MLR; latency: 70-120 ms) component was analyzed. In the peroneus longus (PL) and vastus lateralis (VL) muscles, linear regression analyses between the magnitude of the inhibitory MLR and background EMG activity showed that, compared to the uninjured side and the control subjects, the gain of the suppressive MLR was increased in the injured side. This was also confirmed by the pooled data for both groups. The degree of MLR alteration was significantly correlated to that of chronic ankle instability in the PL. The excitability of middle latency cutaneous reflexes in the PL and VL is modulated in subjects with chronic ankle instability. Cutaneous reflexes may be potential tools to investigate the pathological state of the neural system that controls the lower limbs in subjects with chronic ankle instability. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  13. Shuffling cross-validation-bee algorithm as a new descriptor selection method for retention studies of pesticides in biopartitioning micellar chromatography.

    Science.gov (United States)

    Zarei, Kobra; Atabati, Morteza; Ahmadi, Monire

    2017-05-04

    The bee algorithm (BA) is an optimization algorithm inspired by the natural foraging behaviour of honey bees to find the optimal solution, and it can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the best descriptors that could describe the retention factor (log k) in the biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were obtained using BA, and the selected descriptors were then applied for model development using multiple linear regression (MLR). The descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods, MLR was applied to model development, and the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. Support vector machine (SVM) was also applied for model development using the six descriptors selected by BA. The statistical results obtained using SVM were better than those obtained using MLR: the root mean square error (RMSE) and correlation coefficient (R) for the whole data set (training and test) were 0.1863 and 0.9426, respectively, using shuffling CV-BA-MLR, while these values for the shuffling CV-BA-SVM method were 0.0704 and 0.9922, respectively.

  14. QSAR study of benzimidazole derivatives inhibition on escherichia ...

    African Journals Online (AJOL)

    The paper describes a quantitative structure-activity relationship (QSAR) study of IC50 values of benzimidazole derivatives on escherichia coli methionine aminopeptidase. The activity of the 32 inhibitors has been estimated by means of multiple linear regression (MLR) and artificial neural network (ANN) techniques.

  15. New definition for the partial remission period in children and adolescents with type 1 diabetes

    DEFF Research Database (Denmark)

    Mortensen, Henrik B; Hougaard, Philip; Swift, Peter

    2009-01-01

    OBJECTIVE To find a simple definition of partial remission in type 1 diabetes that reflects both residual beta-cell function and efficacy of insulin treatment. RESEARCH DESIGN AND METHODS A total of 275 patients aged ..., stimulated C-peptide during a challenge was used as a measure of residual beta-cell function. RESULTS By multiple regression analysis, a negative association between stimulated C-peptide and A1C (regression coefficient -0.21, P ... the definition of an insulin dose-adjusted A1C (IDAA1C) as A1C (percent) + [4 x insulin dose (units per kilogram per 24 h)]. A calculated IDAA1C 300 pmol/l was used to define partial remission. The IDAA1C
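
    The insulin dose-adjusted A1C defined in the abstract is a one-line calculation, sketched below. The function and parameter names are hypothetical, and the example values are illustrative only; the cut-off used to define partial remission is elided in this record and is therefore not encoded here.

```python
def idaa1c(a1c_percent, insulin_dose_u_per_kg_per_24h):
    """IDAA1C = A1C (percent) + 4 x insulin dose (units per kilogram per 24 h)."""
    return a1c_percent + 4.0 * insulin_dose_u_per_kg_per_24h


# Illustrative values only (not taken from the study).
example = idaa1c(7.0, 0.5)
```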

  16. Dual Regression

    OpenAIRE

    Spady, Richard; Stouli, Sami

    2012-01-01

    We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...

  17. Application of regression analysis to creep of space shuttle materials

    International Nuclear Information System (INIS)

    Rummler, D.R.

    1975-01-01

    Metallic heat shields for Space Shuttle thermal protection systems must operate for many flight cycles at high temperatures in low-pressure air and use thin-gage (less than or equal to 0.65 mm) sheet. Available creep data for thin sheet under those conditions are inadequate. To assess the effects of oxygen partial pressure and sheet thickness on creep behavior and to develop constitutive creep equations for small sets of data, regression techniques are applied and discussed.

  18. Neutrophil to lymphocyte with monocyte to lymphocyte ratio and white blood cell count in prediction of lung cancer

    Directory of Open Access Journals (Sweden)

    Thang Thanh Phan

    2018-04-01

    Full Text Available Background: Lung cancer is the most common cause of cancer deaths in both sexes, yet screening and early detection remain very difficult. Aims: This study aims to clarify the role of systemic inflammation markers, including white blood cell (WBC), neutrophil (NEU), monocyte (MONO) and platelet (PLT) counts, the neutrophil to lymphocyte ratio (NLR), the monocyte to lymphocyte ratio (MLR) and the platelet to lymphocyte ratio (PLR), in the prediction of lung cancer. Methods: A case-control study was conducted on 1,315 primary lung cancer patients and 1,315 healthy adults matched for age and gender at Cho Ray hospital. NLR, MLR and PLR were calculated from the neutrophil, lymphocyte, monocyte and platelet counts retrieved from the laboratory database. With 600 cases in the derivation set, univariate logistic regression was used to identify the influential markers, and the optimal prediction model for lung cancer was then developed by multivariate logistic regression. The diagnostic values of the optimal model, consisting of sensitivity (Sen), specificity (Spe), positive predictive value (PPV), negative predictive value (NPV) and the area under the ROC curve (AUC), were extracted and verified on all data, in the validation set. Results: The median values of WBC, NEU, MONO, PLT, NLR, MLR and PLR in lung cancer did not differ significantly between histological subtypes and clinical stages (p > 0.05), but were higher than the values in the control group (p < 0.01). Multivariate analysis shows that NLR, MLR and WBC were the three parameters with a significant impact on the optimal prediction model (p < 0.01). The AUC value, sensitivity and specificity of the optimal model for lung cancer detection were 0.881, 73.5 per cent (95 per cent CI: 70.3–76.6) and 87.7 per cent (95 per cent CI: 85.2–89.9), respectively. The PPV and NPV values of the prediction model were 85.7 per cent (95 per cent CI: 82.8–88.2) and 76.8 per cent (95 per cent CI: 73.9–79.5), respectively. Among three
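
    The ratio markers and the diagnostic values named in this abstract follow standard definitions, which can be sketched as below. The function names, the blood counts and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not data from the study.

```python
def inflammation_ratios(neutrophils, lymphocytes, monocytes, platelets):
    """Return NLR, MLR and PLR from absolute cell counts (same units)."""
    return {
        "NLR": neutrophils / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "PLR": platelets / lymphocytes,
    }


def diagnostic_values(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all cases
        "specificity": tn / (tn + fp),  # true negatives among all controls
        "ppv": tp / (tp + fp),          # cases among positive predictions
        "npv": tn / (tn + fn),          # controls among negative predictions
    }


# Hypothetical inputs for illustration.
ratios = inflammation_ratios(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=250.0)
values = diagnostic_values(tp=80, fp=10, tn=90, fn=20)
```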

  19. Empirical tools for simulating salinity in the estuaries in Everglades National Park, Florida

    Science.gov (United States)

    Marshall, F. E.; Smith, D. T.; Nickerson, D. M.

    2011-12-01

    Salinity in a shallow estuary is affected by upland freshwater inputs (surface runoff, stream/canal flows, groundwater), atmospheric processes (precipitation, evaporation), marine connectivity, and wind patterns. In Everglades National Park (ENP) in South Florida, the unique Everglades ecosystem exists as an interconnected system of fresh, brackish, and salt water marshes, mangroves, and open water. For this effort a coastal aquifer conceptual model of the Everglades hydrologic system was used with traditional correlation and regression hydrologic techniques to create a series of multiple linear regression (MLR) salinity models from observed hydrologic, marine, and weather data. The 37 ENP MLR salinity models cover most of the estuarine areas of ENP and produce daily salinity simulations that are capable of estimating 65-80% of the daily variability in salinity depending upon the model. The Root Mean Squared Error is typically about 2-4 salinity units, and there is little bias in the predictions. However, the absolute error of a model prediction in the nearshore embayments and the mangrove zone of Florida Bay may be relatively large for a particular daily simulation during the seasonal transitions. Comparisons show that the models group regionally by similar independent variables and salinity regimes. The MLR salinity models have approximately the same expected range of simulation accuracy and error as higher spatial resolution salinity models.

  20. A regression modeling approach for studying carbonate system variability in the northern Gulf of Alaska

    Science.gov (United States)

    Evans, Wiley; Mathis, Jeremy T.; Winsor, Peter; Statscewich, Hank; Whitledge, Terry E.

    2013-01-01

    The northern Gulf of Alaska (GOA) shelf experiences carbonate system variability on seasonal and annual time scales, but little information exists to resolve higher frequency variability in this region. To resolve this variability using platforms-of-opportunity, we present multiple linear regression (MLR) models constructed from hydrographic data collected along the Northeast Pacific Global Ocean Ecosystems Dynamics (GLOBEC) Seward Line. The empirical algorithms predict dissolved inorganic carbon (DIC) and total alkalinity (TA) using observations of nitrate (NO₃⁻), temperature, salinity and pressure from the surface to 500 m, with R² values > 0.97 and RMSE values of 11 µmol kg⁻¹ for DIC and 9 µmol kg⁻¹ for TA. We applied these relationships to high-resolution NO₃⁻ data sets collected during a novel 20 h glider flight and a GLOBEC mesoscale SeaSoar survey. Results from the glider flight demonstrated time/space along-isopycnal variability of aragonite saturations (Ωarag) associated with a dicothermal layer (a cold near-surface layer found in high latitude oceans) that rivaled changes seen vertically through the thermocline. The SeaSoar survey captured the uplift to aragonite saturation horizon (depth where Ωarag = 1) shoaled to a previously unseen depth in the northern GOA. This work is similar to recent studies aimed at predicting the carbonate system in continental margin settings, albeit demonstrates that a NO₃⁻-based approach can be applied to high-latitude data collected from platforms capable of high-frequency measurements.

  1. Intermittent reservoir daily-inflow prediction using lumped and ...

    Indian Academy of Sciences (India)

    For the present case study considered, both MLR and ARIMA models performed ... is to be remembered that the transformation of ... Multi-linear regression; lumped and distributed data; time-series models; cause-effect ... flow data are short for adequate system study. ..... that the standard deviation, skewness, kurtosis.

  2. Partial meniscectomy is associated with increased risk of incident radiographic osteoarthritis and worsening cartilage damage in the following year

    Energy Technology Data Exchange (ETDEWEB)

    Roemer, Frank W. [Boston University School of Medicine, Quantitative Imaging Center, Department of Radiology, Boston, MA (United States); University of Erlangen-Nuremberg, Department of Radiology, Erlangen (Germany); Kwoh, C.K. [University of Arizona Arthritis Center and University of Arizona College of Medicine, Tucson, AZ (United States); Hannon, Michael J.; Grago, Jason [University of Pittsburgh School of Medicine, Division of Rheumatology and Clinical Immunology, Pittsburgh, PA (United States); Hunter, David J. [University of Sydney, Department of Rheumatology, Royal North Shore Hospital and Kolling Institute, St Leonards (Australia); Eckstein, Felix [Paracelsus Medical University, Institute of Anatomy, Salzburg (Austria); Boudreau, Robert M. [University of Pittsburgh Graduate School of Public Health, Department of Epidemiology, Pittsburgh, PA (United States); Englund, Martin [Lund University, Clinical Epidemiology Unit, Orthopaedics, Department of Clinical Sciences Lund, Lund (Sweden); Guermazi, Ali [Boston University School of Medicine, Quantitative Imaging Center, Department of Radiology, Boston, MA (United States)

    2017-01-15

    To assess whether partial meniscectomy is associated with increased risk of radiographic osteoarthritis (ROA) and worsening cartilage damage in the following year. We studied 355 knees from the Osteoarthritis Initiative that developed ROA (Kellgren-Lawrence grade ≥ 2), which were matched with control knees. The MR images were assessed using the semi-quantitative MOAKS system. Conditional logistic regression was applied to estimate risk of incident ROA. Logistic regression was used to assess the risk of worsening cartilage damage in knees with partial meniscectomy that developed ROA. In the group with incident ROA, 4.4 % underwent partial meniscectomy during the year prior to the case-defining visit, compared with none of the knees that did not develop ROA. All (n = 31) knees that had partial meniscectomy and 58.9 % (n = 165) of the knees with prevalent meniscal damage developed ROA (OR = 2.51, 95 % CI [1.73, 3.64]). In knees that developed ROA, partial meniscectomy was associated with an increased risk of worsening cartilage damage (OR = 4.51, 95 % CI [1.53, 13.33]). The probability of having had partial meniscectomy was higher in knees that developed ROA. When looking only at knees that developed ROA, partial meniscectomy was associated with greater risk of worsening cartilage damage. (orig.)

  3. Comparisons of prediction models of quality of life after laparoscopic cholecystectomy: a longitudinal prospective study.

    Directory of Open Access Journals (Sweden)

    Hon-Yi Shi

    Full Text Available BACKGROUND: Few studies of laparoscopic cholecystectomy (LC) outcome have used longitudinal data for more than two years. Moreover, no studies have considered group differences in factors other than outcome, such as age and nonsurgical treatment. Additionally, almost all published articles agree that the essential issue of the internal validity (reproducibility) of the artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR) and multiple linear regression (MLR) models has not been adequately addressed. This study aimed to validate the use of these models for predicting quality of life (QOL) after LC and to compare the predictive capability of ANNs with that of SVM, GPR and MLR. METHODOLOGY/PRINCIPAL FINDINGS: A total of 400 LC patients completed the SF-36 and the Gastrointestinal Quality of Life Index at baseline and at 2 years postoperatively. The criteria for evaluating the accuracy of the system models were mean square error (MSE) and mean absolute percentage error (MAPE). A global sensitivity analysis was also performed to assess the relative significance of input parameters in the system model and to rank the variables in order of importance. Compared to the SVM, GPR and MLR models, the ANN model generally had smaller MSE and MAPE values in the training data set and test data set. Most ANN models had MAPE values ranging from 4.20% to 8.60%, and most had high prediction accuracy. The global sensitivity analysis also showed that preoperative functional status was the best parameter for predicting QOL after LC. CONCLUSIONS/SIGNIFICANCE: Compared with the SVM, GPR and MLR models, the ANN model in this study was more accurate in predicting patient-reported QOL and had higher overall performance indices. Further studies of this model may consider the effect of a more detailed database that includes complications and clinical examination findings as well as more detailed outcome data.
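
    The two accuracy criteria named in this abstract, MSE and MAPE, can be sketched in a few lines of Python. This is only an illustration of the standard definitions; the function names and the sample scores are hypothetical and are not taken from the study.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |error| / |observed value|, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical quality-of-life scores (illustrative only, not study data).
observed = [50.0, 80.0, 100.0, 40.0]
predicted = [45.0, 84.0, 100.0, 42.0]

mse = mean_squared_error(observed, predicted)
mape = mean_absolute_percentage_error(observed, predicted)
print(f"MSE = {mse:.2f}, MAPE = {mape:.2f}%")
```

    Because MAPE divides by the observed value, it is scale-free but breaks down when observations are zero or near zero, which is why the sketch rejects zeros explicitly.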

  4. QSAR studies for the acute toxicity of nitrobenzenes to the Tetrahymena pyriformis

    Directory of Open Access Journals (Sweden)

    Wang Dan-Dan

    2014-01-01

    Full Text Available Quantitative structure-activity relationship (QSAR) models play a key role in finding the relationship between molecular structures and the toxicity of nitrobenzenes to Tetrahymena pyriformis. In this work, a genetic algorithm combined with partial least squares (GA-PLS) was employed to select the optimal subset of descriptors that contribute significantly to the toxicity of nitrobenzenes to Tetrahymena pyriformis. A set of five descriptors, namely G2, HOMT, G(Cl…Cl), Mor03v and MAXDP, was used to predict the toxicity of 45 nitrobenzene derivatives, and the model was then built by the multiple linear regression (MLR) method. The resulting model, whose stability was confirmed using leave-one-out validation and an external validation test, showed high statistical significance (R² = 0.963, Q²LOO = 0.944). Moreover, a Y-scrambling test indicated that there was no chance correlation in this model.
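
    The contrast between the fitted R² and the leave-one-out Q² reported here can be sketched as follows. This is a simplified illustration, assuming a single hypothetical descriptor instead of the five used in the study, and assuming the common definition Q² = 1 − PRESS/SS_tot with SS_tot taken about the overall mean; the data values are invented.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the model fitted to all samples."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(x, y):
    """Leave-one-out Q2: each sample is predicted by a model fitted without it."""
    mean_y = sum(y) / len(y)
    press = 0.0  # prediction error sum of squares
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (intercept + slope * x[i])) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and toxicities (illustrative only).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toxicity = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

r2 = r_squared(descriptor, toxicity)
q2 = q_squared_loo(descriptor, toxicity)
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

    For ordinary least squares, Q² computed this way cannot exceed R², so a small gap between the two (as in the abstract) suggests the model is not merely fitting individual samples.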

  5. When Partial Nephrectomy is Unsuccessful: Understanding the Reasons for Conversion from Robotic Partial to Radical Nephrectomy at a Tertiary Referral Center.

    Science.gov (United States)

    Kara, Önder; Maurice, Matthew J; Mouracade, Pascal; Malkoç, Ercan; Dagenais, Julien; Nelson, Ryan J; Chavali, Jaya Sai S; Stein, Robert J; Fergany, Amr; Kaouk, Jihad H

    2017-07-01

    We sought to identify the preoperative factors associated with conversion from robotic partial nephrectomy to radical nephrectomy. We report the incidence of this event. Using our institutional review board approved database, we abstracted data on 1,023 robotic partial nephrectomies performed at our center between 2010 and 2015. Standard and converted cases were compared in terms of patients and tumor characteristics, and perioperative, functional and oncologic outcomes. Logistic regression analysis was done to identify predictors of radical conversion. The overall conversion rate was 3.1% (32 of 1,023 cases). The most common reasons for conversion were tumor involvement of hilar structures (8 cases or 25%), failure to achieve negative margins on frozen section (7 or 21.8%), suspicion of advanced disease (5 or 15.6%) and failure to progress (5 or 15.6%). Patients requiring conversion were older and had a higher Charlson score (both p partial nephrectomy cases had similar short-term oncologic outcomes but better renal functional preservation (p partial nephrectomy conversion to radical nephrectomy was 3.1%, including 2.2% of preoperatively anticipated nephrectomy cases. Increasing tumor size and complexity, and poor preoperative renal function are the main predictors of conversion. Copyright © 2017 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  6. Publicly available models to predict normal boiling point of organic compounds

    International Nuclear Information System (INIS)

    Oprisiu, Ioana; Marcou, Gilles; Horvath, Dragos; Brunel, Damien Bernard; Rivollet, Fabien; Varnek, Alexandre

    2013-01-01

    Quantitative structure–property models to predict the normal boiling point (Tb) of organic compounds were developed using non-linear ASNNs (associative neural networks) as well as multiple linear regression – ISIDA-MLR and SQS (stochastic QSAR sampler). Models were built on a diverse set of 2098 organic compounds with Tb varying in the range of 185–491 K. In ISIDA-MLR and ASNN calculations, fragment descriptors were used, whereas fragment, FPTs (fuzzy pharmacophore triplets), and ChemAxon descriptors were employed in SQS models. Prediction quality of the models has been assessed in 5-fold cross validation. Obtained models were implemented in the on-line ISIDA predictor at (http://infochim.u-strasbg.fr/webserv/VSEngine.html).

  7. A generalized partially linear mean-covariance regression model for longitudinal proportional data, with applications to the analysis of quality of life data from cancer clinical trials.

    Science.gov (United States)

    Zheng, Xueying; Qin, Guoyou; Tu, Dongsheng

    2017-05-30

    Motivated by the analysis of quality of life data from a clinical trial on early breast cancer, we propose in this paper a generalized partially linear mean-covariance regression model for longitudinal proportional data, which are bounded in a closed interval. Cholesky decomposition of the covariance matrix for within-subject responses and generalized estimation equations are used to estimate unknown parameters and the nonlinear function in the model. Simulation studies are performed to evaluate the performance of the proposed estimation procedures. Our new model is also applied to analyze the data from the cancer clinical trial that motivated this research. In comparison with available models in the literature, the proposed model does not require specific parametric assumptions on the density function of the longitudinal responses and the probability function of the boundary values and can capture dynamic changes of time or other variables of interest on both mean and covariance of the correlated proportional responses. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Spatial Estimation of Losses Attributable to Meteorological Disasters in a Specific Area (105.0°E–115.0°E, 25°N–35°N) Using Bayesian Maximum Entropy and Partial Least Squares Regression

    Directory of Open Access Journals (Sweden)

    F. S. Zhang

    2016-01-01

    Full Text Available The spatial mapping of losses attributable to such disasters is now well established as a means of describing the spatial patterns of disaster risk, and it has been shown to be suitable for many types of major meteorological disasters. However, few studies have developed a regression model to estimate the effects of the spatial distribution of meteorological factors on losses associated with meteorological disasters. In this study, the proposed approach is capable of the following: (a) estimating the spatial distributions of seven meteorological factors using Bayesian maximum entropy, (b) identifying which of the four mapping methods used in this research performs best based on cross validation, and (c) establishing a fitted model between the PLS components and disaster loss information using partial least squares regression within a specific research area. The results showed the following: (a) the best mapping results were produced by multivariate Bayesian maximum entropy with probabilistic soft data; (b) the regression model using three PLS components, extracted from the seven meteorological factors by the PLS method, was the most predictive according to the PRESS/SS test; (c) northern Hunan Province sustained the most damage, and southeastern Gansu Province and western Guizhou Province sustained the least.

  9. Regression: A Bibliography.

    Science.gov (United States)

    Pedrini, D. T.; Pedrini, Bonnie C.

    Regression, another mechanism studied by Sigmund Freud, has had much research, e.g., hypnotic regression, frustration regression, schizophrenic regression, and infra-human-animal regression (often directly related to fixation). Many investigators worked with hypnotic age regression, which has a long history, going back to Russian reflexologists.…

  10. Journal of Chemical Sciences | Indian Academy of Sciences

    Indian Academy of Sciences (India)

    On the basis of the optimal conformation of the ligands, when fitting to the template, the respective scoring functions were obtained; different ligand efficiencies were evaluated and analysed. Statistical modelling using artificial neural network (ANN: 2 = 0.922) and multiple linear regression method (MLR: 2 = 0.851) ...

  11. Support vector machines classifiers of physical activities in preschoolers

    Science.gov (United States)

    The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...

  12. Correlation between morphological characters and estimated bunch ...

    African Journals Online (AJOL)

    The methodology of multiple linear regression (MLR) was used to estimate bunch weight. The most significant variables that were measured included number of leaves at harvest, number of fruits per bunch, FW, FL, rachis weight (RW) and stalk length (SL), generating the following prediction equation: BW= -5.249 + ...

  13. Genetic analysis of partial egg production records in Japanese quail using random regression models.

    Science.gov (United States)

    Abou Khadiga, G; Mahmoud, B Y F; Farahat, G S; Emam, A M; El-Full, E A

    2017-08-01

    The main objectives of this study were to detect the most appropriate random regression model (RRM) to fit the data of monthly egg production in 2 lines (selected and control) of Japanese quail and to test the consistency of different criteria of model choice. Data from 1,200 female Japanese quails for the first 5 months of egg production from 4 consecutive generations of an egg line selected for egg production in the first month (EP1) were analyzed. Eight RRMs with different orders of Legendre polynomials were compared to determine the proper model for analysis. All criteria of model choice suggested that the adequate model included the second-order Legendre polynomials for fixed effects, and the third-order for additive genetic effects and permanent environmental effects. Predictive ability of the best model was the highest among all models (ρ = 0.987). According to the best model fitted to the data, estimates of heritability were relatively low to moderate (0.10 to 0.17) and showed a descending pattern from the first to the fifth month of production. A similar pattern was observed for permanent environmental effects with greater estimates in the first (0.36) and second (0.23) months of production than heritability estimates. Genetic correlations between separate production periods were higher (0.18 to 0.93) than their phenotypic counterparts (0.15 to 0.87). The superiority of the selected line over the control was observed through significant (P egg production in earlier ages (first and second months) than later ones. A methodology based on random regression animal models can be recommended for genetic evaluation of egg production in Japanese quail. © 2017 Poultry Science Association Inc.

  14. Regional estimation of rainfall intensity-duration-frequency curves using generalized least squares regression of partial duration series statistics

    DEFF Research Database (Denmark)

    Madsen, H.; Mikkelsen, Peter Steen; Rosbjerg, Dan

    2002-01-01

    A general framework for regional analysis and modeling of extreme rainfall characteristics is presented. The model is based on the partial duration series (PDS) method that includes in the analysis all events above a threshold level. In the PDS model the average annual number of exceedances...

  15. Advanced statistics: linear regression, part I: simple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
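
    As a minimal sketch of the method of least squares described in this abstract, the slope and intercept of the regression line can be computed directly from the sample means and sums of squares. The function name and the small dataset below are hypothetical and serve only to illustrate the mechanics.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) minimizing the sum of squared vertical residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-product of x and y about their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("predictor has no variation; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical small clinical-style dataset (illustrative only).
doses = [1.0, 2.0, 3.0, 4.0, 5.0]
responses = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_simple_linear_regression(doses, responses)
print(f"response = {intercept:.2f} + {slope:.2f} * dose")
```

    The fitted line always passes through the point of means, which is why the intercept is obtained from the slope rather than estimated separately.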

  16. Prediction of heating value of straw by proximate data, and near infrared spectroscopy

    International Nuclear Information System (INIS)

    Huang Caijin; Han Lujia; Yang Zengling; Liu Xian

    2008-01-01

    Exploration of straw resources for energy production has been attracting agricultural scientists and engineers for decades, and the heating value of straw has always been the focus when initiating a straw-based biomass energy project. Nevertheless, determining the heating value of straw requires a delicate and expensive calorimeter and is time-consuming. It is therefore desirable to develop a quick and easy model for predicting the heating value of straw. In this study, we proposed three applicable models: the first two are multiple linear regression (MLR) equations based on the contents of moisture, ash, and volatile matter, and the third is based on near infrared spectroscopy (NIRS). All the models provide satisfactory estimates of the heating values of straw samples. The adjusted determination coefficients for the MLR models were 0.9049 and 0.9039, and the determination coefficient of calibration for the NIRS model was 0.9604; when evaluated on independent validation, the determination coefficients were 0.8595, 0.8524 and 0.8946, respectively. The results indicated that both the MLR models and the NIRS model have the potential to predict the heating values of straw, while the NIRS model presented better accuracy.

  17. Characteristics of Hospitals Associated with Complete and Partial Implementation of Electronic Health Records.

    Science.gov (United States)

    Bhounsule, Prajakta; Peterson, Andrew M

    2016-01-01

    (1) To determine the proportion of hospitals with and without implementation of electronic health records (EHRs). (2) To examine characteristics of hospitals that report implementation of EHRs partially or completely versus those that report no implementation. (3) To identify hospital characteristics associated with nonimplementation to help devise future policy initiatives. This was a retrospective cross-sectional study using the 2012 American Hospital Association Annual Survey Database. The outcome variable was the implementation of EHRs completely or partially. Independent variables were hospital characteristics, such as staffing, organization structure, accreditations, ownership, and services and facilities provided at the hospitals. Descriptive frequencies were determined, and multinomial logistic regression was used to determine variables independently associated with complete or partial implementation of EHRs. In this study, 12.6 percent of hospitals reported no implementation of EHRs, while 43.9 percent of hospitals implemented EHRs partially and 43.5 percent implemented EHRs completely. Overall characteristics of hospitals with complete and partial implementation were similar. The multinomial regression model revealed a positive association between the number of licensed beds and complete implementation of EHRs. A positive association was found between children's general medical, surgical, and heart hospitals and complete implementation of EHRs. Conversely, psychiatric and rehabilitation hospitals, limited service hospitals, hospitals participating in a network, service hospitals, government nonfederal hospitals, and nongovernment not-for-profit hospitals showed less likelihood of complete implementation of EHRs. Study findings suggest a disparity of EHR implementation between larger, for-profit hospitals and smaller, not-for-profit hospitals. Low rates of implementation were observed with psychiatric and rehabilitation hospitals. EHR policy initiatives

  18. Reliable and relevant modelling of real world data: a personal account of the development of PLS Regression

    DEFF Research Database (Denmark)

    Martens, Harald

    2001-01-01

    Why and how Partial Least Squares Regression (PLSR) was developed is described here from the author's perspective. The paper outlines my frustrating experiences in the 1970s with two conflicting and equally over-ambitious and oversimplified modelling cultures - in traditional chemistry...

  19. Excited States and Photodebromination of Selected Polybrominated Diphenyl Ethers: Computational and Quantitative Structure—Property Relationship Studies

    Directory of Open Access Journals (Sweden)

    Jin Luo

    2015-01-01

    Full Text Available This paper presents a density functional theory (DFT)/time-dependent DFT (TD-DFT) study on the lowest lying singlet and triplet excited states of 20 selected polybrominated diphenyl ether (PBDE) congeners, with the solvation effect included in the calculations using the polarized continuum model (PCM). The results showed that for most of the brominated diphenyl ether (BDE) congeners, the lowest singlet excited state was initiated by electron transfer from HOMO to LUMO, involving a π–σ* excitation. In the triplet excited states, the structure of the BDE congeners differed notably from that of the BDE ground states, with one of the specific C–Br bonds bending off the aromatic plane. In addition, partial least squares regression (PLSR), principal component analysis-multiple linear regression analysis (PCA-MLR), and back propagation artificial neural network (BP-ANN) approaches were employed for a quantitative structure-property relationship (QSPR) study. Based on previously reported kinetic data for debromination by ultraviolet (UV) light and sunlight, the QSPR models obtained gave a reasonable evaluation of the photodebromination reactivity even when the BDE congeners had the same degree of bromination, albeit different patterns of bromination.

  20. Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

    International Nuclear Information System (INIS)

    Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

    2007-01-01

    Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
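
    The three approaches compared in this abstract differ in how they treat measurement error, and for a straight-line fit each slope has a closed form. The sketch below uses the textbook formulas, assuming equal error variances in both variables for the orthogonal case; the function name and the data are hypothetical and not drawn from the aerosol study.

```python
import math


def regression_slopes(x, y):
    """Slopes from OLS (y on x), orthogonal regression and geometric mean regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("slopes are undefined without variation and covariation")
    return {
        # Errors assumed to lie only in y.
        "ols": sxy / sxx,
        # Minimizes perpendicular distances (assumes equal error variances).
        "orthogonal": (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2))
        / (2 * sxy),
        # Ratio of standard deviations, carrying the sign of the covariance.
        "geometric_mean": math.copysign(math.sqrt(syy / sxx), sxy),
    }


# Hypothetical paired measurements, both subject to error (illustrative only).
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [1.2, 1.9, 3.2, 3.8, 5.1]

slopes = regression_slopes(x_obs, y_obs)
for name, value in slopes.items():
    print(f"{name}: {value:.4f}")
```

    With positively correlated data, the OLS slope is the shallowest of the three, which reflects the attenuation that arises when error in the predictor is ignored.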

  1. Polynomial regression analysis and significance test of the regression function

    International Nuclear Information System (INIS)

    Gao Zhengming; Zhao Juan; He Shengping

    2012-01-01

    In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and derives their parameters by ordinary least squares estimation. A significance test for the polynomial regression function is then derived from the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and the significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram, in accordance with the authors' real work. (authors)
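
    The link this abstract draws between polynomial regression and multivariable linear regression can be sketched by treating the powers of x as separate predictors and solving the normal equations. The code below is an illustrative sketch only: the helper names and the data are hypothetical, and it stops at the overall F statistic rather than computing a p-value, which would require the F distribution.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def fit_polynomial(x, y, degree):
    """OLS coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    rows = [[xi ** k for k in range(degree + 1)] for xi in x]
    size = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def f_statistic(x, y, coeffs):
    """Overall F statistic: regression mean square over residual mean square."""
    n, p = len(y), len(coeffs) - 1
    fitted = [sum(c * xi ** k for k, c in enumerate(coeffs)) for xi in x]
    mean_y = sum(y) / n
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_res = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    return (ss_reg / p) / (ss_res / (n - p - 1))


# Hypothetical noisy quadratic data (illustrative only, not decay-heat data).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
power = [1.1, 3.4, 7.1, 11.4, 17.1, 23.4]

coefficients = fit_polynomial(times, power, degree=2)
f_value = f_statistic(times, power, coefficients)
print(coefficients, f_value)
```

    A large F value relative to the critical value for (p, n − p − 1) degrees of freedom would indicate that the polynomial terms jointly explain significantly more variation than the mean alone.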

  2. Vital Signs Monitoring and Interpretation for Critically Ill Patients

    DEFF Research Database (Denmark)

    Vilic, Adnan

    . An introduced queue-based multiple linear regression (qMLR) model achieved best results with a root mean square error (RMSE) of RMSE = 3.11 on a Scandinavian Stroke Scale (SSS) where degree of disability ranged from 0 - 46. Worse outcomes were observed in patients who had pulse > 80 and a negative correlation...

  3. Reduced Rank Regression

    DEFF Research Database (Denmark)

    Johansen, Søren

    2008-01-01

    The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating...

  4. Regularized principal covariates regression and its application to finding coupled patterns in climate fields

    Science.gov (United States)

    Fischer, M. J.

    2014-02-01

    There are many different methods for investigating the coupling between two climate fields, which are all based on the multivariate regression model. Each different method of solving the multivariate model has its own attractive characteristics, but often the suitability of a particular method for a particular problem is not clear. Continuum regression methods search the solution space between the conventional methods and thus can find regression model subspaces that mix the attractive characteristics of the end-member subspaces. Principal covariates regression is a continuum regression method that is easily applied to climate fields and makes use of two end-members: principal components regression and redundancy analysis. In this study, principal covariates regression is extended to additionally span a third end-member (partial least squares or maximum covariance analysis). The new method, regularized principal covariates regression, has several attractive features including the following: it easily applies to problems in which the response field has missing values or is temporally sparse, it explores a wide range of model spaces, and it seeks a model subspace that will, for a set number of components, have a predictive skill that is the same or better than conventional regression methods. The new method is illustrated by applying it to the problem of predicting the southern Australian winter rainfall anomaly field using the regional atmospheric pressure anomaly field. Regularized principal covariates regression identifies four major coupled patterns in these two fields. The two leading patterns, which explain over half the variance in the rainfall field, are related to the subtropical ridge and features of the zonally asymmetric circulation.

  5. The comparison of partial least squares and principal component regression in simultaneous spectrophotometric determination of ascorbic acid, dopamine and uric acid in real samples

    Directory of Open Access Journals (Sweden)

    Habiboallah Khajehsharifi

    2017-05-01

    Full Text Available Partial least squares (PLS1) and principal component regression (PCR) are two multivariate calibration methods that allow simultaneous determination of several analytes in spite of their overlapping spectra. In this research, a spectrophotometric method using PLS1 is proposed for the simultaneous determination of ascorbic acid (AA), dopamine (DA) and uric acid (UA). The linear concentration ranges for AA, DA and UA were 1.76–47.55, 0.57–22.76 and 1.68–28.58 μg mL−1, respectively. PLS1 and PCR were both applied to a calibration set based on absorption spectra in the 250–320 nm range for 36 different mixtures of AA, DA and UA; in all cases, the PLS1 calibration method showed better quantitative prediction ability than the PCR method. Cross validation was used to select the optimum number of principal components (NPC). The NPC for AA, DA and UA was found to be 4 by PLS1 and 5, 12 and 8 by PCR. The prediction error sums of squares (PRESS) for AA, DA and UA were 1.2461, 1.1144 and 2.3104 for PLS1 and 11.0563, 1.3819 and 4.0956 for PCR, respectively. Satisfactory results were achieved for the simultaneous determination of AA, DA and UA in some real samples such as human urine, serum and pharmaceutical formulations.

  6. Neutrophil/lymphocyte ratio and platelet/lymphocyte ratio in mood disorders: A meta-analysis.

    Science.gov (United States)

    Mazza, Mario Gennaro; Lucchi, Sara; Tringali, Agnese Grazia Maria; Rossetti, Aurora; Botti, Eugenia Rossana; Clerici, Massimo

    2018-06-08

    The immune and inflammatory system is involved in the etiology of mood disorders. Neutrophil/lymphocyte ratio (NLR), platelet/lymphocyte ratio (PLR) and monocyte/lymphocyte ratio (MLR) are inexpensive and reproducible biomarkers of inflammation. This is the first meta-analysis exploring the role of NLR and PLR in mood disorder. We identified 11 studies according to our inclusion criteria from the main Electronic Databases. Meta-analyses were carried out generating pooled standardized mean differences (SMDs) between index and healthy controls (HC). Heterogeneity was estimated. Relevant sensitivity and meta-regression analyses were conducted. Subjects with bipolar disorder (BD) had higher NLR and PLR as compared with HC (respectively SMD = 0.672; p analysis evidenced an influence of bipolar phase on the overall estimate, with studies including subjects in manic and any bipolar phase showing a significantly higher NLR and PLR as compared with HC, whereas the effect was not significant among studies including only euthymic bipolar subjects. Meta-regression showed that age and sex influenced the relationship between BD and NLR but not the relationship between BD and PLR. Meta-analysis was not carried out for MLR because our search identified only one study comparing BD to HC, and only one study comparing MDD to HC. Subjects with major depressive disorder (MDD) had higher NLR as compared with HC (SMD = 0.670; p = 0.028; I² = 89.931%). Heterogeneity-based sensitivity analyses and meta-regression confirmed these findings. Our meta-analysis supports the hypothesis that an inflammatory activation occurs in mood disorders and that NLR and PLR may be useful to detect this activation. More research, including comparison of NLR, PLR and MLR between different bipolar phases and between BD and MDD, is needed. Copyright © 2018 Elsevier Inc. All rights reserved.

  7. Prevalence of and risk factors for retinopathy of prematurity in a ...

    African Journals Online (AJOL)

    Twenty-four previously reported risk factors for the development of ROP were identified for use in a multivariate logistic regression (MLR) analysis. Results. A total of 356 patients were included. The overall prevalence of ROP was 21.8% and that of clinically significant ROP (CSROP) 4.4%. The risk factors with a statistically ...

  8. A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

    Science.gov (United States)

    Meaney, Christopher; Moineddin, Rahim

    2014-01-24

    In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the

  9. Recursive N-way partial least squares for brain-computer interface.

    Directory of Open Access Journals (Sweden)

    Andrey Eliseyev

    Full Text Available In this article, tensor-input/tensor-output blockwise Recursive N-way Partial Least Squares (RNPLS) regression is considered. It combines multi-way tensor decomposition with a consecutive calculation scheme and allows blockwise treatment of tensor data arrays with huge dimensions, as well as the adaptive modeling of time-dependent processes with tensor variables. A numerical study of the algorithm is undertaken. The RNPLS algorithm demonstrates fast and stable convergence of regression coefficients. Applied to Brain Computer Interface system calibration, the algorithm provides an efficient adjustment of the decoding model. Combining online adaptation with easy interpretation of results, the method can be effectively applied in a variety of multi-modal neural activity flow modeling tasks.

  10. Regression Phalanxes

    OpenAIRE

    Zhang, Hongyang; Welch, William J.; Zamar, Ruben H.

    2017-01-01

    Tomal et al. (2015) introduced the notion of "phalanxes" in the context of rare-class detection in two-class classification problems. A phalanx is a subset of features that work well for classification tasks. In this paper, we propose a different class of phalanxes for application in regression settings. We define a "Regression Phalanx" - a subset of features that work well together for prediction. We propose a novel algorithm which automatically chooses Regression Phalanxes from high-dimensi...

  11. Combining Estimation of Green Vegetation Fraction in an Arid Region from Landsat 7 ETM+ Data

    Directory of Open Access Journals (Sweden)

    Kun Jia

    2017-11-01

    Full Text Available Fractional vegetation cover (FVC), or green vegetation fraction, is an important parameter for characterizing conditions of the land surface vegetation, and also a key variable of models for simulating cycles of water, carbon and energy on the land surface. There are several types of FVC estimation models using remote sensing data, and evaluating their performance over a specific region is of great significance. Therefore, this study firstly evaluated three types of FVC estimation models using Landsat 7 ETM+ data in an agriculture region of Heihe River Basin, China, and then proposed a combination strategy from different individual models to improve the FVC estimation accuracy, which employed the multiple linear regression (MLR) and Bayesian model average (BMA) methods. The validation results indicated that the spectral mixture analysis model with three endmembers (SMA3) achieved the best FVC estimation accuracy (determination coefficient (R2) = 0.902, root mean square error (RMSE) = 0.076) among the seven individual models using Landsat 7 ETM+ data. In addition, the MLR and BMA combination methods could both improve FVC estimation accuracy (R2 = 0.913, RMSE = 0.063 and R2 = 0.904, RMSE = 0.069 for MLR and BMA, respectively). Therefore, it could be concluded that both MLR and BMA combination methods integrating FVC estimates from different models using Landsat 7 ETM+ data could effectively weaken the estimation errors of individual models and improve the final FVC estimation accuracy.

  12. New Inference Procedures for Semiparametric Varying-Coefficient Partially Linear Cox Models

    Directory of Open Access Journals (Sweden)

    Yunbei Ma

    2014-01-01

    Full Text Available In biomedical research, one major objective is to identify risk factors and study their risk impacts, as this identification can help clinicians to both properly make a decision and increase efficiency of treatments and resource allocation. A two-step penalized-based procedure is proposed to select linear regression coefficients for linear components and to identify significant nonparametric varying-coefficient functions for semiparametric varying-coefficient partially linear Cox models. It is shown that the penalized-based resulting estimators of the linear regression coefficients are asymptotically normal and have oracle properties, and the resulting estimators of the varying-coefficient functions have optimal convergence rates. A simulation study and an empirical example are presented for illustration.

  13. Application of Artificial Neural Network and Support Vector Machines in Predicting Metabolizable Energy in Compound Feeds for Pigs.

    Science.gov (United States)

    Ahmadi, Hamed; Rodehutscord, Markus

    2017-01-01

    In the nutrition literature, there are several reports on the use of artificial neural network (ANN) and multiple linear regression (MLR) approaches for predicting feed composition and nutritive value, while the use of the support vector machines (SVM) method as a new alternative approach to MLR and ANN models is still not fully investigated. The MLR, ANN, and SVM models were developed to predict metabolizable energy (ME) content of compound feeds for pigs based on the German energy evaluation system from analyzed contents of crude protein (CP), ether extract (EE), crude fiber (CF), and starch. A total of 290 datasets from standardized digestibility studies with compound feeds was provided from several institutions and published papers, and ME was calculated thereon. Accuracy and precision of the developed models were evaluated, given their produced prediction values. The results revealed that the developed ANN [R2 = 0.95; root mean square error (RMSE) = 0.19 MJ/kg of dry matter] and SVM (R2 = 0.95; RMSE = 0.21 MJ/kg of dry matter) models produced better prediction values in estimating ME in compound feed than those produced by conventional MLR (R2 = 0.89; RMSE = 0.27 MJ/kg of dry matter). However, there were no obvious differences between the performance of the ANN and SVM models. Thus, the SVM model may also be considered as a promising tool for modeling the relationship between chemical composition and ME of compound feeds for pigs. To provide readers and nutritionists with an easy and rapid tool, an Excel® calculator, namely, SVM_ME_pig, was created to predict the metabolizable energy values in compound feeds for pigs using the developed support vector machine model.
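
    The two accuracy measures quoted in this record can be illustrated with a minimal sketch. It assumes R2 is computed as 1 - SS_res/SS_tot (some studies report the squared correlation instead), and the ME values below are hypothetical, not taken from the study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical ME values (MJ/kg of dry matter), for illustration only.
observed = [14.0, 15.0, 16.0, 17.0]
predicted = [14.2, 14.8, 16.2, 16.8]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

    A lower RMSE and a higher R2 both indicate closer agreement between predicted and observed values, which is the sense in which the record ranks ANN and SVM above MLR.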

  14. Prediction of persistent hemodynamic depression after carotid angioplasty and stenting using artificial neural network model.

    Science.gov (United States)

    Jeon, Jin Pyeong; Kim, Chulho; Oh, Byoung-Doo; Kim, Sun Jeong; Kim, Yu-Seop

    2018-01-01

    To assess and compare predictive factors for persistent hemodynamic depression (PHD) after carotid artery angioplasty and stenting (CAS) using artificial neural network (ANN) and multiple logistic regression (MLR) or support vector machines (SVM) models. A retrospective data set of patients (n=76) who underwent CAS from 2007 to 2014 was used as input (training cohort) to a back-propagation ANN using the TensorFlow platform. PHD was defined as systolic blood pressure less than 90 mmHg or heart rate less than 50 beats/min lasting for more than one hour. The resulting ANN was prospectively tested in 33 patients (test cohort) and compared with MLR or SVM models according to accuracy and receiver operating characteristics (ROC) curve analysis. No significant difference in baseline characteristics between the training cohort and the test cohort was observed. PHD was observed in 21 (27.6%) patients in the training cohort and 10 (30.3%) patients in the test cohort. In the training cohort, the accuracy of ANN for the prediction of PHD was 98.7% and the area under the ROC curve (AUROC) was 0.961. In the test cohort, the number of correctly classified instances was 32 (97.0%) using the ANN model. In contrast, the accuracy rate of both the MLR and SVM models was 75.8%. ANN (AUROC: 0.950; 95% CI [confidence interval]: 0.813-0.996) showed superior predictive performance compared to the MLR model (AUROC: 0.796; 95% CI: 0.620-0.915, p<0.001) or SVM model (AUROC: 0.885; 95% CI: 0.725-0.969, p<0.001). The ANN model seems to have more powerful prediction capabilities than the MLR or SVM model for persistent hemodynamic depression after CAS. External validation with a large cohort is needed to confirm our results. Copyright © 2017. Published by Elsevier B.V.
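
    The AUROC used to compare the models can be sketched with the rank (Mann-Whitney) formulation. This is an illustrative calculation on hypothetical labels and scores; it is not the software or data used in the study.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 1/2.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical outcome labels (1 = event) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(auroc(labels, scores))
```

    An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases with and without the event.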

  15. Constructing general partial differential equations using polynomial and neural networks.

    Science.gov (United States)

    Zjavka, Ladislav; Pedrycz, Witold

    2016-01-01

    Sum fraction terms can approximate multi-variable functions on the basis of discrete observations, replacing a partial differential equation definition with polynomial elementary data relation descriptions. Artificial neural networks commonly transform the weighted sum of inputs to describe overall similarity relationships of trained and new testing input patterns. Differential polynomial neural networks form a new class of neural networks, which construct and solve an unknown general partial differential equation of a function of interest with selected substitution relative terms using non-linear multi-variable composite polynomials. The layers of the network generate simple and composite relative substitution terms whose convergent series combinations can describe partial dependent derivative changes of the input variables. This regression is based on trained generalized partial derivative data relations, decomposed into a multi-layer polynomial network structure. The sigmoidal function, commonly used as a nonlinear activation of artificial neurons, may transform some polynomial items together with the parameters with the aim to improve the polynomial derivative term series ability to approximate complicated periodic functions, as simple low order polynomials are not able to fully make up for the complete cycles. The similarity analysis facilitates substitutions for differential equations or can form dimensional units from data samples to describe real-world problems. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Multilayer perceptron neural network-based approach for modeling phycocyanin pigment concentrations: case study from lower Charles River buoy, USA.

    Science.gov (United States)

    Heddam, Salim

    2016-09-01

    This paper proposes a multilayer perceptron neural network (MLPNN) to predict phycocyanin (PC) pigment using water quality variables as predictors. In the proposed model, four water quality variables (water temperature, dissolved oxygen, pH, and specific conductance) were selected as the inputs for the MLPNN model, and the PC as the output. To demonstrate the capability and the usefulness of the MLPNN model, a total of 15,849 data points measured at 15-min intervals are used for the development of the model. The data are collected at the lower Charles River buoy, and available from the US Environmental Protection Agency (USEPA). For comparison purposes, a multiple linear regression (MLR) model that was frequently used for predicting water quality variables in previous studies is also built. The performances of the models are evaluated using a set of widely used statistical indices. The performance of the MLPNN and MLR models is compared with the measured data. The obtained results show that (i) all the proposed MLPNN models are more accurate than the MLR models and (ii) the results obtained are very promising and encouraging for the development of phycocyanin-predictive models.

  17. Source apportionment of PM2.5 at the Lin'an regional background site in China with three receptor models

    Science.gov (United States)

    Deng, Junjun; Zhang, Yanru; Qiu, Yuqing; Zhang, Hongliang; Du, Wenjiao; Xu, Lingling; Hong, Youwei; Chen, Yanting; Chen, Jinsheng

    2018-04-01

    Source apportionment of fine particulate matter (PM2.5) was conducted at the Lin'an Regional Atmospheric Background Station (LA) in the Yangtze River Delta (YRD) region in China from July 2014 to April 2015 with three receptor models including principal component analysis combined with multiple linear regression (PCA-MLR), UNMIX and Positive Matrix Factorization (PMF). The model performance, source identification and source contribution of the three models were analyzed and inter-compared. Source apportionment of PM2.5 was also conducted with the receptor models. Good correlations between the reconstructed and measured concentrations of PM2.5 and its major chemical species were obtained for all models. PMF resolved almost all masses of PM2.5, while PCA-MLR and UNMIX explained about 80%. Five, four and seven sources were identified by PCA-MLR, UNMIX and PMF, respectively. Combustion, secondary source, marine source, dust and industrial activities were identified by all the three receptor models. Combustion source and secondary source were the major sources, and together contributed over 60% to PM2.5. The PMF model had a better performance on separating the different combustion sources. These findings improve the understanding of PM2.5 sources in background regions.

  18. Spatial interpolation schemes of daily precipitation for hydrologic modeling

    Science.gov (United States)

    Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.

    2012-01-01

    Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirm that widely used regression-based estimation schemes fail to describe the realistic spatial variability of the daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before estimating the amount of precipitation separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed least streamflow error in Alapaha basin and CMLR input showed least error (still very close to LWP) in Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. © 2011 Springer-Verlag.
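
    The two-step idea described in this record (occurrence first, amount on wet days second) can be sketched as follows. This is a simplified illustration under stated assumptions: a single hypothetical predictor `x`, a logistic model fitted by plain gradient ascent, a straight-line amount model, and made-up data. It is not the authors' interpolation scheme.

```python
import math

def fit_logistic(x, wet, lr=0.1, steps=5000):
    """Step 1: single-predictor logistic regression for wet/dry occurrence."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, wi in zip(x, wet):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += wi - p            # gradient of the log-likelihood w.r.t. b0
            g1 += (wi - p) * xi     # gradient of the log-likelihood w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def fit_line(x, y):
    """Step 2: least-squares line y = a + b * x, fitted on wet days only."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def two_step_estimate(x_new, occ_model, amt_model, threshold=0.5):
    """Return 0 if the site is classified dry, else the regressed amount."""
    b0, b1 = occ_model
    p_wet = 1.0 / (1.0 + math.exp(-(b0 + b1 * x_new)))
    if p_wet < threshold:
        return 0.0
    a, b = amt_model
    return max(0.0, a + b * x_new)

# Hypothetical predictor values, wet/dry flags and precipitation amounts.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
wet = [0, 0, 0, 1, 0, 1, 1, 1]
amount = [0.0, 0.0, 0.0, 2.0, 0.0, 4.0, 5.0, 6.0]

occ_model = fit_logistic(x, wet)
wet_x = [xi for xi, wi in zip(x, wet) if wi == 1]
wet_amount = [ai for ai, wi in zip(amount, wet) if wi == 1]
amt_model = fit_line(wet_x, wet_amount)
print(two_step_estimate(0.0, occ_model, amt_model))
print(two_step_estimate(7.0, occ_model, amt_model))
```

    Separating occurrence from amount keeps the many zero-precipitation days out of the amount regression, so dry days no longer pull the wet-day estimates toward zero.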

  19. Cast Partial Denture versus Acrylic Partial Denture for Replacement of Missing Teeth in Partially Edentulous Patients

    Directory of Open Access Journals (Sweden)

    Pramita Suwal

    2017-03-01

    Full Text Available Aim: To compare the effects of cast partial dentures with conventional all-acrylic dentures with respect to retention, stability, masticatory efficiency, comfort and periodontal health of abutments. Methods: 50 adult partially edentulous patients seeking replacement of missing teeth, having Kennedy class I and II arches with or without modification areas, were selected for the study. Group-A was treated with cast partial dentures and Group-B with acrylic partial dentures. Data were collected during follow-up visits at 3 months, 6 months, and 1 year by evaluating retention, stability, masticatory efficiency, comfort, and periodontal health of abutments. Results: Chi-square test was applied to find out differences between the groups at 95% confidence interval where p = 0.05. One-year comparison shows that cast partial dentures maintained retention and stability better than acrylic partial dentures (p < 0.05). The masticatory efficiency was significantly compromised from the 3rd month to 1 year in the all-acrylic partial denture group (p < 0.05). The comfort of patients with cast partial dentures was maintained better during the observation period (p < 0.05). Periodontal health of abutments gradually deteriorated in the all-acrylic denture group (p

  20. Advanced statistics: linear regression, part II: multiple linear regression.

    Science.gov (United States)

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
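
    As a minimal sketch of the technique this record describes, multiple linear regression coefficients can be obtained by solving the normal equations (X'X)b = X'y. The function names and data below are hypothetical, and the normal-equations route is chosen for brevity rather than numerical robustness.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via normal equations."""
    design = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print(fit_mlr(rows, y))
```

    The singular-system check is where multicollinearity, one of the concepts the article discusses, shows up in practice: perfectly collinear predictors make X'X non-invertible, so no unique solution exists.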

  1. Boosted beta regression.

    Directory of Open Access Journals (Sweden)

    Matthias Schmid

    Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
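
    For orientation, a commonly used mean-precision parameterization of beta regression can be written as below. The symbols (mean \mu_i, precision \phi, covariate vector x_i, coefficients \beta) are notation assumed here for illustration; the record itself does not state them.

```latex
y_i \sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1-\mu_i)\phi\bigr), \qquad
\operatorname{logit}(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = x_i^{\top}\beta, \qquad
\operatorname{Var}(y_i) = \frac{\mu_i (1-\mu_i)}{1+\phi}.
```

    Under this form the variance depends on both the mean and the precision, which is why letting \phi vary with covariates allows the variance of a percentage response to be modeled alongside its mean.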

  2. Regression to Causality : Regression-style presentation influences causal attribution

    DEFF Research Database (Denmark)

    Bordacconi, Mats Joe; Larsen, Martin Vinæs

    2014-01-01

    of equivalent results presented as either regression models or as a test of two sample means. Our experiment shows that the subjects who were presented with results as estimates from a regression model were more inclined to interpret these results causally. Our experiment implies that scholars using regression...... models – one of the primary vehicles for analyzing statistical results in political science – encourage causal interpretation. Specifically, we demonstrate that presenting observational results in a regression model, rather than as a simple comparison of means, makes causal interpretation of the results...... more likely. Our experiment drew on a sample of 235 university students from three different social science degree programs (political science, sociology and economics), all of whom had received substantial training in statistics. The subjects were asked to compare and evaluate the validity...

  3. Handbook of Partial Least Squares Concepts, Methods and Applications

    CERN Document Server

    Vinzi, Vincenzo Esposito; Henseler, Jörg

    2010-01-01

    This handbook provides a comprehensive overview of Partial Least Squares (PLS) methods with specific reference to their use in marketing and with a discussion of the directions of current research and perspectives. It covers the broad area of PLS methods, from regression to structural equation modeling applications, software and interpretation of results. The handbook serves both as an introduction for those without prior knowledge of PLS and as a comprehensive reference for researchers and practitioners interested in the most recent advances in PLS methodology.

  4. Application of two regression-based methods to estimate the effects of partial harvest on forest structure using Landsat data.

    Science.gov (United States)

    S.P. Healey; Z. Yang; W.B. Cohen; D.J. Pierce

    2006-01-01

    Although partial harvests are common in many forest types globally, there has been little assessment of the potential to map the intensity of these harvests using Landsat data. We modeled basal area removal and percentage cover change in a study area in central Washington (northwestern USA) using biennial Landsat imagery and reference data from historical aerial photos...

  5. Predicting water solubility of congeners: Chloronaphthalenes-A case study

    Energy Technology Data Exchange (ETDEWEB)

    Puzyn, Tomasz, E-mail: puzi@qsar.eu.org [Faculty of Chemistry, University of Gdansk, Sobieskiego 18, 80-952 Gdansk (Poland); Mostrag, Aleksandra; Falandysz, Jerzy [Faculty of Chemistry, University of Gdansk, Sobieskiego 18, 80-952 Gdansk (Poland); Kholod, Yana; Leszczynski, Jerzy [NSF CREST Nanotoxicity Center, Department of Chemistry, Jackson State University, 1325 Lynch St, Jackson, MS 39217-0510 (United States)

    2009-10-30

    Since the important physicochemical data for chloronaphthalenes (PCNs) are still scarce, we have predicted water solubility (log S) of all 75 congeners with the Quantitative Structure-Property Relationship (QSPR) scheme. The values of log S, predicted by the most efficient model, varied from 0.01 to 1660 μg dm^-3 (2.85 x 10^-11 to 1.02 x 10^-5 mol dm^-3), depending on the number of chlorine atoms present in the molecule and the substitution pattern. We found that the main factor determining relative differences in solubility between the congeners is the solvent accessible volume related to the cavitation process occurring in the solvent. The results are presented as a case study of QSPR modeling for those Persistent Organic Pollutants (POPs) that exist as families of congeners. By investigating the impact of (i) the way of the molecular descriptors' calculation, (ii) the size of applied database and (iii) chemometric method of modeling (Multiple Linear Regression, MLR, and/or Partial Least Squares regression, PLS) on the quality of the models we proposed general recommendations for dealing with congeners. We found that the combination of the B3LYP functional with 6-311++G(d,p) basis set was the most suitable technique of the molecular descriptors' calculation for congeners when compared with semi-empirical PM3, ab initio Hartree-Fock (HF), and Møller-Plesset 2 (MP2) methods carried out with different-size basis sets. Moreover, the model developed with a larger and more general database that includes chloronaphthalenes, polychlorinated dibenzo-p-dioxins, furans and biphenyls predicted the values of log S for PCNs noticeably worse than the model calibrated only on PCNs. In the latter case it was possible to obtain satisfactory results by employing even the simplest MLR method and only one molecular descriptor. The values of log S were also calculated with the WSKOWIN and COSMO-RS models as the reference techniques and then compared to our

  6. Predicting water solubility of congeners: Chloronaphthalenes-A case study

    International Nuclear Information System (INIS)

    Puzyn, Tomasz; Mostrag, Aleksandra; Falandysz, Jerzy; Kholod, Yana; Leszczynski, Jerzy

    2009-01-01

    Since the important physicochemical data for chloronaphthalenes (PCNs) are still scarce, we have predicted water solubility (log S) of all 75 congeners with the Quantitative Structure-Property Relationship (QSPR) scheme. The values of log S, predicted by the most efficient model, varied from 0.01 to 1660 μg dm^-3 (2.85 x 10^-11 to 1.02 x 10^-5 mol dm^-3), depending on the number of chlorine atoms present in the molecule and the substitution pattern. We found that the main factor determining relative differences in solubility between the congeners is the solvent accessible volume related to the cavitation process occurring in the solvent. The results are presented as a case study of QSPR modeling for those Persistent Organic Pollutants (POPs) that exist as families of congeners. By investigating the impact of (i) the way of the molecular descriptors' calculation, (ii) the size of applied database and (iii) chemometric method of modeling (Multiple Linear Regression, MLR, and/or Partial Least Squares regression, PLS) on the quality of the models we proposed general recommendations for dealing with congeners. We found that the combination of the B3LYP functional with 6-311++G(d,p) basis set was the most suitable technique of the molecular descriptors' calculation for congeners when compared with semi-empirical PM3, ab initio Hartree-Fock (HF), and Møller-Plesset 2 (MP2) methods carried out with different-size basis sets. Moreover, the model developed with a larger and more general database that includes chloronaphthalenes, polychlorinated dibenzo-p-dioxins, furans and biphenyls predicted the values of log S for PCNs noticeably worse than the model calibrated only on PCNs. In the latter case it was possible to obtain satisfactory results by employing even the simplest MLR method and only one molecular descriptor. The values of log S were also calculated with the WSKOWIN and COSMO-RS models as the reference techniques and then compared to our results.

  7. Probabilistic inference of fatigue damage propagation with limited and partial information

    Directory of Open Access Journals (Sweden)

    Huang Min

    2015-08-01

    Full Text Available A general method of probabilistic fatigue damage prognostics using limited and partial information is developed. Limited and partial information refers to measurable data that are not enough or cannot directly be used to statistically identify model parameters using traditional regression analysis. In the proposed method, the prior probability distribution of model parameters is derived based on the principle of maximum entropy (MaxEnt) using the limited and partial information as constraints. The posterior distribution is formulated using the principle of maximum relative entropy (MRE) to perform probability updating when new information is available and reduces uncertainty in prognosis results. It is shown that the posterior distribution is equivalent to a Bayesian posterior when the new information used for updating is point measurements. A numerical quadrature interpolating method is used to calculate the asymptotic approximation for the prior distribution. Once the prior is obtained, subsequent measurement data are used to perform updating using Markov chain Monte Carlo (MCMC) simulations. Fatigue crack prognosis problems with experimental data are presented for demonstration and validation.

  8. Investigating oral health-related quality of life and self-perceived satisfaction with partial dentures.

    Science.gov (United States)

    Abuzar, Menaka A; Kahwagi, Esperance; Yamakawa, Takeshi

    2012-05-01

    To investigate the prevalence and severity of oral health-related quality of life in patients treated with removable partial dentures at a publicly-funded dental hospital. The association between patients' demographic profiles, denture-related variables, and oral health-related quality of life was also investigated. A questionnaire was designed to investigate the use and satisfaction of removable partial dentures, and oral health-related quality of life of removable partial denture wearers using the Oral Health Impact Profile-14. The questionnaire was administered to 740 randomly-selected patients who received removable partial dentures during 2005-2008. The response rate was 31.35%. Non-parametric tests and a logistic regression model were used to analyze the association between denture-related variables and oral health-related quality of life. A question on symptoms unrelated to dentures was also analyzed. The Oral Health Impact Profile-14 prevalence calculated was 43.1%. The removable partial denture experience and frequency of use were inversely associated with Oral Health Impact Profile-14 scores. Metal-based removable partial dentures were associated with lower Oral Health Impact Profile prevalence and severity scores. No significant association was found between demographic profile, circumstance for provision of removable partial dentures and Oral Health Impact Profile-14 score. The participants of this study indicated that perceived denture performance, removable partial denture material, experience, and frequency of use are associated with oral health-related quality of life. © 2012 Blackwell Publishing Asia Pty Ltd.

  9. Does the Magnitude of the Link between Unemployment and Crime Depend on the Crime Level? A Quantile Regression Approach

    Directory of Open Access Journals (Sweden)

    Horst Entorf

    2015-07-01

    Full Text Available Two alternative hypotheses – referred to as opportunity- and stigma-based behavior – suggest that the magnitude of the link between unemployment and crime also depends on preexisting local crime levels. In order to analyze conjectured nonlinearities between both variables, we use quantile regressions applied to German district panel data. While both conventional OLS and quantile regressions confirm the positive link between unemployment and crime for property crimes, results for assault differ with respect to the method of estimation. Whereas conventional mean regressions do not show any significant effect (which would confirm the usual result found for violent crimes in the literature), quantile regression reveals that size and importance of the relationship are conditional on the crime rate. The partial effect is significantly positive for moderately low and median quantiles of local assault rates.
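
    Quantile regression replaces the squared-error loss of OLS with the check (pinball) loss. The sketch below shows only the simplest case, a constant prediction, where minimizing that loss recovers a sample quantile. The data are hypothetical and the functions are illustrative, not the estimator used in the study.

```python
def pinball_loss(y, q, tau):
    """Average check (pinball) loss of a constant prediction q at level tau."""
    total = 0.0
    for yi in y:
        diff = yi - q
        # Under-predictions are weighted by tau, over-predictions by 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)

def best_constant(y, tau):
    """Return the observed value that minimizes the pinball loss."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))

# Hypothetical local crime rates with one extreme district.
rates = [1.0, 2.0, 3.0, 4.0, 100.0]
print(best_constant(rates, 0.5))
print(best_constant(rates, 0.9))
```

    Because the loss weights under- and over-predictions asymmetrically, fitting at several values of tau describes how a covariate effect differs between low-crime and high-crime districts, which a single mean regression cannot show.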

  10. The role of chemometrics in single and sequential extraction assays: a review. Part II. Cluster analysis, multiple linear regression, mixture resolution, experimental design and other techniques.

    Science.gov (United States)

    Giacomino, Agnese; Abollino, Ornella; Malandrino, Mery; Mentasti, Edoardo

    2011-03-04

    Single and sequential extraction procedures are used for studying element mobility and availability in solid matrices, like soils, sediments, sludge, and airborne particulate matter. In the first part of this review we reported an overview on these procedures and described the applications of chemometric uni- and bivariate techniques and of multivariate pattern recognition techniques based on variable reduction to the experimental results obtained. The second part of the review deals with the use of chemometrics not only for the visualization and interpretation of data, but also for the investigation of the effects of experimental conditions on the response, the optimization of their values and the calculation of element fractionation. We will describe the principles of the multivariate chemometric techniques considered, the aims for which they were applied and the key findings obtained. The following topics will be critically addressed: pattern recognition by cluster analysis (CA), linear discriminant analysis (LDA) and other less common techniques; modelling by multiple linear regression (MLR); investigation of spatial distribution of variables by geostatistics; calculation of fractionation patterns by a mixture resolution method (Chemometric Identification of Substrates and Element Distributions, CISED); optimization and characterization of extraction procedures by experimental design; other multivariate techniques less commonly applied. Copyright © 2010 Elsevier B.V. All rights reserved.

  11. Predicting the fidelity of JPEG2000 compressed CT images using DICOM header information

    International Nuclear Information System (INIS)

    Kim, Kil Joong; Kim, Bohyoung; Lee, Hyunna; Choi, Hosik; Jeon, Jong-June; Ahn, Jeong-Hwan; Lee, Kyoung Ho

    2011-01-01

    Purpose: To propose multiple logistic regression (MLR) and artificial neural network (ANN) models constructed using digital imaging and communications in medicine (DICOM) header information in predicting the fidelity of Joint Photographic Experts Group (JPEG) 2000 compressed abdomen computed tomography (CT) images. Methods: Our institutional review board approved this study and waived informed patient consent. Using a JPEG2000 algorithm, 360 abdomen CT images were compressed reversibly (n = 48, as negative control) or irreversibly (n = 312) to one of different compression ratios (CRs) ranging from 4:1 to 10:1. Five radiologists independently determined whether the original and compressed images were distinguishable or indistinguishable. The 312 irreversibly compressed images were divided randomly into training (n = 156) and testing (n = 156) sets. The MLR and ANN models were constructed regarding the DICOM header information as independent variables and the pooled radiologists' responses as dependent variable. As independent variables, we selected the CR (DICOM tag number: 0028, 2112), effective tube current-time product (0018, 9332), section thickness (0018, 0050), and field of view (0018, 0090) among the DICOM tags. Using the training set, an optimal subset of independent variables was determined by backward stepwise selection in a four-fold cross-validation scheme. The MLR and ANN models were constructed with the determined independent variables using the training set. The models were then evaluated on the testing set by using receiver-operating-characteristic (ROC) analysis regarding the radiologists' pooled responses as the reference standard and by measuring Spearman rank correlation between the model prediction and the number of radiologists who rated the two images as distinguishable. Results: The CR and section thickness were determined as the optimal independent variables. 
The areas under the ROC curve for the MLR and ANN predictions were 0.91 (95% CI; 0

  12. Regression analysis with categorized regression calibrated exposure: some interesting findings

    Directory of Open Access Journals (Sweden)

    Hjartåker Anette

    2006-07-01

    Full Text Available Abstract Background Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

  13. Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

    Science.gov (United States)

    Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

    2017-12-01

    An order selection method based on multivariate stepwise regression is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model; it converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is adopted to define the linear terms in the GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to assess how both the newly introduced and the existing variables improve the model characteristics, and these statistics determine which model variables to retain or eliminate. The optimal model is then obtained through a measurement of the data-fitting effect or a significance test. Simulation and classic time-series data experiments show that the proposed method is simple and reliable and can be applied to practical engineering.

  14. Recurrent Partial Words

    Directory of Open Access Journals (Sweden)

    Francine Blanchet-Sadri

    2011-08-01

    Full Text Available Partial words are sequences over a finite alphabet that may contain wildcard symbols, called holes, which match or are compatible with all letters; partial words without holes are said to be full words (or simply words). Given an infinite partial word w, the number of distinct full words over the alphabet that are compatible with factors of w of length n, called subwords of w, gives a measure of the complexity of infinite partial words, the so-called subword complexity. This measure is of particular interest because we can construct partial words with subword complexities not achievable by full words. In this paper, we consider the notion of recurrence over infinite partial words, that is, we study whether all of the finite subwords of a given infinite partial word appear infinitely often, and we establish connections between subword complexity and recurrence in this more general framework.
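
    The compatibility relation and the subword count described above can be illustrated on a finite partial word. The following is a minimal sketch, not the paper's construction: the hole symbol `?`, the function names, and the restriction to a finite word are assumptions made for illustration only.

```python
from itertools import product

HOLE = "?"  # assumed symbol for a hole; the abstract does not fix a notation


def compatible(partial, full):
    """True if the full word matches the partial word at every non-hole position."""
    return len(partial) == len(full) and all(
        p == HOLE or p == f for p, f in zip(partial, full)
    )


def subwords(partial_word, n, alphabet):
    """Full words of length n compatible with some length-n factor of partial_word."""
    factors = {partial_word[i:i + n] for i in range(len(partial_word) - n + 1)}
    candidates = ("".join(letters) for letters in product(alphabet, repeat=n))
    return {w for w in candidates if any(compatible(f, w) for f in factors)}


# Hypothetical example: one hole raises the count of length-2 subwords from 2 to 3.
with_hole = subwords("a?b", 2, "ab")
without_hole = subwords("aab", 2, "ab")
print(sorted(with_hole), sorted(without_hole))
```

    The brute-force enumeration over all words of length n is exponential in n and is only meant to make the definitions concrete.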

  15. Radiocolloid studies of the regression of intrasplenic lesions

    International Nuclear Information System (INIS)

    Spencer, R.P.; Karimeddini, M.K.; Sziklas, J.J.; Gupta, S.M.; Rosenberg, R.J.

    1982-01-01

    Five cases are presented in which intrasplenic defects, noted on 99mTc sulfur colloid imaging, had at least partially regressed on follow-up studies. One, representing splenic trauma, reinforced the concept of the ability of the spleen to heal itself. A second case involved splenic invasion by direct extension of a soft tissue sarcoma. Improvement was noted after the patient was treated with chemotherapy. Three cases were related to splenic manifestations of lymphoma. Of these three patients (one each with lymphocytic, histiocytic, and mixed diffuse histiocytic lymphoma plus nodular), two showed improvement after treatment with chemotherapy alone and the third after combined chemotherapy and external radiation treatment. Return of splenic reticuloendothelial function to previously involved regions within the spleen occurred for all five patients. Comments were made as to the apparent rate of return of function.

  16. Radiocolloid studies of the regression of intrasplenic lesions

    Energy Technology Data Exchange (ETDEWEB)

    Spencer, R.P.; Karimeddini, M.K.; Sziklas, J.J.; Gupta, S.M.; Rosenberg, R.J.

    1982-07-01

    Five cases are presented in which intrasplenic defects, noted on 99mTc sulfur colloid imaging, had at least partially regressed on follow-up studies. One, representing splenic trauma, reinforced the concept of the ability of the spleen to heal itself. A second case involved splenic invasion by direct extension of a soft tissue sarcoma. Improvement was noted after the patient was treated with chemotherapy. Three cases were related to splenic manifestations of lymphoma. Of these three patients (one each with lymphocytic, histiocytic, and mixed diffuse histiocytic lymphoma plus nodular), two showed improvement after treatment with chemotherapy alone and the third after combined chemotherapy and external radiation treatment. Return of splenic reticuloendothelial function to previously involved regions within the spleen occurred for all five patients. Comments were made as to the apparent rate of return of function.

  17. Linking Simple Economic Theory Models and the Cointegrated Vector AutoRegressive Model

    DEFF Research Database (Denmark)

    Møller, Niels Framroze

    This paper attempts to clarify the connection between simple economic theory models and the approach of the Cointegrated Vector-Auto-Regressive model (CVAR). By considering (stylized) examples of simple static equilibrium models, it is illustrated in detail, how the theoretical model and its stru... ..., it is demonstrated how other controversial hypotheses such as Rational Expectations can be formulated directly as restrictions on the CVAR-parameters. A simple example of a "Neoclassical synthetic" AS-AD model is also formulated. Finally, the partial- general equilibrium distinction is related to the CVAR as well... Further fundamental extensions and advances to more sophisticated theory models, such as those related to dynamics and expectations (in the structural relations) are left for future papers.

  18. Molecular structure-adsorption study on current textile dyes.

    Science.gov (United States)

    Örücü, E; Tugcu, G; Saçan, M T

    2014-01-01

    This study was performed to investigate the adsorption of a diverse set of textile dyes onto granulated activated carbon (GAC). The adsorption experiments were carried out in a batch system. The Langmuir and Freundlich isotherm models were applied to experimental data and the isotherm constants were calculated for 33 anthraquinone and azo dyes. The adsorption equilibrium data fitted the Langmuir isotherm model more adequately than the Freundlich isotherm model. In addition to a qualitative analysis of experimental results, multiple linear regression (MLR), support vector regression (SVR) and back propagation neural network (BPNN) methods were used to develop quantitative structure-property relationship (QSPR) models with the novel adsorption data. The data were divided randomly into training and test sets. The predictive ability of all models was evaluated using the test set. Descriptors were selected with a genetic algorithm (GA) using QSARINS software. Results related to QSPR models on the adsorption capacity of GAC showed that molecular structure of dyes was represented by ionization potential based on two-dimensional topological distances, chromophoric features and a property filter index. Comparison of the performance of the models demonstrated the superiority of the BPNN over GA-MLR and SVR models.
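
    For readers unfamiliar with the two isotherm models named above, a minimal sketch of their standard textbook forms follows. The parameter names (`q_max`, `k_l`, `k_f`, `n`) and the numeric inputs are illustrative assumptions; they are not constants reported in the study.

```python
def langmuir(c_e, q_max, k_l):
    """Langmuir isotherm: q_e = q_max * K_L * C_e / (1 + K_L * C_e)."""
    return q_max * k_l * c_e / (1.0 + k_l * c_e)


def freundlich(c_e, k_f, n):
    """Freundlich isotherm: q_e = K_F * C_e ** (1 / n)."""
    return k_f * c_e ** (1.0 / n)


# Hypothetical values chosen only to exercise the formulas.
print(langmuir(1.0, q_max=10.0, k_l=1.0))
print(freundlich(4.0, k_f=2.0, n=2.0))
```

    The Langmuir form saturates at `q_max` as the equilibrium concentration grows, whereas the Freundlich form keeps increasing, which is one reason the two models can fit the same data differently.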

  19. Time-adaptive quantile regression

    DEFF Research Database (Denmark)

    Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

    2008-01-01

    An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
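
    The linear optimization formulation mentioned above minimizes the asymmetric "check" (pinball) loss. The sketch below is not the authors' simplex-based algorithm; it only evaluates that loss for a constant predictor and finds its minimizer by brute force over the observations, under the assumption (true for this piecewise-linear convex objective) that a minimizer lies at a data point. All names and data are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Total check loss of predicting the constant q for observations y at level tau."""
    return sum(tau * (v - q) if v >= q else (1.0 - tau) * (q - v) for v in y)


def best_constant_quantile(y, tau):
    """Observation that minimizes the check loss; this is a sample tau-quantile."""
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))


# Hypothetical observations, not wind power data from the paper.
y = [1, 2, 3, 4, 10]
print(best_constant_quantile(y, 0.5), best_constant_quantile(y, 0.9))
```

    With covariates, the same loss is minimized over regression coefficients, which is where a linear programming solver such as the simplex method comes in.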

  20. Forecasting Electricity Demand in Thailand with an Artificial Neural Network Approach

    Directory of Open Access Journals (Sweden)

    Karin Kandananond

    2011-08-01

    Full Text Available Demand planning for electricity consumption is a key success factor for the development of any country. However, this can only be achieved if the demand is forecasted accurately. In this research, different forecasting methods, namely autoregressive integrated moving average (ARIMA), artificial neural network (ANN) and multiple linear regression (MLR), were utilized to formulate prediction models of the electricity demand in Thailand. The objective was to compare the performance of these three approaches, and the empirical data used in this study were the historical data regarding the electricity demand (population, gross domestic product (GDP), stock index, revenue from exporting industrial products and electricity consumption) in Thailand from 1986 to 2010. The results showed that the ANN model reduced the mean absolute percentage error (MAPE) to 0.996%, while those of ARIMA and MLR were 2.80981% and 3.2604527%, respectively. Based on these error measures, the results indicated that the ANN approach outperformed the ARIMA and MLR methods in this scenario. However, the paired test indicated that there was no significant difference among these methods at α = 0.05. According to the principle of parsimony, the ARIMA and MLR models might be preferable to the ANN one because of their simple structure and competitive performance.
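
    The error measure used to compare the three models, MAPE, is straightforward to compute. The following is a minimal sketch with made-up numbers; the function name and the sample values are assumptions and do not reproduce the study's data.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the actual value.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical demand figures, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mape(actual, forecast))
```

    Because each error is divided by the actual value, MAPE is undefined when an actual value is zero, which is rarely an issue for aggregate electricity demand.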

  1. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data.

    Science.gov (United States)

    Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva

    2017-03-01

    In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons: first, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yield insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with for instance, observations being nested into persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.

  2. Bone marrow endothelial progenitors augment atherosclerotic plaque regression in a mouse model of plasma lipid lowering

    Science.gov (United States)

    Yao, Longbiao; Heuser-Baker, Janet; Herlea-Pana, Oana; Iida, Ryuji; Wang, Qilong; Zou, Ming-Hui; Barlic-Dicen, Jana

    2012-01-01

    The major event initiating atherosclerosis is hypercholesterolemia-induced disruption of vascular endothelium integrity. In settings of endothelial damage, endothelial progenitor cells (EPCs) are mobilized from bone marrow into circulation and home to sites of vascular injury where they aid endothelial regeneration. Given the beneficial effects of EPCs in vascular repair, we hypothesized that these cells play a pivotal role in atherosclerosis regression. We tested our hypothesis in the atherosclerosis-prone mouse model in which hypercholesterolemia, one of the main factors affecting EPC homeostasis, is reversible (Reversa mice). In these mice normalization of plasma lipids decreased atherosclerotic burden; however, plaque regression was incomplete. To explore whether endothelial progenitors contribute to atherosclerosis regression, bone marrow EPCs from a transgenic strain expressing green fluorescent protein under the control of endothelial cell-specific Tie2 promoter (Tie2-GFP+) were isolated. These cells were then adoptively transferred into atheroregressing Reversa recipients where they augmented plaque regression induced by reversal of hypercholesterolemia. Advanced plaque regression correlated with engraftment of Tie2-GFP+ EPCs into endothelium and resulted in an increase in atheroprotective nitric oxide and improved vascular relaxation. Similarly augmented plaque regression was also detected in regressing Reversa mice treated with the stem cell mobilizer AMD3100 which also mobilizes EPCs to peripheral blood. We conclude that correction of hypercholesterolemia in Reversa mice leads to partial plaque regression that can be augmented by AMD3100 treatment or by adoptive transfer of EPCs. This suggests that direct cell therapy or indirect progenitor cell mobilization therapy may be used in combination with statins to treat atherosclerosis. PMID:23081735

  3. Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

    Science.gov (United States)

    Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

    2017-09-01

    To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches and to evaluate the impact of these approaches on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with conventional ITV, the average percentage decreases in ITV were 31.9% and 38.3% using the SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided smaller ITV than conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  4. Regression analysis by example

    CERN Document Server

    Chatterjee, Samprit

    2012-01-01

    Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded

  5. Applied logistic regression

    CERN Document Server

    Hosmer, David W; Sturdivant, Rodney X

    2013-01-01

     A new edition of the definitive guide to logistic regression modeling for health science and other applications This thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-

  6. Application of the step-wise regression procedure to the semi-empirical formulae of the nuclear binding energy

    International Nuclear Information System (INIS)

    Eissa, E.A.; Ayad, M.; Gashier, F.A.B.

    1984-01-01

    Most of the binding energy semi-empirical terms without the deformation corrections used by P.A. Seeger are arranged in a multiple linear regression form. The stepwise regression procedure with 95% confidence levels for acceptance and rejection of variables is applied for seeking a model for calculating binding energies of even-even (E-E) nuclei through a significance testing of each basic term. Partial F-values are taken as estimates for the significance of each term. The residual standard deviation and the overall F-value are used for selecting the best linear regression model. (E-E) nuclei are taken into sets lying between two successive proton and neutron magic numbers. The present work is in favour of the magic number 126 followed by 164 for the neutrons and indecisive in supporting the recently predicted proton magic number 114 rather than the previous one, 126. (author)
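
    The partial F-value used above to judge each term can be written in terms of residual sums of squares. This is a minimal sketch of the standard partial F statistic for nested linear models, not code from the paper; the function name, argument names, and numbers are illustrative assumptions.

```python
def partial_f(rss_reduced, rss_full, n_obs, n_params_full, n_added=1):
    """Partial F statistic for adding n_added terms to a nested linear model.

    rss_reduced   -- residual sum of squares without the candidate terms
    rss_full      -- residual sum of squares with the candidate terms
    n_params_full -- number of fitted coefficients in the full model,
                     counting the intercept
    """
    if n_obs <= n_params_full:
        raise ValueError("need more observations than parameters")
    explained_per_term = (rss_reduced - rss_full) / n_added
    residual_variance = rss_full / (n_obs - n_params_full)
    return explained_per_term / residual_variance


# Hypothetical sums of squares, for illustration only.
print(partial_f(rss_reduced=120.0, rss_full=100.0, n_obs=25, n_params_full=5))
```

    In a stepwise procedure this value would be compared with an F critical value at the chosen confidence level to decide whether the term enters or leaves the model.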

  7. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    Science.gov (United States)

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though the ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…

  8. Predicting heavy metal concentrations in soils and plants using field spectrophotometry

    Science.gov (United States)

    Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.

    2017-09-01

    The aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were the industrial town of Kajaran and the city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of contents of HM in lab conditions. The obtained spectral values were then collated with contents of HM in Kajaran soils and the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p [...] regression models and artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. The Multiple Linear Regression (MLR) model was not applicable to predict Pb and Zn concentrations in soils in this research. Almost all full spectrum PLS models provide good calibration and validation results (RPD>1.4). Full spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9; 0.1 and >2.5 respectively). For prediction of Pb and Ni contents in plants SLR and PLS models were used. The latter provide almost the same results. Our findings indicate that it is possible to make coarse direct estimation of HM content in soils and plants using rapid and economic reflectance spectroscopy.

  9. Factors Contributing to Pelvis Instability in Female Adolescent Athletes During Unilateral Repeated Partial Squat Activity

    Science.gov (United States)

    Scarborough, Donna Moxley; Linderman, Shannon; Berkson, Eric M.; Oh, Luke S.

    2017-01-01

    Objectives: Unilateral partial squat tasks are often used to assess athletes’ lower extremity (LE) neuromuscular control. Single squat biomechanics such as lateral drop of the non-stance limb’s pelvis have been linked to knee injury risk. Yet, there are limited studies on the factors contributing to pelvic instability during the unilateral partial squat such as anatomical alignment of the knee and hip strength. The purpose of this study was 1) to assess the influence of leg dominance on pelvic drop among female athletes during the repeated unilateral partial squat activity and 2) to investigate the contributions that lower limb kinematics and hip strength have on pelvis drop. Methods: 42 female athletes (27= softball pitchers, 15=gymnasts, avg age=16.48 ± 2.54 years) underwent lower limb assessment. The quadriceps angle (Q angle) and the average of 3 trials for hip abduction and extension strength (handheld dynamometer measurements) were used for analyses. 3D biomechanical analysis of the repeated unilateral partial squat activity followed using a 20 motion capture camera system which created a 15 segment model of each subject. The subject stood on one leg at the lateral edge of a 17.78 cm box with hands placed on the hips and squatted so that the free hanging contralateral limb came as close to the ground without contact for 5 continuous repetitions. One trial for each limb was performed. Peak pelvic drop and ankle, knee and hip angles and torques (normalized by weight) at this time point were calculated using Visual 3D (C-Motion) biomechanical software. Paired T-test, Spearman correlations and multiple regression model statistical analyses were performed. Results: Peak pelvic drop during the unilateral partial squat did not differ significantly on the basis of limb dominance (p=0.831, Dom: -3.40 ± 5.10° , ND: -3.46 ± 4.44°). Peak pelvic drop displayed a Spearman correlation with the functional measure of hip abduction/adduction (ABD/ADD) angle (rs= 0

  10. Vector regression introduced

    Directory of Open Access Journals (Sweden)

    Mok Tik

    2014-06-01

    Full Text Available This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
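
    The idea of encoding 2D vectors as complex numbers and estimating a vector-valued coefficient can be sketched for the simplest case of one predictor and no intercept. This is an illustrative simplification, not the paper's procedure: it solves the complex normal equation directly rather than transforming to a real model through isomorphism, and the function name and data are assumptions.

```python
def fit_complex_slope(x, y):
    """Least-squares b minimizing sum |y_k - b * x_k|**2 over complex b.

    The closed form is b = sum(conj(x_k) * y_k) / sum(|x_k|**2).
    """
    numerator = sum(xk.conjugate() * yk for xk, yk in zip(x, y))
    denominator = sum(abs(xk) ** 2 for xk in x)
    return numerator / denominator


# Hypothetical 2D vectors stored as complex numbers (east + 1j * north).
x = [1 + 0j, 0 + 1j, 1 + 1j]
true_b = 2j  # scales each vector by 2 and rotates it by 90 degrees
y = [true_b * xk for xk in x]
print(fit_complex_slope(x, y))
```

    A complex coefficient acts on each predictor vector as a combined scaling and rotation, which is what distinguishes it from a scalar coefficient in multivariate multiple regression.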

  11. Applied linear regression

    CERN Document Server

    Weisberg, Sanford

    2013-01-01

    Praise for the Third Edition "...this is an excellent book which could easily be used as a course text..."-International Statistical Institute The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus

  12. Non-proportional odds multivariate logistic regression of ordinal family data.

    Science.gov (United States)

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Color measurement of tea leaves at different drying periods using hyperspectral imaging technique.

    Science.gov (United States)

    Xie, Chuanqi; Li, Xiaoli; Shao, Yongni; He, Yong

    2014-01-01

    This study investigated the feasibility of using hyperspectral imaging technique for nondestructive measurement of color components (ΔL*, Δa* and Δb*) and classification of tea leaves during different drying periods. Hyperspectral images of tea leaves at five drying periods were acquired in the spectral region of 380-1030 nm. The three color features were measured by the colorimeter. Different preprocessing algorithms were applied to select the best one in accordance with the prediction results of partial least squares regression (PLSR) models. Competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA) were used to identify the effective wavelengths, respectively. Different models (least squares-support vector machine [LS-SVM], PLSR, principal components regression [PCR] and multiple linear regression [MLR]) were established to predict the three color components, respectively. SPA-LS-SVM model performed excellently with the correlation coefficient (rp) of 0.929 for ΔL*, 0.849 for Δa* and 0.917 for Δb*, respectively. LS-SVM model was built for the classification of different tea leaves. The correct classification rates (CCRs) ranged from 89.29% to 100% in the calibration set and from 71.43% to 100% in the prediction set, respectively. The total classification results were 96.43% in the calibration set and 85.71% in the prediction set. The result showed that hyperspectral imaging technique could be used as an objective and nondestructive method to determine color features and classify tea leaves at different drying periods.

  14. Color measurement of tea leaves at different drying periods using hyperspectral imaging technique.

    Directory of Open Access Journals (Sweden)

    Chuanqi Xie

    Full Text Available This study investigated the feasibility of using hyperspectral imaging technique for nondestructive measurement of color components (ΔL*, Δa* and Δb*) and classify tea leaves during different drying periods. Hyperspectral images of tea leaves at five drying periods were acquired in the spectral region of 380-1030 nm. The three color features were measured by the colorimeter. Different preprocessing algorithms were applied to select the best one in accordance with the prediction results of partial least squares regression (PLSR) models. Competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA) were used to identify the effective wavelengths, respectively. Different models (least squares-support vector machine [LS-SVM], PLSR, principal components regression [PCR] and multiple linear regression [MLR]) were established to predict the three color components, respectively. SPA-LS-SVM model performed excellently with the correlation coefficient (rp) of 0.929 for ΔL*, 0.849 for Δa* and 0.917 for Δb*, respectively. LS-SVM model was built for the classification of different tea leaves. The correct classification rates (CCRs) ranged from 89.29% to 100% in the calibration set and from 71.43% to 100% in the prediction set, respectively. The total classification results were 96.43% in the calibration set and 85.71% in the prediction set. The result showed that hyperspectral imaging technique could be used as an objective and nondestructive method to determine color features and classify tea leaves at different drying periods.

  15. New models and online calculator for predicting non-sentinel lymph node status in sentinel lymph node positive breast cancer patients

    Directory of Open Access Journals (Sweden)

    Johnson Denise L

    2008-03-01

    Full Text Available Abstract. Background: Current practice is to perform a completion axillary lymph node dissection (ALND) for breast cancer patients with tumor-involved sentinel lymph nodes (SLNs), although fewer than half will have non-sentinel node (NSLN) metastasis. Our goal was to develop new models to quantify the risk of NSLN metastasis in SLN-positive patients and to compare predictive capabilities to another widely used model. Methods: We constructed three models to predict NSLN status: recursive partitioning with receiver operating characteristic curves (RP-ROC), boosted Classification and Regression Trees (CART), and multivariate logistic regression (MLR) informed by CART. Data were compiled from a multicenter Northern California and Oregon database of 784 patients who prospectively underwent SLN biopsy and completion ALND. We compared the predictive abilities of our best model and the Memorial Sloan-Kettering Breast Cancer Nomogram (Nomogram) in our dataset and an independent dataset from Northwestern University. Results: 285 patients had positive SLNs, of which 213 had known angiolymphatic invasion status and 171 had complete pathologic data including hormone receptor status. 264 (93%) patients had limited SLN disease (micrometastasis, 70%, or isolated tumor cells, 23%). 101 (35%) of all SLN-positive patients had tumor-involved NSLNs. Three variables (tumor size, angiolymphatic invasion, and SLN metastasis size) predicted risk in all our models. RP-ROC and boosted CART stratified patients into four risk levels. MLR informed by CART was most accurate. Using two composite predictors calculated from three variables, MLR informed by CART was more accurate than the Nomogram computed using eight predictors. In our dataset, area under ROC curve (AUC) was 0.83/0.85 for MLR (n = 213/n = 171) and 0.77 for Nomogram (n = 171). When applied to an independent dataset (n = 77), AUC was 0.74 for our model and 0.62 for Nomogram.
The composite predictors in our model were the product of

  16. Partial tooth gear bearings

    Science.gov (United States)

    Vranish, John M. (Inventor)

    2010-01-01

    A partial gear bearing including an upper half, comprising peak partial teeth, and a lower, or bottom, half, comprising valley partial teeth. The upper half also has an integrated roller section between each of the peak partial teeth with a radius equal to the gear pitch radius of the radially outwardly extending peak partial teeth. Conversely, the lower half has an integrated roller section between each of the valley half teeth with a radius also equal to the gear pitch radius of the peak partial teeth. The valley partial teeth extend radially inwardly from its roller section. The peak and valley partial teeth are exactly out of phase with each other, as are the roller sections of the upper and lower halves. Essentially, the end roller bearing of the typical gear bearing has been integrated into the normal gear tooth pattern.

  17. Understanding Poisson regression.

    Science.gov (United States)

    Hayat, Matthew J; Higgins, Melinda

    2014-04-01

    Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
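
The overdispersion check mentioned in this abstract can be sketched in a few lines. The following is a minimal illustration, not the article's own analysis: it assumes a single covariate and a log link, and the function names `fit_poisson` and `pearson_dispersion` are hypothetical. A Pearson dispersion statistic well above 1 suggests that the equal mean-variance assumption of the standard Poisson model is violated.

```python
import math


def fit_poisson(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1).

    Assumes one covariate and at least one non-zero count in y.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (n - 2)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)
```

If the statistic is close to 1, the standard model is plausible; if it is clearly larger, the alternatives named in the abstract (an overdispersion parameter or negative binomial regression) are the usual next step.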

  18. Toxicity of ionic liquids: Database and prediction via quantitative structure–activity relationship method

    International Nuclear Information System (INIS)

    Zhao, Yongsheng; Zhao, Jihong; Huang, Ying; Zhou, Qing; Zhang, Xiangping; Zhang, Suojiang

    2014-01-01

    Highlights: • A comprehensive database on toxicity of ionic liquids (ILs) was established. • Relationship between structure and toxicity of ILs has been analyzed qualitatively. • Two new QSAR models were developed for predicting toxicity of ILs to IPC-81. • Accuracy of the proposed nonlinear SVM model is much higher than that of the linear MLR model. • The established models can be explored in designing novel green agents. - Abstract: A comprehensive database on toxicity of ionic liquids (ILs) is established. The database includes over 4000 pieces of data. Based on the database, the relationship between an IL's structure and its toxicity has been analyzed qualitatively. Furthermore, quantitative structure–activity relationship (QSAR) models are developed to predict the toxicities (EC50 values) of various ILs toward the rat leukemia cell line IPC-81. Four parameters selected by the heuristic method (HM) are used to build the multiple linear regression (MLR) and support vector machine (SVM) models. The squared correlation coefficients (R²) of the two QSAR models on the training sets are 0.918 and 0.959, and the root mean square errors (RMSE) are 0.258 and 0.179, respectively. On the QSAR test sets, the prediction R² and RMSE are 0.892 and 0.329 for the MLR model, and 0.958 and 0.234 for the SVM model. The nonlinear model developed by the SVM algorithm substantially outperformed MLR, which indicates that the SVM model is more reliable in the prediction of toxicity of ILs. This study shows that increasing the relative number of O atoms in the molecules leads to a decrease in the toxicity of ILs.

  19. Alternative Methods of Regression

    CERN Document Server

    Birkes, David

    2011-01-01

    Of related interest: Nonlinear Regression Analysis and its Applications, Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s

  20. Direct-on-Filter α-Quartz Estimation in Respirable Coal Mine Dust Using Transmission Fourier Transform Infrared Spectrometry and Partial Least Squares Regression.

    Science.gov (United States)

    Miller, Arthur L; Weakley, Andrew Todd; Griffiths, Peter R; Cauda, Emanuele G; Bayman, Sean

    2017-05-01

    In order to help reduce silicosis in miners, the National Institute for Occupational Safety and Health (NIOSH) is developing field-portable methods for measuring airborne respirable crystalline silica (RCS), specifically the polymorph α-quartz, in mine dusts. In this study we demonstrate the feasibility of end-of-shift measurement of α-quartz using a direct-on-filter (DoF) method to analyze coal mine dust samples deposited onto polyvinyl chloride filters. The DoF method is potentially amenable to on-site analyses, but deviates from the current regulatory determination of RCS for coal mines by eliminating two sample preparation steps: ashing the sampling filter and redepositing the ash prior to quantification by Fourier transform infrared (FT-IR) spectrometry. In this study, the FT-IR spectra of 66 coal dust samples from active mines were used, and the RCS was quantified by using: (1) an ordinary least squares (OLS) calibration approach that utilizes standard silica material as done in the Mine Safety and Health Administration's P7 method; and (2) a partial least squares (PLS) regression approach. Both were capable of accounting for kaolinite, which can confound the IR analysis of silica. The OLS method utilized analytical standards for silica calibration and kaolin correction, resulting in a good linear correlation with P7 results and minimal bias but with the accuracy limited by the presence of kaolinite. The PLS approach also produced predictions well-correlated to the P7 method, as well as better accuracy in RCS prediction, and no bias due to variable kaolinite mass. Besides decreased sensitivity to mineral or substrate confounders, PLS has the advantage that the analyst is not required to correct for the presence of kaolinite or background interferences related to the substrate, making the method potentially viable for automated RCS prediction in the field. This study demonstrated the efficacy of FT-IR transmission spectrometry for silica determination in

  1. Polyphenolic, polysaccharide and oligosaccharide composition of Tempranillo red wines and their relationship with the perceived astringency.

    Science.gov (United States)

    Quijada-Morín, Natalia; Williams, Pascale; Rivas-Gonzalo, Julián C; Doco, Thierry; Escribano-Bailón, M Teresa

    2014-07-01

    The influence of the proanthocyanidic, polysaccharide and oligosaccharide composition on astringency perception of Tempranillo wines has been evaluated. Statistical analyses revealed the existence of relationships between chemical composition and perceived astringency. Proanthocyanidic subunit distribution had the strongest contribution to the multiple linear regression (MLR) model. Polysaccharide families showed clear opposition to astringency perception according to principal component analysis (PCA) results, the opposition being stronger for mannoproteins and rhamnogalacturonan-II (RG-II), but only Polysaccharides Rich in Arabinose and Galactose (PRAGs) were considered in the final fitted MLR model, which explained 96.8% of the variability observed in the data. Oligosaccharides did not show a clear opposition, revealing that the structure and size of carbohydrates are important for astringency perception. Mannose and galactose residues in the oligosaccharide fraction are positively related to astringency perception, probably because their presence is a consequence of the degradation of polysaccharides. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Sources of mutagenic activity in urban fine particles

    International Nuclear Information System (INIS)

    Stevens, R.K.; Lewis, C.W.; Dzubay, T.G.; Cupitt, L.T.; Lewtas, J.

    1990-01-01

    Samples were collected during the winter of 1984-1985 in the cities of Albuquerque, NM and Raleigh, NC as part of a US Environmental Protection Agency study to evaluate methods to determine the emission sources contributing to the mutagenic properties of extractable organic matter (EOM) present in fine particles. Data derived from the analysis of the composition of these fine particles served as input to a multi-linear regression (MLR) model used to calculate the relative contribution of wood burning and motor vehicle sources to the mutagenic activity observed in the extractable organic matter. At both sites the mutagenic potency of EOM was found to be greater (3-5 times) for mobile sources than for wood smoke extractable organics. Carbon-14 measurements, which give a direct determination of the amount of EOM that originated from wood burning, were in close agreement with the source apportionment results derived from the MLR model.

  3. Forecasting Model of Premium Fuel Consumption in Indonesia Using Multiple Linear Regression

    Directory of Open Access Journals (Sweden)

    Farizal

    2014-12-01

    Full Text Available Energy consumption forecasting, especially for premium fuel, is an integral part of energy management. Premium is a type of fuel that receives a government subsidy. Unfortunately, existing premium forecasts have considerably high errors, which makes it difficult to meet the planned subsidy target and inflates the subsidy amount. In this study, forecasting was conducted using the multilinear regression (MLR) method with ten candidate predictor variables. The results show that only four variables, namely inflation, the selling price disparity between Pertamax and premium, the economic growth rate, and the number of cars, determine premium consumption. Analysis of the MLR model indicates that the model has a considerably low error, with a mean absolute percentage error (MAPE) of 5.18%. The model was used to predict 2013 premium consumption with an error of 1.05%. The model predicted that 2013 premium consumption was 29.56 million kiloliters, while the actual figure was 29.26 million kiloliters.

  4. Introduction to regression graphics

    CERN Document Server

    Cook, R Dennis

    2009-01-01

    Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava

  5. Identification of a brainstem circuit regulating visual cortical state in parallel with locomotion.

    Science.gov (United States)

    Lee, A Moses; Hoy, Jennifer L; Bonci, Antonello; Wilbrecht, Linda; Stryker, Michael P; Niell, Cristopher M

    2014-07-16

    Sensory processing is dependent upon behavioral state. In mice, locomotion is accompanied by changes in cortical state and enhanced visual responses. Although recent studies have begun to elucidate intrinsic cortical mechanisms underlying this effect, the neural circuits that initially couple locomotion to cortical processing are unknown. The mesencephalic locomotor region (MLR) has been shown to be capable of initiating running and is associated with the ascending reticular activating system. Here, we find that optogenetic stimulation of the MLR in awake, head-fixed mice can induce both locomotion and increases in the gain of cortical responses. MLR stimulation below the threshold for overt movement similarly changed cortical processing, revealing that MLR's effects on cortex are dissociable from locomotion. Likewise, stimulation of MLR projections to the basal forebrain also enhanced cortical responses, suggesting a pathway linking the MLR to cortex. These studies demonstrate that the MLR regulates cortical state in parallel with locomotion. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. [Is there either agenesis or regression of the Mullerian duct in female bird embryos under the influence of male hormone?].

    Science.gov (United States)

    Lutz-Ostertag, Y; Lutz, H

    1976-01-01

    The natural occurrence of "free-martinism" in birds and chorioallantoic grafting experiments of testis fragments onto female chick host embryos allow the authors to define the manner in which the entire or partial disappearance of the Müllerian ducts is provoked, and to state exactly whether the phenomenon is an agenesis or a regression.

  7. Fourier transform infrared spectroscopic imaging and multivariate regression for prediction of proteoglycan content of articular cartilage.

    Directory of Open Access Journals (Sweden)

    Lassi Rieppo

    Full Text Available Fourier Transform Infrared (FT-IR) spectroscopic imaging has been earlier applied for the spatial estimation of the collagen and the proteoglycan (PG) contents of articular cartilage (AC). However, earlier studies have been limited to the use of univariate analysis techniques. Current analysis methods lack the needed specificity for collagen and PGs. The aim of the present study was to evaluate the suitability of partial least squares regression (PLSR) and principal component regression (PCR) methods for the analysis of the PG content of AC. Multivariate regression models were compared with earlier used univariate methods and tested with a sample material consisting of healthy and enzymatically degraded steer AC. Chondroitinase ABC enzyme was used to increase the variation in PG content levels as compared to intact AC. Digital densitometric measurements of Safranin O-stained sections provided the reference for PG content. The results showed that multivariate regression models predict PG content of AC significantly better than earlier used absorbance spectrum (i.e. the area of carbohydrate region with or without amide I normalization) or second derivative spectrum univariate parameters. Increased molecular specificity favours the use of multivariate regression models, but they require more knowledge of chemometric analysis and extended laboratory resources for gathering reference data for establishing the models. When true molecular specificity is required, the multivariate models should be used.

  8. Integrated Multiscale Latent Variable Regression and Application to Distillation Columns

    Directory of Open Access Journals (Sweden)

    Muddu Madakyaru

    2013-01-01

    Full Text Available Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.

  9. Dietary predictors of childhood obesity in a representative sample of children in north east of Iran.

    Science.gov (United States)

    Baygi, Fereshteh; Qorbani, Mostafa; Dorosty, Ahmad Reza; Kelishadi, Roya; Asayesh, Hamid; Rezapour, Aziz; Mohammadi, Younes; Mohammadi, Fatemeh

    2013-07-01

    The prevalence of obesity is increasing in Iranian youngsters. This study aimed to assess some dietary determinants of obesity in a representative sample of children in Neishabour, a city in northeastern Iran. This case-control study was conducted among 114 school students, aged 6-12 years, with a body mass index (BMI) ≥95th percentile (based on percentiles of Iranian children) as the case group and 102 age- and gender-matched controls, who were selected from their non-obese classmates. Nutrient intake data were collected by trained nutritionists using two 24-hour dietary recalls through maternal interviews in the presence of the child. A food frequency questionnaire was used to detect snack consumption patterns. Statistical analysis was done using univariate and multivariate logistic regression (MLR) in SPSS version 16. In univariate logistic regression, total energy, protein, carbohydrate, fat (including saturated, mono- and poly-unsaturated fat), and dietary fiber were positive predictors of obesity in the studied children. The estimated crude ORs for frequency of consumption of corn-based extruded snacks, carbonated beverages, potato chips, fast foods, and chocolate were statistically significant. After MLR analysis, the association with obesity remained significant for energy intake (OR = 2.489, 95%CI: 1.667-3.716), frequency of corn-based extruded snacks (OR = 1.122, 95%CI: 1.007-1.250), and potato chips (OR = 1.143, 95%CI: 1.024-1.276). The MLR analysis showed that dietary fiber (OR = 0.601, 95%CI: 0.368-0.983) and natural fruit juice intake (OR = 0.909, 95%CI: 0.835-0.988) were protective factors against obesity. The findings serve to confirm the role of an unhealthy diet, notably calorie-dense snacks, in childhood obesity. Healthy dietary habits, such as the consumption of high-fiber foods, should be encouraged among children.
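
Odds ratios and confidence intervals of the kind reported in this abstract are conventionally obtained from a logistic regression coefficient and its standard error as OR = exp(β) and CI = exp(β ± 1.96·SE). The sketch below assumes that convention (a Wald interval); the abstract does not state how its intervals were computed, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the coefficient's standard error from a reported Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Energy intake as reported above: OR = 2.489, 95% CI 1.667-3.716.
se = se_from_ci(1.667, 3.716)
odds, low, high = odds_ratio_ci(math.log(2.489), se)
print(f"SE = {se:.3f}, OR = {odds:.3f}, CI = {low:.3f}-{high:.3f}")
```

Because the interval is symmetric on the log scale, the reported OR should sit near the geometric mean of the two limits, which gives a quick consistency check on published values.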

  10. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    Science.gov (United States)

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy, that is, a pregnancy not intended by at least one of the parents, has undesirable consequences for the family and society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referred to health centers in Khorramabad, Iran, in 2012 were selected by stratified and cluster sampling; relevant variables were measured, and logistic regression, discriminant analysis, and probit regression models, fitted with SPSS software version 21, were used to predict unwanted pregnancy. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
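
The comparison indicators named in this abstract can be computed as in the following minimal sketch. It assumes binary labels coded 0/1 and uses the rank-based (Mann-Whitney) definition of the area under the ROC curve; the function names and the toy data are illustrative and do not come from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def percent_correct(y_true, y_pred):
    """Percentage of cases whose predicted class matches the true class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * hits / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive case outscores a random negative
    case, counting ties as one half (equivalent to the area under the ROC)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: four cases, predicted probabilities thresholded at 0.5.
labels = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.5, 0.1]
preds = [1 if p >= 0.5 else 0 for p in probs]
print(sensitivity_specificity(labels, preds), percent_correct(labels, preds),
      roc_auc(labels, probs))
```

Sensitivity, specificity and the percentage of correct predictions depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why it is a common basis for comparing models.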

  11. Experts' understanding of partial derivatives using the Partial Derivative Machine

    OpenAIRE

    Roundy, David; Dorko, Allison; Dray, Tevian; Manogue, Corinne A.; Weber, Eric

    2014-01-01

    Partial derivatives are used in a variety of different ways within physics. Most notably, thermodynamics uses partial derivatives in ways that students often find confusing. As part of a collaboration with mathematics faculty, we are at the beginning of a study of the teaching of partial derivatives, with the goal of better aligning the teaching of multivariable calculus with the needs of students in STEM disciplines. As a part of this project, we have performed a pilot study of expert understanding...

  12. Efficient Semiparametric Marginal Estimation for the Partially Linear Additive Model for Longitudinal/Clustered Data

    KAUST Repository

    Carroll, Raymond; Maity, Arnab; Mammen, Enno; Yu, Kyusang

    2009-01-01

    We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

  13. Efficient Semiparametric Marginal Estimation for the Partially Linear Additive Model for Longitudinal/Clustered Data

    KAUST Repository

    Carroll, Raymond

    2009-04-23

    We consider the efficient estimation of a regression parameter in a partially linear additive nonparametric regression model from repeated measures data when the covariates are multivariate. To date, while there is some literature in the scalar covariate case, the problem has not been addressed in the multivariate additive model case. Ours represents a first contribution in this direction. As part of this work, we first describe the behavior of nonparametric estimators for additive models with repeated measures when the underlying model is not additive. These results are critical when one considers variants of the basic additive model. We apply them to the partially linear additive repeated-measures model, deriving an explicit consistent estimator of the parametric component; if the errors are in addition Gaussian, the estimator is semiparametric efficient. We also apply our basic methods to a unique testing problem that arises in genetic epidemiology; in combination with a projection argument we develop an efficient and easily computed testing scheme. Simulations and an empirical example from nutritional epidemiology illustrate our methods.

  14. Selection of a suitable model for the prediction of soil water content in north of Iran

    Energy Technology Data Exchange (ETDEWEB)

    Esmaeelnejad, L.; Ramezanpour, H.; Seyedmohammadi, H.; Shabanpou, M.

    2015-07-01

    Multiple Linear Regression (MLR), Artificial Neural Network (ANN) and Rosetta models were employed to develop pedotransfer functions (PTFs) for soil moisture prediction using available soil properties for northern soils of Iran. The Rosetta model, which is based on ANN, works in a hierarchical approach to predict water retention curves. For this purpose, 240 soil samples were selected from the south of Guilan province, Gilevan region, northern Iran. The data set was divided into two subsets for calibration and testing of the models. The general performance of the PTFs was evaluated using the coefficient of determination (R²), root mean square error (RMSE) and mean bias error between the observed and predicted values. Results showed that an ANN with two hidden layers, with tan-sigmoid and linear functions for the hidden and output layers respectively, performed better than the others in predicting soil moisture. ANN can model non-linear functions and was shown to perform better than MLR. After ANN, MLR had better accuracy than Rosetta. The developed PTFs gave more accurate estimates at matric potentials of 100, 300, 500, 1000 and 1500 kPa, whereas the Rosetta model gave slightly better estimates than the derived PTFs at a matric potential of 33 kPa. This research can provide a scientific basis for the study of soil hydraulic properties and be helpful for the estimation of soil water retention in other places with similar conditions. (Author)
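
The three evaluation statistics named in this abstract can be written out as follows. This is a minimal sketch under common definitions (R² as one minus the ratio of residual to total sum of squares, and mean bias error as the average of predicted minus observed); the abstract does not give its exact formulas, so both the definitions and the function names are assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_bias_error(observed, predicted):
    """Average of (predicted - observed); positive means overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only, not data from the study.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred), rmse(obs, pred), mean_bias_error(obs, pred))
```

RMSE penalizes large individual errors, while the mean bias error can be close to zero even for a poor model if over- and underestimates cancel, so the two are usually reported together.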

  15. Robust Image Regression Based on the Extended Matrix Variate Power Exponential Distribution of Dependent Noise.

    Science.gov (United States)

    Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu

    2017-09-01

    Dealing with partial occlusion or illumination is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of the error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate and follows the extended matrix variate power exponential distribution. This distribution has heavy-tailed regions and can be used to describe a matrix pattern of l×m dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with Lq regularization. The alternating direction method of multipliers is applied to solve this model. To get a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm works much more robustly than some existing regression-based methods.
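
One well-known example of a singular value thresholding operator is soft thresholding for the nuclear norm, which is the Schatten p-norm with p = 1. It is shown here only as an illustration of the general idea: the abstract does not spell out the two operators the authors introduce, and the symbols M, τ, U, Σ and V below are assumptions rather than the paper's notation.

```latex
% Singular value decomposition of the input matrix
M = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)

% Soft thresholding of the singular values with threshold \tau > 0
\mathcal{D}_{\tau}(M) = U \operatorname{diag}\bigl(\max(\sigma_i - \tau, 0)\bigr) V^{\top}

% Closed-form solution of the nuclear-norm proximal problem
\mathcal{D}_{\tau}(M) = \arg\min_{X} \; \tau \lVert X \rVert_{*}
  + \tfrac{1}{2} \lVert X - M \rVert_{F}^{2}
```

Closed forms of this kind are what make each subproblem of an alternating direction method of multipliers iteration cheap to solve.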

  16. A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

    Science.gov (United States)

    Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

    2017-07-01

    Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR).
The comparative study indicates that, in general, pre-processing of spectral data can play a significant

  17. Response of dissolved trace metals to land use/land cover and their source apportionment using a receptor model in a subtropic river, China.

    Science.gov (United States)

    Li, Siyue; Zhang, Quanfa

    2011-06-15

    Water samples were collected for determination of dissolved trace metals in 56 sampling sites throughout the upper Han River, China. Multivariate statistical analyses including correlation analysis, stepwise multiple linear regression models, and principal component and factor analysis (PCA/FA) were employed to examine the land use influences on trace metals, and a receptor model of factor analysis-multiple linear regression (FA-MLR) was used for source identification/apportionment of anthropogenic heavy metals in the surface water of the River. Our results revealed that land use was an important factor in water metals in the snow melt flow period and land use in the riparian zone was not a better predictor of metals than land use away from the river. Urbanization in a watershed and vegetation along river networks could better explain metals, whereas agriculture, regardless of its relative location, only slightly explained metal variables in the upper Han River. FA-MLR analysis identified five source types of metals, and mining, fossil fuel combustion, and vehicle exhaust were the dominant pollution sources in the surface waters. The results demonstrated great impacts of human activities on metal concentrations in the subtropical river of China. Copyright © 2011 Elsevier B.V. All rights reserved.

  18. A Review on Predicting Ground PM2.5 Concentration Using Satellite Aerosol Optical Depth

    Directory of Open Access Journals (Sweden)

    Yuanyuan Chu

    2016-10-01

    Full Text Available This study reviewed the prediction of fine particulate matter (PM2.5) from satellite aerosol optical depth (AOD) and summarized the advantages and limitations of these predicting models. A total of 116 articles were included from 1436 records retrieved. The number of such studies has been increasing since 2003. Among these studies, four predicting models were widely used: Multiple Linear Regression (MLR) (25 articles), Mixed-Effect Model (MEM) (23 articles), Chemical Transport Model (CTM) (16 articles) and Geographically Weighted Regression (GWR) (10 articles). We found that there is no so-called best model among them and each has both advantages and limitations. Regarding the prediction accuracy, MEM performs the best, while MLR performs worst. CTM predicts PM2.5 better on a global scale, while GWR tends to perform well on a regional level. Moreover, prediction performance can be significantly improved by combining meteorological variables with land use factors of each region, instead of only considering meteorological variables. In addition, MEM has advantages in dealing with the AOD data with missing values. We recommend that with the help of higher resolution AOD data, future works could be focused on developing satellite-based predicting models for the prediction of historical PM2.5 and other air pollutants.

  19. Quantifying Parkinson's disease finger-tapping severity by extracting and synthesizing finger motion properties.

    Science.gov (United States)

    Sano, Yuko; Kandori, Akihiko; Shima, Keisuke; Yamaguchi, Yuki; Tsuji, Toshio; Noda, Masafumi; Higashikawa, Fumiko; Yokoe, Masaru; Sakoda, Saburo

    2016-06-01

    We propose a novel index of Parkinson's disease (PD) finger-tapping severity, called "PDFTsi," for quantifying the severity of symptoms related to the finger tapping of PD patients with high accuracy. To validate the efficacy of PDFTsi, the finger-tapping movements of normal controls and PD patients were measured by using magnetic sensors, and 21 characteristics were extracted from the finger-tapping waveforms. To distinguish motor deterioration due to PD from that due to aging, the aging effect on finger tapping was removed from these characteristics. Principal component analysis (PCA) was applied to the age-normalized characteristics, and principal components that represented the motion properties of finger tapping were calculated. Multiple linear regression (MLR) with stepwise variable selection was applied to the principal components, and PDFTsi was calculated. The calculated PDFTsi indicates that PDFTsi has a high estimation ability, namely a mean square error of 0.45. The estimation ability of PDFTsi is higher than that of the alternative method, MLR with stepwise regression selection without PCA, namely a mean square error of 1.30. This result suggests that PDFTsi can quantify PD finger-tapping severity accurately. Furthermore, the result of interpreting a model for calculating PDFTsi indicated that motion wideness and rhythm disorder are important for estimating PD finger-tapping severity.

  20. Linear regression in astronomy. I

    Science.gov (United States)

    Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

    1990-01-01

    Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
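
    The five slope estimators named in this abstract can be written compactly in terms of the sample sums of squares and cross-products. The sketch below is a minimal illustration, assuming the standard closed forms for the two OLS slopes, their bisector, the orthogonal slope, and the reduced major-axis slope; the function name and the dictionary keys are hypothetical, and the uncertainty formulas from the paper are not reproduced.

```python
import math


def regression_slopes(x, y):
    """Return slope and intercept for five symmetric/asymmetric line fits.

    Illustrative helper (not from the paper): x and y are equal-length
    sequences of floats with nonzero variance and nonzero covariance.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("need nonzero variances and covariance")

    sign = math.copysign(1.0, sxy)
    b1 = sxy / sxx  # OLS of Y on X
    b2 = syy / sxy  # OLS of X on Y, expressed as dy/dx
    # Line bisecting the angle between the two OLS lines.
    bisector = (b1 * b2 - 1 + math.sqrt((1 + b1 ** 2) * (1 + b2 ** 2))) / (b1 + b2)
    # Orthogonal regression: minimizes perpendicular distances.
    d = b2 - 1 / b1
    orthogonal = 0.5 * (d + sign * math.sqrt(4 + d ** 2))
    # Reduced major axis: geometric mean of the two OLS slopes.
    rma = sign * math.sqrt(b1 * b2)

    slopes = {
        "ols_y_on_x": b1,
        "ols_x_on_y": b2,
        "bisector": bisector,
        "orthogonal": orthogonal,
        "reduced_major_axis": rma,
    }
    # Every fitted line passes through the centroid (mx, my).
    return {name: (b, my - b * mx) for name, b in slopes.items()}
```

    For noise-free linear data all five estimates coincide; with scatter, the two OLS slopes bracket the three symmetric ones.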

  1. Logic regression and its extensions.

    Science.gov (United States)

    Schwender, Holger; Ruczinski, Ingo

    2010-01-01

    Logic regression is an adaptive classification and regression procedure, initially developed to reveal interacting single nucleotide polymorphisms (SNPs) in genetic association studies. In general, this approach can be used in any setting with binary predictors, when the interaction of these covariates is of primary interest. Logic regression searches for Boolean (logic) combinations of binary variables that best explain the variability in the outcome variable, and thus, reveals variables and interactions that are associated with the response and/or have predictive capabilities. The logic expressions are embedded in a generalized linear regression framework, and thus, logic regression can handle a variety of outcome types, such as binary responses in case-control studies, numeric responses, and time-to-event data. In this chapter, we provide an introduction to the logic regression methodology, list some applications in public health and medicine, and summarize some of the direct extensions and modifications of logic regression that have been proposed in the literature. Copyright © 2010 Elsevier Inc. All rights reserved.

  2. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    International Nuclear Information System (INIS)

    Harlim, John; Mahdi, Adam; Majda, Andrew J.

    2014-01-01

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model

  3. Tumor regression patterns in retinoblastoma

    International Nuclear Information System (INIS)

    Zafar, S.N.; Siddique, S.N.; Zaheer, N.

    2016-01-01

    To observe the types of tumor regression after treatment, and identify the common pattern of regression in our patients. Study Design: Descriptive study. Place and Duration of Study: Department of Pediatric Ophthalmology and Strabismus, Al-Shifa Trust Eye Hospital, Rawalpindi, Pakistan, from October 2011 to October 2014. Methodology: Children with unilateral and bilateral retinoblastoma were included in the study. Patients were referred to Pakistan Institute of Medical Sciences, Islamabad, for chemotherapy. After every cycle of chemotherapy, dilated fundus examination under anesthesia was performed to record response of the treatment. Regression patterns were recorded on RetCam II. Results: Seventy-four tumors were included in the study. Out of 74 tumors, 3 were ICRB group A tumors, 43 were ICRB group B tumors, 14 tumors belonged to ICRB group C, and remaining 14 were ICRB group D tumors. Type IV regression was seen in 39.1% (n=29) tumors, type II in 29.7% (n=22), type III in 25.6% (n=19), and type I in 5.4% (n=4). All group A tumors (100%) showed type IV regression. Seventeen (39.5%) group B tumors showed type IV regression. In group C, 5 tumors (35.7%) showed type II regression and 5 tumors (35.7%) showed type IV regression. In group D, 6 tumors (42.9%) regressed to type II non-calcified remnants. Conclusion: The response and success of the focal and systemic treatment, as judged by the appearance of different patterns of tumor regression, varies with the ICRB grouping of the tumor. (author)

  4. Regression models for the restricted residual mean life for right-censored and left-truncated data

    DEFF Research Database (Denmark)

    Cortese, Giuliana; Holmboe, Stine A.; Scheike, Thomas H.

    2017-01-01

    The hazard ratios resulting from a Cox's regression hazards model are hard to interpret and to be converted into prolonged survival time. As the main goal is often to study survival functions, there is increasing interest in summary measures based on the survival function that are easier to interpret than the hazard ratio; the residual mean time is an important example of those measures. However, because of the presence of right censoring, the tail of the survival distribution is often difficult to estimate correctly. Therefore, we consider the restricted residual mean time, which represents a partial area under the survival function, given any time horizon τ, and is interpreted as the residual life expectancy up to τ of a subject surviving up to time t. We present a class of regression models for this measure, based on weighted estimating equations and inverse probability of censoring weighted...
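
    To make the "partial area under the survival function" concrete, the sketch below computes the area under a step survival curve between t and τ and divides it by S(t). This reading of the restricted residual mean time as E[min(T, τ) - t | T > t] is an assumption based on the abstract's wording; the function name and the step-function input format are hypothetical, and the regression models and censoring weights of the paper are not implemented.

```python
def restricted_residual_mean(times, surv, t, tau):
    """Area under a step survival curve on [t, tau], divided by S(t).

    times: sorted jump times of the survival curve.
    surv:  surv[i] is the survival probability from times[i] onward
           (right-continuous step function; S = 1 before times[0]).
    """
    if not t < tau:
        raise ValueError("need t < tau")

    def s_at(u):
        # Value of the right-continuous step function at time u.
        value = 1.0
        for tk, sk in zip(times, surv):
            if tk <= u:
                value = sk
            else:
                break
        return value

    s_t = s_at(t)
    if s_t <= 0:
        raise ValueError("no subjects survive to time t")

    # Integrate piecewise between consecutive jump times inside (t, tau).
    grid = [t] + [tk for tk in times if t < tk < tau] + [tau]
    area = sum(s_at(a) * (b - a) for a, b in zip(grid, grid[1:]))
    return area / s_t
```

    In practice the step function would come from an estimator such as Kaplan-Meier; here it is simply passed in.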

  5. Combining Alphas via Bounded Regression

    Directory of Open Access Journals (Sweden)

    Zura Kakushadze

    2015-11-01

    Full Text Available We give an explicit algorithm and source code for combining alpha streams via bounded regression. In practical applications, typically, there is insufficient history to compute a sample covariance matrix (SCM) for a large number of alphas. To compute alpha allocation weights, one then resorts to (weighted) regression over SCM principal components. Regression often produces alpha weights with insufficient diversification and/or skewed distribution against, e.g., turnover. This can be rectified by imposing bounds on alpha weights within the regression procedure. Bounded regression can also be applied to stock and other asset portfolio construction. We discuss illustrative examples.
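
    The idea of imposing bounds inside the regression can be illustrated with a generic box-constrained least-squares solver. The sketch below uses projected gradient descent, which is a stand-in technique chosen for brevity and is not the algorithm or source code given in the paper; the function name, the common lower and upper bounds, and the iteration count are all assumptions.

```python
def bounded_least_squares(X, y, lower, upper, n_iter=5000):
    """Minimize 0.5 * ||X w - y||^2 subject to lower <= w_j <= upper.

    X is a list of rows, y a list of targets. Plain projected gradient
    descent: take a gradient step, then clip each weight into the box.
    """
    n, k = len(X), len(X[0])
    # trace(X^T X) bounds the largest eigenvalue of X^T X from above,
    # so 1 / trace is a conservative step size that cannot diverge.
    trace = sum(X[i][j] ** 2 for i in range(n) for j in range(k))
    step = 1.0 / trace
    start = min(max(0.0, lower), upper)
    w = [start] * k
    for _ in range(n_iter):
        resid = [sum(X[i][j] * w[j] for j in range(k)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(k)]
        w = [min(max(w[j] - step * grad[j], lower), upper) for j in range(k)]
    return w
```

    When the bounds are wide enough to be inactive, the result approaches the ordinary least-squares weights; tight bounds pin the extreme weights to the box edges, which is the diversification effect the abstract describes.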

  6. Antiplasmodial Activity, Cytotoxicity and Structure-Activity Relationship Study of Cyclopeptide Alkaloids

    Directory of Open Access Journals (Sweden)

    Emmy Tuenter

    2017-02-01

    Full Text Available Cyclopeptide alkaloids are polyamidic, macrocyclic compounds, containing a 13-, 14-, or 15-membered ring. The ring system consists of a hydroxystyrylamine moiety, an amino acid, and a β-hydroxy amino acid; attached to the ring is a side chain, comprised of one or two more amino acid moieties. In vitro antiplasmodial activity was shown before for several compounds belonging to this class, and in this paper the antiplasmodial and cytotoxic activities of ten more cyclopeptide alkaloids are reported. Combining these results and the IC50 values that were reported by our group previously, a library consisting of 19 cyclopeptide alkaloids was created. A qualitative SAR (structure-activity relationship) study indicated that a 13-membered macrocyclic ring is preferable over a 14-membered one. Furthermore, the presence of a β-hydroxy proline moiety could correlate with higher antiplasmodial activity, and methoxylation (or, to a lesser extent, hydroxylation) of the styrylamine moiety could be important for displaying antiplasmodial activity. In addition, QSAR (quantitative structure-activity relationship) models were developed, using PLS (partial least squares regression) and MLR (multiple linear regression). On the one hand, these models allow for the indication of the most important descriptors (molecular properties) responsible for the antiplasmodial activity. Additionally, predictions made for interesting structures did not contradict the expectations raised in the qualitative SAR study.

  7. Quality evaluation of regional forage resources by means of near infrared reflectance spectroscopy

    Directory of Open Access Journals (Sweden)

    Bruno Ronchi

    2010-01-01

    Full Text Available Quality parameters of grassland and pasture samples collected during a three-year period at two environmentally and geographically different areas were analysed by Near Infrared Reflectance Spectroscopy (NIRS). Chemical analysis for crude protein (CP), crude fibre (CF), neutral detergent fibre (NDF), acid detergent fibre (ADF), acid detergent lignin (ADL) and crude ash (ASH) carried out on two-thirds of the samples were used in calibration processes. The remaining one-third of the data was used to validate the best calibrations obtained. Sample selection is discussed. Different math pretreatments (derivative, gap, primary smoothing and secondary smoothing), light scattering correction methods and calibration algorithms were tested to achieve the best predictive performance. We obtained the best results using different regression algorithms to correlate spectral information to chemical data. For CP (R2 = 0.94, SEP = 1.3), NDF (R2 = 0.95, SEP = 2.14) and ADF (R2 = 0.92, SEP = 2.06) Multiple Linear Regression (MLR) models fit chemical data better than Mean Partial Least Square (MPLS) regression. A molecular basis explanation of wavelengths selected was carried out. MPLS models worked well for CF (R2 = 0.93, SEP = 1.57), and ASH (R2 = 0.95, SEP = 1.17) while poor calibrations were obtained for ADL using both algorithms. To confirm the reliability of the models developed, uncertainties of predictions were compared with findings on nutritional variations and animal performances.

  8. riskRegression

    DEFF Research Database (Denmark)

    Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

    2017-01-01

    In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
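
    The step from cause-specific hazards to an absolute risk can be sketched as follows: at each event time, the risk of the event of interest grows by the probability of still being event-free just before that time multiplied by the cause-specific hazard increment. The sketch below is a minimal illustration in Python, assuming the hazard increments are already available (for example from fitted cause-specific models) and using a product-limit form for the event-free probability; it is not the riskRegression package's interface, and the function name is hypothetical.

```python
def absolute_risk(d_haz_event, d_haz_competing):
    """Cumulative absolute risk of the event of interest at each event time.

    d_haz_event[k] and d_haz_competing[k] are the cause-specific hazard
    increments at the k-th ordered event time for one covariate profile.
    """
    event_free = 1.0  # probability of no event of any type so far
    risk = 0.0
    risks = []
    for d_event, d_comp in zip(d_haz_event, d_haz_competing):
        # Contribution: being event-free just before this time, then
        # experiencing the event of interest at this time.
        risk += event_free * d_event
        # Product-limit update of the overall event-free probability.
        event_free *= 1.0 - d_event - d_comp
        risks.append(risk)
    return risks
```

    Calling the function twice with the two hazard sequences swapped gives the absolute risk of the competing event; the two risks plus the remaining event-free probability sum to one.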

  9. Towards Robust and Accurate Multi-View and Partially-Occluded Face Alignment.

    Science.gov (United States)

    Xing, Junliang; Niu, Zhiheng; Huang, Junshi; Hu, Weiming; Zhou, Xi; Yan, Shuicheng

    2018-04-01

    Face alignment acts as an important task in computer vision. Regression-based methods currently dominate the approach to solving this problem, which generally employ a series of mapping functions from the face appearance to iteratively update the face shape hypothesis. One key point here is thus how to perform the regression procedure. In this work, we formulate this regression procedure as a sparse coding problem. We learn two relational dictionaries, one for the face appearance and the other one for the face shape, with coupled reconstruction coefficient to capture their underlying relationships. To deploy this model for face alignment, we derive the relational dictionaries in a stage-wise manner to perform closed-loop refinement of themselves, i.e., the face appearance dictionary is first learned from the face shape dictionary and then used to update the face shape hypothesis, and the updated face shape dictionary from the shape hypothesis is in return used to refine the face appearance dictionary. To improve the model accuracy, we extend this model hierarchically from the whole face shape to face part shapes, thus both the global and local view variations of a face are captured. To locate facial landmarks under occlusions, we further introduce an occlusion dictionary into the face appearance dictionary to recover face shape from partially occluded face appearance. The occlusion dictionary is learned in a data driven manner from background images to represent a set of elemental occlusion patterns, a sparse combination of which models various practical partial face occlusions. By integrating all these technical innovations, we obtain a robust and accurate approach to locate facial landmarks under different face views and possibly severe occlusions for face images in the wild.
Extensive experimental analyses and evaluations on different benchmark datasets, as well as two new datasets built by ourselves, have demonstrated the robustness and accuracy of our proposed

  10. Correlation, Regression and Path Analyses of Seed Yield Components in Crambe abyssinica, a Promising Industrial Oil Crop

    OpenAIRE

    Huang, Banglian; Yang, Yiming; Luo, Tingting; Wu, S.; Du, Xuezhu; Cai, Detian; Loo, van, E.N.; Huang Bangquan

    2013-01-01

    In the present study correlation, regression and path analyses were carried out to decide correlations among the agronomic traits and their contributions to seed yield per plant in Crambe abyssinica. Partial correlation analysis indicated that plant height (X1) was significantly correlated with branching height and the number of first branches (P < 0.01); branching height (X2) was significantly correlated with pod number of primary inflorescence (P < 0.01) and number of secondary branch...

  11. Conventional, Partially Converted and Environmentally Friendly Farming in South Korea: Profitability and Factors Affecting Farmers’ Choice

    Directory of Open Access Journals (Sweden)

    Saem Lee

    2016-07-01

    Full Text Available While organic farming is well established in Europe and the USA, it is still catching up in Asian countries. The government of South Korea has implemented environmentally friendly farming that encompasses organic farming. Despite the promotion of environmentally friendly farming, it still has a low share in South Korea and partially converted farming has emerged in some districts of South Korea. However, the partially converted farming has not yet been investigated by the government. Thus, our study implemented a financial analysis to compare the annual costs and net returns of conventional, partially converted and environmentally friendly farming in Gangwon Province. The result showed that environmentally friendly farming was more profitable with respect to farm net returns. To find out the factors affecting the adoption of environmentally friendly farming, multinomial logistic regression was implemented. The findings revealed that education and subsidy positively and significantly influenced the probability of farmers’ choice on partially converted and environmentally friendly farming. Farm size had a negative and significant relationship with only environmentally friendly farming. This study will contribute to future policy establishment for sustainable agriculture as recommended by improving the quality of fertilizers, suggesting the additional investigation associated with partially converted farmers.

  12. Comparative 1H NMR-based metabonomic analysis of HIV-1 sera

    International Nuclear Information System (INIS)

    Philippeos, C.; Steffens, F. E.; Meyer, D.

    2009-01-01

    1H NMR spectroscopy of sera from HIV-1 infected and uninfected individuals was performed on 300 and 600 MHz instruments. The resultant spectra were automatically data reduced to 90 and 180 integral segments of equal length. Analysis of variance identified significant differences between the sample groups, especially for the samples analyzed on 600 MHz and reduced to fewer segments. Linear discriminant analysis correctly classified 100% of the samples analyzed on the 300 MHz NMR (reduced to 180 segments); an increase in instrument sensitivity resulted in lower percentages of correctly classified samples. Multinomial logistic regression (MLR) resulted in 100% correct classification of all samples from both instruments. Thus 1H-NMR metabonomics on either instrument distinguishes HIV-positive individuals using or not using antiretroviral therapy, but the sensitivity of the instrument impacts on data reduction. Furthermore, MLR is a novel multivariate statistical technique for improved classification of biological data analyzed in NMR

  13. Comparative 1H NMR-based metabonomic analysis of HIV-1 sera

    Energy Technology Data Exchange (ETDEWEB)

    Philippeos, C. [University of Johannesburg, Department of Biochemistry (South Africa); Steffens, F. E. [University of Pretoria, Department of Statistics (South Africa); Meyer, D. [University of Pretoria, Department of Biochemistry (South Africa)], E-mail: debra.meyer@up.ac.za

    2009-07-15

    1H NMR spectroscopy of sera from HIV-1 infected and uninfected individuals was performed on 300 and 600 MHz instruments. The resultant spectra were automatically data reduced to 90 and 180 integral segments of equal length. Analysis of variance identified significant differences between the sample groups, especially for the samples analyzed on 600 MHz and reduced to fewer segments. Linear discriminant analysis correctly classified 100% of the samples analyzed on the 300 MHz NMR (reduced to 180 segments); an increase in instrument sensitivity resulted in lower percentages of correctly classified samples. Multinomial logistic regression (MLR) resulted in 100% correct classification of all samples from both instruments. Thus 1H-NMR metabonomics on either instrument distinguishes HIV-positive individuals using or not using antiretroviral therapy, but the sensitivity of the instrument impacts on data reduction. Furthermore, MLR is a novel multivariate statistical technique for improved classification of biological data analyzed in NMR.

  14. Gender and distance influence performance predictors in young swimmers

    Directory of Open Access Journals (Sweden)

    Paulo Victor Mezzaroba

    2013-12-01

    Full Text Available Predictors of performance in adult swimmers are constantly changing during youth especially because the training routine begins even before puberty in the modality. Therefore this study aimed to determine the group of parameters that best predict short and middle swimming distance performances of young swimmers of both genders. Thirty-three 10- to 16-year-old male and female competitive swimmers participated in the study. Multiple linear regression (MLR) was used considering mean speed of maximum 100, 200 and 400 m efforts as dependent variables, and five parameter groups as possible predictors (anthropometry, body composition, physiological and biomechanical parameters, chronological age/pubic hair). The main results revealed explanatory powers of almost 100% for both genders and all performances, but with different predictors entered in MLR models of each parameter group or all variables. Thus, the predictors differ considerably between short and middle swimming distances and between males and females, and these differences should be considered in training programs.

  15. Computing Air Demand Using the Takagi–Sugeno Model for Dam Outlets

    Directory of Open Access Journals (Sweden)

    Mohammad Zounemat-Kermani

    2013-09-01

    Full Text Available An adaptive neuro-fuzzy inference system (ANFIS) was developed using the subtractive clustering technique to study the air demand in low-level outlet works. The ANFIS model was employed to calculate vent air discharge in different gate openings for an embankment dam. A hybrid learning algorithm obtained from combining back-propagation and least square estimate was adopted to identify linear and non-linear parameters in the ANFIS model. Empirical relationships based on the experimental information obtained from physical models were applied to 108 experimental data points to obtain more reliable evaluations. The feed-forward Levenberg-Marquardt neural network (LMNN) and multiple linear regression (MLR) models were also built using the same data to compare model performances with each other. The results indicated that the fuzzy rule-based model performed better than the LMNN and MLR models, in terms of the simulation performance criteria established, as the root mean square error, the Nash–Sutcliffe efficiency, the correlation coefficient and the Bias.

  16. Modeling rainfall-runoff process using soft computing techniques

    Science.gov (United States)

    Kisi, Ozgur; Shiri, Jalal; Tombul, Mustafa

    2013-02-01

    Rainfall-runoff process was modeled for a small catchment in Turkey, using 4 years (1987-1991) of measurements of independent variables of rainfall and runoff values. The models used in the study were Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference System (ANFIS) and Gene Expression Programming (GEP) which are Artificial Intelligence (AI) approaches. The applied models were trained and tested using various combinations of the independent variables. The goodness of fit for the model was evaluated in terms of the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), coefficient of efficiency (CE) and scatter index (SI). A comparison was also made between these models and traditional Multi Linear Regression (MLR) model. The study provides evidence that GEP (with RMSE=17.82 l/s, MAE=6.61 l/s, CE=0.72 and R2=0.978) is capable of modeling rainfall-runoff process and is a viable alternative to other applied artificial intelligence and MLR time-series methods.
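
    The goodness-of-fit criteria listed in this abstract are simple functions of observed and simulated series. The sketch below is a minimal illustration under stated assumptions: CE is taken to be the Nash-Sutcliffe coefficient of efficiency, R2 the squared Pearson correlation, and SI the RMSE divided by the observed mean. The abstract does not spell out these definitions, and the function name is hypothetical.

```python
import math


def fit_metrics(observed, simulated):
    """Return RMSE, MAE, CE (Nash-Sutcliffe), R2 and SI for two series."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n
    errors = [s - o for o, s in zip(observed, simulated)]

    sse = sum(e ** 2 for e in errors)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n

    # CE compares the model against always predicting the observed mean.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ce = 1.0 - sse / ss_obs

    # R2 as the squared Pearson correlation between the two series.
    ss_sim = sum((s - mean_sim) ** 2 for s in simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    r2 = cov ** 2 / (ss_obs * ss_sim)

    # Scatter index: RMSE relative to the mean observed value.
    si = rmse / mean_obs
    return {"rmse": rmse, "mae": mae, "ce": ce, "r2": r2, "si": si}
```

    A simulation that is perfectly correlated with the observations but offset by a constant keeps R2 at 1 while CE drops, which is one way a record can report a high R2 alongside a noticeably lower CE.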

  17. Antioxidant-capacity-based models for the prediction of acrylamide reduction by flavonoids.

    Science.gov (United States)

    Cheng, Jun; Chen, Xinyu; Zhao, Sheng; Zhang, Yu

    2015-02-01

    The aim of this study was to investigate the applicability of artificial neural network (ANN) and multiple linear regression (MLR) models for the estimation of acrylamide reduction by flavonoids, using multiple antioxidant capacities of Maillard reaction products as variables via a microwave food processing workstation. The addition of selected flavonoids could effectively reduce acrylamide formation, which may be closely related to the number of phenolic hydroxyl groups of flavonoids (R: 0.735-0.951, Pcapacity (ΔTEAC) measured by DPPH (R2 = 0.833), ABTS (R2 = 0.860) or FRAP (R2 = 0.824) assay. Both ANN and MLR models could effectively serve as predictive tools for estimating the reduction of acrylamide affected by flavonoids. The current predictive model study provides a low-cost and easy-to-use approach to the estimation of rates at which acrylamide is degraded, while avoiding tedious sample pretreatment procedures and advanced instrumental analysis. Copyright © 2014 Elsevier Ltd. All rights reserved.

  18. QSAR studies of some side chain modified 7-chloro-4-aminoquinolines as antimalarial agents

    Directory of Open Access Journals (Sweden)

    Nitendra K. Sahu

    2014-11-01

    The quantitative structure–activity relationship (QSAR) analyses were carried out for a series of new side chain modified 4-amino-7-chloroquinolines to identify the structural requirements for their antimalarial activity against both chloroquine-sensitive (HB3) and chloroquine-resistant (Dd2) Plasmodium falciparum strains. The best statistically significant 2D QSAR models were developed by multiple linear regression coupled with a genetic algorithm (GA–MLR) for Dd2, with correlation coefficient r2 = 0.9188, cross-validated squared correlation coefficient q2 = 0.8349 and external predictive ability pred_r2 = 0.7258, and by the stepwise forward algorithm (SW–MLR) for HB3, with r2 = 0.9024, q2 = 0.8089 and pred_r2 = 0.7463. The results of the present study may be useful in designing more potent analogues as antimalarial agents.

  19. Composition and sources of winter and summertime aerosols at Ny Alesund, Spitsbergen

    International Nuclear Information System (INIS)

    Maenhaut, W.; Cornille, P.; Pacyna, J.M.

    1991-01-01

    Filter samples of < 2.5 μm aerosol were collected in (late) winter of 1983, 1984, 1986, and 1987 and in the summer of 1984, 1986, and 1987 at Ny Alesund, Spitsbergen, and analyzed for over 40 elements by a combination of INAA and PIXE. The data sets of the various sampling campaigns and the combined winter and combined summer data were examined by receptor modeling, including absolute principal component analysis (APCA), chemical mass balance (CMB) and multiple linear regression (MLR) techniques. APCA yielded four components, both for the winter and for the summer aerosol. For the winter aerosol, the components were identified as a general pollution component, crustal dust, sea-salt, and a halogen (Br,I) component. The CMB and MLR calculations were used to obtain source (source region) apportionments for the anthropogenic trace elements and for sulfate. For the summer, about 50% of the sulfate was attributed to a marine biogenic source

  20. Type-Directed Partial Evaluation

    DEFF Research Database (Denmark)

    Danvy, Olivier

    1998-01-01

    Type-directed partial evaluation uses a normalization function to achieve partial evaluation. These lecture notes review its background, foundations, practice, and applications. Of specific interest is the modular technique of offline and online type-directed partial evaluation in Standard ML...

  2. Regional regression equations for the estimation of selected monthly low-flow duration and frequency statistics at ungaged sites on streams in New Jersey

    Science.gov (United States)

    Watson, Kara M.; McHugh, Amy R.

    2014-01-01

    Regional regression equations were developed for estimating monthly flow-duration and monthly low-flow frequency statistics for ungaged streams in Coastal Plain and non-coastal regions of New Jersey for baseline and current land- and water-use conditions. The equations were developed to estimate 87 different streamflow statistics, which include the monthly 99-, 90-, 85-, 75-, 50-, and 25-percentile flow-durations of the minimum 1-day daily flow; the August–September 99-, 90-, and 75-percentile minimum 1-day daily flow; and the monthly 7-day, 10-year (M7D10Y) low-flow frequency. These 87 streamflow statistics were computed for 41 continuous-record streamflow-gaging stations (streamgages) with 20 or more years of record and 167 low-flow partial-record stations in New Jersey with 10 or more streamflow measurements. The regression analyses used to develop equations to estimate selected streamflow statistics were performed by testing the relation between flow-duration statistics and low-flow frequency statistics for 32 basin characteristics (physical characteristics, land use, surficial geology, and climate) at the 41 streamgages and 167 low-flow partial-record stations. The regression analyses determined drainage area, soil permeability, average April precipitation, average June precipitation, and percent storage (water bodies and wetlands) were the significant explanatory variables for estimating the selected flow-duration and low-flow frequency statistics. Streamflow estimates were computed for two land- and water-use conditions in New Jersey—land- and water-use during the baseline period of record (defined as the years a streamgage had little to no change in development and water use) and current land- and water-use conditions (1989–2008)—for each selected station using data collected through water year 2008. The baseline period of record is representative of a period when the basin was unaffected by change in development. The current period is

  3. Estimation of the volume of distribution of some pharmacologically important compounds from their structural descriptor

    Directory of Open Access Journals (Sweden)

    MOHAMMAD H. FATEMI

    2011-07-01

    Quantitative structure–activity relationship (QSAR) approaches were used to estimate the volume of distribution (Vd) using an artificial neural network (ANN). The data set consisted of the volume of distribution of 129 pharmacologically important compounds, i.e., benzodiazepines, barbiturates, nonsteroidal anti-inflammatory drugs (NSAIDs), tricyclic anti-depressants and some antibiotics, such as betalactams, tetracyclines and quinolones. The descriptors, which were selected by stepwise variable selection methods, were: the Moriguchi octanol–water partition coefficient; the 3D-MoRSE signal 30, weighted by atomic van der Waals volumes; the fragment-based polar surface area; the d COMMA2 value, weighted by atomic masses; the Geary autocorrelation, weighted by the atomic Sanderson electronegativities; the 3D-MoRSE signal 02, weighted by atomic masses; and the Geary autocorrelation lag 5, weighted by the atomic van der Waals volumes. These descriptors were used as inputs for developing multiple linear regression (MLR) and artificial neural network models as linear and non-linear feature mapping techniques, respectively. The standard errors in the estimation of Vd were 0.104, 0.103 and 0.076 for the MLR model and 0.029, 0.087 and 0.082 for the ANN model, for the training, internal and external validation tests, respectively. The robustness of these models was also evaluated by the leave-5-out cross-validation procedure, which gave Q2 = 0.72 for the MLR model and Q2 = 0.82 for the ANN model. Moreover, the results of the Y-randomization test revealed that there were no chance correlations in the data matrix. In conclusion, the results of this study indicate that the Vd value of drugs can be estimated from their structural molecular descriptors. Furthermore, the statistics of the developed models indicate the superiority of the ANN over the MLR model.
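
    A leave-5-out cross-validated Q2 of the kind reported here can be sketched as below. This is an illustrative simplification rather than the authors' procedure: it assumes Q2 = 1 - PRESS / SS_total, a single hypothetical descriptor fitted by ordinary least squares, and consecutive (not random) groups of five left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def q2_leave_group_out(xs, ys, group_size=5):
    """Cross-validated Q2 = 1 - PRESS / SS_total, leaving out consecutive groups."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for start in range(0, n, group_size):
        held_out = range(start, min(start + group_size, n))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        a, b = fit_line(train_x, train_y)
        # Accumulate squared prediction errors on the compounds left out.
        press += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and responses with a small alternating error.
descriptor = [float(i) for i in range(20)]
response = [1.0 + 2.0 * x + 0.5 * (-1) ** i for i, x in enumerate(descriptor)]
q2 = q2_leave_group_out(descriptor, response)
```

    Because every prediction is made for compounds that were excluded from the fit, Q2 is normally lower than the fitted r2, which is why the two are reported side by side in QSAR work.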

  4. Study on proliferative responses to host Ia antigens in allogeneic bone marrow chimera in mice: sequential analysis of the reactivity and characterization of the cells involved in the responses

    International Nuclear Information System (INIS)

    Iwabuchi, K.; Ogasawara, K.; Ogasawara, M.; Yasumizu, R.; Noguchi, M.; Geng, L.; Fujita, M.; Good, R.A.; Onoe, K.

    1987-01-01

    Irradiation bone marrow chimeras were established by reconstitution of lethally irradiated AKR mice with C57BL/10 marrow cells to permit serial analysis of the developing reactivities of lymphocytes from such chimeras, [B10→AKR], against donor, host, or third party antigens. We found that substantial proliferative responses to Ia antigens of the recipient strain and also to third party antigens were generated by the thymocytes obtained from the irradiation chimeras at an early stage after bone marrow reconstitution. The majority of the responding thymocytes had surfaces lacking demonstrable peanut agglutinin receptors and were donor type Thy-1+, Ly-2-, and L3T4+ in both anti-recipient and anti-third party MLR. In anti-host responses, however, Ly-2+ thymocytes seemed to be at least partially involved. This capacity of thymus cells to mount a response to antigens of the recipient strain declined shortly thereafter, whereas the capacity to mount MLR against third party antigens persisted. The spleen cells of [B10→AKR] chimeras at the same time developed a more durable capability to exhibit anti-host reactivities and a permanent capability of reacting to third party allo-antigens. The stimulator antigens were Ia molecules on the stimulator cells in both anti-recipient and anti-third party MLR. The responding splenocytes were of donor origin and most of them had Thy-1+, Ly-1+2-, and L3T4+ phenotype

  5. Regression in autistic spectrum disorders.

    Science.gov (United States)

    Stefanatos, Gerry A

    2008-12-01

    A significant proportion of children diagnosed with Autistic Spectrum Disorder experience a developmental regression characterized by a loss of previously acquired skills. This may involve a loss of speech or social responsiveness, but often entails both. This paper critically reviews the phenomena of regression in autistic spectrum disorders, highlighting the characteristics of regression, age of onset, temporal course, and long-term outcome. Important considerations for diagnosis are discussed and multiple etiological factors currently hypothesized to underlie the phenomenon are reviewed. It is argued that regressive autistic spectrum disorders can be conceptualized on a spectrum with other regressive disorders that may share common pathophysiological features. The implications of this viewpoint are discussed.

  6. Gender and distance influence performance predictors in young swimmers

    OpenAIRE

    Mezzaroba, Paulo Victor; Papoti, Marcelo; Machado, Fabiana Andrade

    2013-01-01

    Predictors of performance in adult swimmers are constantly changing during youth, especially because the training routine in this sport begins even before puberty. Therefore this study aimed to determine the group of parameters that best predict short- and middle-distance swimming performance of young swimmers of both genders. Thirty-three 10- to 16-year-old male and female competitive swimmers participated in the study. Multiple linear regression (MLR) was used considering mean speed of max...

  7. Understanding logistic regression analysis

    OpenAIRE

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using ex...
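
    The link between a logistic regression coefficient and an odds ratio can be illustrated with a small sketch. It assumes a single binary explanatory variable and a hypothetical 2x2 data set, and fits the model by Newton-Raphson; the function name and counts are illustrative only.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Newton-Raphson fit of P(y=1) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p                 # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                    # entries of X'WX (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (X'WX) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts: exposed 30 events / 20 non-events,
# unexposed 10 events / 40 non-events.
exposure = [1] * 50 + [0] * 50
outcome = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
intercept, coefficient = fit_logistic(exposure, outcome)
odds_ratio = math.exp(coefficient)
```

    With one binary predictor, the exponentiated coefficient reproduces the cross-product odds ratio of the 2x2 table, (30 x 40) / (20 x 10) = 6; with several predictors each exponentiated coefficient is instead an odds ratio adjusted for the others.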

  8. Modeling of the Monthly Rainfall-Runoff Process Through Regressions

    Directory of Open Access Journals (Sweden)

    Campos-Aranda Daniel Francisco

    2014-10-01

    To solve the problems associated with assessing the water resources of a river, modeling of the rainfall-runoff process (RRP) allows missing runoff data to be deduced and the runoff record to be extended, since the information available on precipitation is generally more extensive. It also enables the estimation of inflows to reservoirs when their construction led to the removal of the gauging station. The simplest mathematical model that can be set up for the RRP is a linear regression or curve on a monthly basis. Such a model is described in detail and is calibrated with the simultaneous record of monthly rainfall and runoff at the Ballesmi hydrometric station, which covers 35 years. Since the runoff at this station has an important contribution from spring discharge, the record is first corrected by removing that contribution. To do this, a procedure was developed based either on the monthly average regional runoff coefficients or on a nearby and similar watershed; in this case the Tancuilín gauging station was used. Both stations belong to Partial Hydrologic Region No. 26 (Lower Rio Panuco) and are located within the state of San Luis Potosi, México. The study indicates that the monthly regression model, owing to its conceptual approach, faithfully reproduces monthly average runoff volumes and closely approximates their dispersion, as shown by calculation of the means and standard deviations.
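
    One way to read "linear regression on a monthly basis" is a separate straight line for each calendar month. The sketch below follows that reading, which is an assumption, as are the function names and the sample rainfall and runoff figures.

```python
def fit_monthly_regressions(months, rainfall, runoff):
    """Fit runoff = a + b * rainfall separately for each calendar month."""
    models = {}
    for month in sorted(set(months)):
        xs = [p for m, p in zip(months, rainfall) if m == month]
        ys = [q for m, q in zip(months, runoff) if m == month]
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        models[month] = (mean_y - slope * mean_x, slope)
    return models

def estimate_runoff(models, month, rain):
    """Estimate a missing monthly runoff value from the rainfall of that month."""
    intercept, slope = models[month]
    return intercept + slope * rain

# Hypothetical record: three Januaries and three Julys.
months = [1, 1, 1, 7, 7, 7]
rainfall = [20.0, 40.0, 60.0, 100.0, 150.0, 200.0]
runoff = [5.0, 9.0, 13.0, 40.0, 65.0, 90.0]
models = fit_monthly_regressions(months, rainfall, runoff)
january_estimate = estimate_runoff(models, 1, 50.0)
july_estimate = estimate_runoff(models, 7, 120.0)
```

    Fitting each month separately lets the rainfall-runoff relation differ between wet and dry seasons, at the cost of fewer data points per regression.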

  9. Linear regression in astronomy. II

    Science.gov (United States)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
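
    The first class of models listed above, unweighted regression lines with resampling, can be illustrated with a jackknife estimate of the slope uncertainty. This is a generic sketch of the standard delete-one jackknife, not the specific formulae of the paper; the data values are hypothetical.

```python
import math

def ols_slope(xs, ys):
    """Slope of the unweighted least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def jackknife_slope(xs, ys):
    """Return the full-sample slope and its delete-one jackknife standard error."""
    n = len(xs)
    full = ols_slope(xs, ys)
    # Refit the line n times, each time omitting one data point.
    leave_one_out = [ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
                     for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((s - mean_loo) ** 2 for s in leave_one_out)
    return full, math.sqrt(variance)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # hypothetical measurements
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
slope, slope_error = jackknife_slope(xs, ys)
```

    Resampling estimates of this kind make no assumption about the error distribution, which is useful when the scatter is intrinsic rather than due to known measurement errors.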

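  10. Pemodelan Tingkat Penghunian Kamar Hotel di Kendari dengan Transformasi Wavelet Kontinu dan Partial Least Squares [Modeling the Hotel Room Occupancy Rate in Kendari with the Continuous Wavelet Transform and Partial Least Squares]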

    Directory of Open Access Journals (Sweden)

    Margaretha Ohyver

    2014-12-01

    Multicollinearity and outliers are common problems when estimating a regression model. Multicollinearity occurs when there are high correlations among predictor variables, making it difficult to separate the effect of each independent variable on the response variable. If outliers are present in the data, the assumption of normality in the regression will be violated and the results of the analysis may be incorrect or misleading. Both problems occurred in the data on the room occupancy rate of hotels in Kendari. The purpose of this study is to find a model for the data that is free of multicollinearity and outliers and to determine the factors that affect the room occupancy rate of hotels in Kendari. The methods used are the Continuous Wavelet Transformation and Partial Least Squares. The result of this research is a regression model that is free of multicollinearity and a data pattern in which the presence of outliers is resolved.
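
    A common diagnostic for the multicollinearity described here is the variance inflation factor (VIF). The abstract does not say which diagnostic the authors used, so the sketch below is only an illustration: it covers the two-predictor case, where VIF = 1 / (1 - r^2), and the predictor values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]      # nearly a multiple of x1
x3 = [2.0, -1.0, 0.0, 1.0, -2.0]     # only moderately related to x1
vif_collinear = vif_two_predictors(x1, x2)
vif_moderate = vif_two_predictors(x1, x3)
```

    Values well above about 10 are often taken as a sign that the coefficient estimates are unstable, which is the situation that latent-variable methods such as partial least squares are meant to address.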

  11. Partial Least Squares tutorial for analyzing neuroimaging data

    Directory of Open Access Journals (Sweden)

    Patricia Van Roon

    2014-09-01

    Partial least squares (PLS) has become a respected and meaningful soft modeling analysis technique that can be applied to very large datasets where the number of factors or variables is greater than the number of observations. Current biometric studies (e.g., eye movements, EKG, body movements, EEG) are often of this nature. PLS eliminates the over-fitting issues of multiple linear regression by finding a few underlying or latent variables (factors) that account for most of the variation in the data. In real-world applications, where linear models do not always apply, PLS can model the non-linear relationship well. This tutorial introduces two PLS methods, PLS Correlation (PLSC) and PLS Regression (PLSR), and their applications in data analysis, which are illustrated with neuroimaging examples. Both methods provide straightforward and comprehensible techniques for determining and modeling relationships between two multivariate data blocks by finding latent variables that best describe the relationships. In the examples, PLSC analyzes the relationship between neuroimaging data, such as Event-Related Potential (ERP) amplitude averages from different locations on the scalp, and the corresponding behavioural data. Using the same data, PLSR is used to model the relationship between neuroimaging and behavioural data. This model will be able to predict future behaviour solely from available neuroimaging data. To find latent variables, Singular Value Decomposition (SVD) for PLSC and Non-linear Iterative PArtial Least Squares (NIPALS) for PLSR are implemented in this tutorial. SVD decomposes the large data block into three manageable matrices containing a diagonal set of singular values, as well as left and right singular vectors. For PLSR, the NIPALS algorithm is used because it provides a more precise estimation of the latent variables. Mathematica notebooks are provided for each PLS method with clearly labeled sections and subsections. The...
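
    The NIPALS approach to PLS regression mentioned in this abstract can be sketched as follows. The tutorial itself supplies Mathematica notebooks; this Python version is an independent, minimal illustration of the single-response variant (often called PLS1), and the function names and data are hypothetical.

```python
import math

def pls1_nipals(X, y, n_components):
    """Fit single-response PLS regression by NIPALS-style deflation."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]   # X residuals
    f = [v - y_mean for v in y]                                 # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores (the latent variable), then X loadings p and y loading q.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both blocks before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict the response for one new observation x."""
    x_mean, y_mean, components = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat

# Hypothetical data generated from y = 1 + 2*x1 - x2 without noise.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y = [1.0 + 2.0 * a - b for a, b in X]
model = pls1_nipals(X, y, n_components=2)
prediction = pls1_predict(model, [2.0, 5.0])
```

    With as many components as predictors the fit coincides with ordinary least squares; the benefit of PLS comes from keeping fewer components when variables outnumber observations or are strongly correlated.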

  12. Compatriot partiality and cosmopolitan justice: Can we justify compatriot partiality within the cosmopolitan framework?

    Directory of Open Access Journals (Sweden)

    Rachelle Bascara

    2016-10-01

    This paper shows an alternative way in which compatriot partiality could be justified within the framework of global distributive justice. Philosophers who argue that compatriot partiality is similar to racial partiality capture something correct about compatriot partiality. However, the analogy should not lead us to comprehensively reject compatriot partiality. We can justify compatriot partiality on the same grounds that liberation movements and affirmative action have been justified. Hence, given cosmopolitan demands of justice, special consideration for the economic well-being of your nation as a whole is justified if and only if the country it identifies is an oppressed developing nation in an unjust global order. This justification is incomplete. We also need to say why Person A, qua national of Country A, is justified in helping her compatriots in Country A over similarly or slightly more oppressed non-compatriots in Country B. I argue that Person A’s partiality towards her compatriots admits further vindication because it is part of an oppressed group’s project of self-emancipation, which is preferable to paternalistic emancipation. Finally, I identify three benefits in my justification for compatriot partiality. First, I do not offer a blanket justification for all forms of compatriot partiality. Partiality between members of oppressed groups is only a temporary effective measure designed to level an unlevel playing field. Second, because history attests that sovereign republics could arise as a collective response to colonial oppression, justifying compatriot partiality on the grounds that I have identified is conducive to the development of sovereignty and even democracy in poor countries, thereby avoiding problems of infringement that many humanitarian poverty alleviation efforts encounter. Finally, my justification for compatriot partiality complies with the implicit cosmopolitan commitment to the realizability of global justice

  13. A Matlab program for stepwise regression

    Directory of Open Access Journals (Sweden)

    Yanhong Qi

    2016-03-01

    Stepwise linear regression is a multi-variable regression method for identifying statistically significant variables in the linear regression equation. In the present study, we present a Matlab program for stepwise regression.
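
    The idea of stepwise regression can be sketched in a few lines. The example below is not the Matlab program of the paper: it is a forward-only selection in Python that adds, at each step, the candidate with the largest partial F statistic, and the F-to-enter threshold of 4.0, the function names and the data are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit (with intercept) on the columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[a] * row[c] for row in design) for c in range(k)] for a in range(k)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for i, row in enumerate(design))

def forward_stepwise(candidates, y, f_to_enter=4.0):
    """Add, one at a time, the variable with the largest partial F above the threshold."""
    n = len(y)
    selected = []
    current = rss([], y)
    while len(selected) < len(candidates):
        best = None
        for name in candidates:
            if name in selected:
                continue
            trial = rss([candidates[s] for s in selected] + [candidates[name]], y)
            df = n - len(selected) - 2      # residual degrees of freedom after entry
            f_stat = (current - trial) / (trial / df)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, trial)
        if best[1] < f_to_enter:
            break
        selected.append(best[0])
        current = best[2]
    return selected

# Hypothetical data: y depends on x1 and x2; x3 is an unrelated candidate.
candidates = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0, 2.0, 8.0],
}
noise = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.15, -0.05, 0.1, -0.1]
y = [2.0 * a + 0.5 * b + e for a, b, e in zip(candidates["x1"], candidates["x2"], noise)]
selected = forward_stepwise(candidates, y)
```

    A full stepwise procedure would also test whether previously entered variables should be removed after each addition; that backward step is omitted here to keep the sketch short.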

  14. Plant leaf chlorophyll content retrieval based on a field imaging spectroscopy system.

    Science.gov (United States)

    Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin

    2014-10-23

    A field imaging spectrometer system (FISS; 380-870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%-35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector.

  15. Optimizing the physical ergonomics indices for the use of partial pressure suits.

    Science.gov (United States)

    Ding, Li; Li, Xianxue; Hedge, Alan; Hu, Huimin; Feathers, David; Qin, Zhifeng; Xiao, Huajun; Xue, Lihao; Zhou, Qianxiang

    2015-03-01

    This study developed an ergonomic evaluation system for the design of high-altitude partial pressure suits (PPSs). A total of twenty-one Chinese males participated in the experiment, in which three types of ergonomics indices (manipulative mission, operational reach and operational strength) were studied using a three-dimensional video-based motion capture system, a target-pointing board, a hand dynamometer, and a step-tread apparatus. In total, 36 ergonomics indices were evaluated and optimized using regression and fitting analysis. Some indices that were found to be linearly related and redundant were removed from the study. An optimal ergonomics index system was established that can be used to conveniently and quickly evaluate the performance of different pressurized/non-pressurized suit designs. The resulting ergonomics index system will provide a theoretical basis and practical guidance for mission planners, suit designers and engineers to design equipment for human use, and to aid in assessing partial pressure suits. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  16. Application of transmission infrared spectroscopy and partial least squares regression to predict immunoglobulin G concentration in dairy and beef cow colostrum.

    Science.gov (United States)

    Elsohaby, Ibrahim; Windeyer, M Claire; Haines, Deborah M; Homerosky, Elizabeth R; Pearson, Jennifer M; McClure, J Trenton; Keefe, Greg P

    2018-03-06

    The objective of this study was to explore the potential of transmission infrared (TIR) spectroscopy in combination with partial least squares regression (PLSR) for quantification of dairy and beef cow colostral immunoglobulin G (IgG) concentration and assessment of colostrum quality. A total of 430 colostrum samples were collected from dairy (n = 235) and beef (n = 195) cows and tested by a radial immunodiffusion (RID) assay and TIR spectroscopy. Colostral IgG concentrations obtained by the RID assay were linked to the preprocessed spectra and divided into combined and prediction data sets. Three PLSR calibration models were built: one for the dairy cow colostrum only, the second for beef cow colostrum only, and the third for the merged dairy and beef cow colostrum. The predictive performance of each model was evaluated separately using the independent prediction data set. The Pearson correlation coefficients between IgG concentrations as determined by the TIR-based assay and the RID assay were 0.84 for dairy cow colostrum, 0.88 for beef cow colostrum, and 0.92 for the merged set of dairy and beef cow colostrum. The average of the differences between colostral IgG concentrations obtained by the RID- and TIR-based assays were -3.5, 2.7, and 1.4 g/L for dairy, beef, and merged colostrum samples, respectively. Further, the average relative error of the colostral IgG predicted by the TIR spectroscopy from the RID assay was 5% for dairy cow, 1.2% for beef cow, and 0.8% for the merged data set. The average intra-assay CV% of the IgG concentration predicted by the TIR-based method were 3.2%, 2.5%, and 6.9% for dairy cow, beef cow, and merged data set, respectively. The utility of TIR method for assessment of colostrum quality was evaluated using the entire data set and showed that TIR spectroscopy accurately identified the quality status of 91% of dairy cow colostrum, 95% of beef cow colostrum, and 89% and 93% of the merged dairy and beef cow colostrum samples

  17. Essays on partial retirement

    NARCIS (Netherlands)

    Kantarci, T.

    2012-01-01

    The five essays in this dissertation address a range of topics in the micro-economic literature on partial retirement. The focus is on the labor market behavior of older age groups. The essays examine the economic and non-economic determinants of partial retirement behavior, the effect of partial

  18. Optimal difference-based estimation for partially linear models

    KAUST Repository

    Zhou, Yuejin; Cheng, Yebin; Dai, Wenlin; Tong, Tiejun

    2017-01-01

    Difference-based methods have attracted increasing attention for analyzing partially linear models in the recent literature. In this paper, we first propose to solve the optimal sequence selection problem in difference-based estimation for the linear component. To achieve the goal, a family of new sequences and a cross-validation method for selecting the adaptive sequence are proposed. We demonstrate that the existing sequences are only extreme cases in the proposed family. Secondly, we propose a new estimator for the residual variance by fitting a linear regression method to some difference-based estimators. Our proposed estimator achieves the asymptotic optimal rate of mean squared error. Simulation studies also demonstrate that our proposed estimator performs better than the existing estimator, especially when the sample size is small and the nonparametric function is rough.
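
    The basic idea behind difference-based estimation in a partially linear model y = beta*x + g(t) + error can be sketched with the classical first-order difference sequence. This is the simplest existing scheme, not the optimal sequence or the new variance estimator proposed in the paper; the simulated data, seed and function name are hypothetical.

```python
import math
import random

def first_order_difference_fit(t, x, y):
    """Estimate beta and sigma^2 in y = beta*x + g(t) + noise by first differencing."""
    order = sorted(range(len(t)), key=lambda i: t[i])   # sort by the nonparametric covariate
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    dx = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    dy = [ys[i + 1] - ys[i] for i in range(len(ys) - 1)]
    # g(t) changes little between neighbours, so it nearly cancels in dy.
    beta = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    # Each differenced error has variance 2*sigma^2, hence the factor 2.
    sigma2 = sum((b - beta * a) ** 2 for a, b in zip(dx, dy)) / (2 * len(dx))
    return beta, sigma2

random.seed(0)
n = 2000
t = [i / n for i in range(n)]
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.5 * xi + math.sin(2 * math.pi * ti) + random.gauss(0.0, 0.5)
     for xi, ti in zip(x, t)]
beta_hat, sigma2_hat = first_order_difference_fit(t, x, y)
```

    In this simulation the true values are beta = 1.5 and sigma^2 = 0.25. Differencing removes the smooth function g without estimating it, which is what makes the approach attractive; the choice of difference sequence then governs the efficiency of the resulting estimators.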

  20. Quantitative monitoring of sucrose, reducing sugar and total sugar dynamics for phenotyping of water-deficit stress tolerance in rice through spectroscopy and chemometrics

    Science.gov (United States)

    Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini

    2018-03-01

    In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and a suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in the near infrared (NIR) range, viz. the ratio spectral index (RSI) and normalised difference spectral indices (NDSI), sensitive to sucrose, reducing sugar and total sugar content were identified and subsequently calibrated and validated. For the validation dataset, the RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67 and RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were MARS, ANN and MARS, respectively, with RPD values of 2.08, 2.44 and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress, as this technique is fast, economic, and noninvasive.
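
    The two-band indices and the RPD statistic used in this abstract have simple standard forms, sketched below. The reflectance values and sugar contents are hypothetical, and RPD is assumed here to be the standard deviation of the reference values divided by the RMSE of prediction.

```python
import math

def rsi(r_i, r_j):
    """Ratio spectral index of reflectances at two wavelengths."""
    return r_i / r_j

def ndsi(r_i, r_j):
    """Normalised difference spectral index of reflectances at two wavelengths."""
    return (r_i - r_j) / (r_i + r_j)

def rpd(measured, predicted):
    """Ratio of performance to deviation: SD of reference values over RMSE."""
    n = len(measured)
    mean = sum(measured) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in measured) / (n - 1))
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return sd / rmse

# Hypothetical reflectances at two NIR wavebands for one leaf sample.
index_ratio = rsi(0.6, 0.2)
index_normalised = ndsi(0.6, 0.2)

# Hypothetical measured and predicted sugar contents for a validation set.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
validation_rpd = rpd(measured, predicted)
```

    An RPD near 1 means the model predicts no better than the mean of the reference values, while larger values, such as those around 2 reported above, indicate that prediction errors are small relative to the natural spread of the trait.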

  1. Quantile regression theory and applications

    CERN Document Server

    Davino, Cristina; Vistocco, Domenico

    2013-01-01

    A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and
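
    The estimation principle behind quantile regression can be shown with the check (pinball) loss. The sketch below is a generic illustration, not material from the book: it covers only the intercept-only case, where minimising the loss recovers a sample quantile, and the data are hypothetical.

```python
def pinball_loss(y, prediction, tau):
    """Average check (pinball) loss of a constant prediction at quantile level tau."""
    total = 0.0
    for value in y:
        diff = value - prediction
        # Under-predictions cost tau per unit, over-predictions cost (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)

def constant_quantile(y, tau):
    """Sample value that minimises the pinball loss (a tau-th sample quantile)."""
    return min(sorted(y), key=lambda c: pinball_loss(y, c, tau))

data = [1.0, 2.0, 3.0, 4.0, 100.0]      # hypothetical sample with one outlier
median_fit = constant_quantile(data, 0.5)
upper_fit = constant_quantile(data, 0.9)
```

    Quantile regression replaces the constant with a linear function of the covariates and minimises the same loss, which is why its median fit is far less sensitive to the outlier than a least-squares mean would be.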

  2. Fungible weights in logistic regression.

    Science.gov (United States)

    Jones, Jeff A; Waller, Niels G

    2016-06-01

    In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  3. Principal component regression analysis with SPSS.

    Science.gov (United States)

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and the method for determining the 'best' equation. The paper uses an example to describe how to do principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. Principal component regression analysis with SPSS gives a simplified, faster and accurate statistical analysis.

  4. Logistic regression models

    CERN Document Server

    Hilbe, Joseph M

    2009-01-01

    This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … .-Annette J. Dobson, Biometric...

  5. Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

    Science.gov (United States)

    Leroux, Romain; Chatellier, Ludovic; David, Laurent

    2018-01-01

    This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of 20°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.

  6. Logistic regression applied to natural hazards: rare event logistic regression with replications

    Science.gov (United States)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  7. Tamoxifen with and without radiation after partial mastectomy in patients with involved nodes

    Energy Technology Data Exchange (ETDEWEB)

    Cooke, Andrew L; Perera, Francisco; Fisher, Barbara; Opeitum, Abiola; Yu, Norman

    1995-02-15

    Purpose: To determine the effect of tamoxifen on local control after partial mastectomy with and without adjuvant breast irradiation. Methods and Materials: A retrospective study of 97 node positive patients identified from the records of the London Regional Cancer Center included 44 patients who received tamoxifen and breast irradiation (40 or 50 Gy plus booster dose) after partial mastectomy, and 53 patients who received tamoxifen only after partial mastectomy. Base line characteristics of the two groups were similar. Results: At 39 months actuarial follow-up there was a breast tumor recurrence (BTR) in 5% vs. 21% of patients when radiation was omitted (p = 0.0388), but there was no difference in the cause-specific mortality of the two treatment groups. Cox Regression analysis (on only 10 BTR) showed age and adjuvant radiation as significant predictors of BTR. In patients not receiving radiation, no BTR was seen in 22 patients ≥70 years of age at diagnosis vs. 8 BTR in 31 patients <70 years (p = 0.0130). All BTR occurred while patients were receiving tamoxifen. Conclusion: Tamoxifen alone with omission of radiation after partial mastectomy provides inferior breast tumor control in node positive patients. This is especially true for patients under 70 years of age. Patients aged 70 years or older at the time of diagnosis of breast cancer who receive tamoxifen have a low rate of breast tumor recurrence when radiation is omitted. These patients represent a group for whom radiation might not be necessary.

  8. Understanding logistic regression analysis.

    Science.gov (United States)

    Sperandei, Sandro

    2014-01-01

    Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
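The link between a logistic regression coefficient and an odds ratio described here can be sketched as below. It relies on two standard facts rather than anything specific to the article: the odds ratio for a one-unit increase in a predictor is exp(coefficient), and with a single binary predictor the fitted coefficient equals the log of the odds ratio of the 2x2 table. The function names and the cell labels a, b, c, d are illustrative assumptions.

```python
from math import exp, log


def odds_ratio_from_coefficient(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return exp(beta)


def log_odds_ratio_2x2(a, b, c, d):
    """Log odds ratio of a 2x2 table.

    a: exposed with event,   b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    With one binary predictor this equals the fitted logistic slope.
    """
    return log((a * d) / (b * c))
```

With several explanatory variables the same exponentiation applies to each coefficient, but each odds ratio is then adjusted for the other variables in the model.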

  9. Determination of Ethanol in Blood Samples Using Partial Least Square Regression Applied to Surface Enhanced Raman Spectroscopy.

    Science.gov (United States)

    Açikgöz, Güneş; Hamamci, Berna; Yildiz, Abdulkadir

    2018-04-01

    Alcohol consumption has toxic effects on organs and tissues in the human body. The risks are essentially thought to be related to ethanol content in alcoholic beverages. The identification of ethanol in blood samples requires rapid, non-destructive analysis with minimal sample handling, such as Raman Spectroscopy. This study aims to apply Raman Spectroscopy for identification of ethanol in blood samples. Silver nanoparticles were synthesized to obtain Surface Enhanced Raman Spectroscopy (SERS) spectra of blood samples. The SERS spectra were used with Partial Least Square (PLS) regression to determine ethanol quantitatively. To apply the PLS method, the 920–820 cm⁻¹ band interval was chosen, and the spectral changes at the observed concentrations were statistically associated with each other. The blood samples were examined according to this model, and the quantity of ethanol was determined in two steps. First, a calibration method was established. A strong relationship was observed between known concentration values and the values obtained by the PLS method (R² = 1). Second, the quantities of ethanol in 40 blood samples were predicted according to the calibration method. Quantitative analysis of the ethanol in the blood was done by analyzing the data obtained by Raman spectroscopy and the PLS method.

  10. Selection of a Geostatistical Method to Interpolate Soil Properties of the State Crop Testing Fields using Attributes of a Digital Terrain Model

    Science.gov (United States)

    Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.

    2018-03-01

    The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.

  11. Associations between partial sickness benefit and disability pensions: initial findings of a Finnish nationwide register study.

    Science.gov (United States)

    Kausto, Johanna; Virta, Lauri; Luukkonen, Ritva; Viikari-Juntura, Eira

    2010-06-23

    Timely return to work after longterm sickness absence and the increased use of flexible work arrangements together with partial health-related benefits are tools intended to increase participation in work life. Although partial sickness benefit and partial disability pension are used in many countries, prospective studies on their use are largely lacking. Partial sickness benefit was introduced in Finland in 2007. This register study aimed to investigate the use of health-related benefits by subjects with prolonged sickness absence, initially on either partial or full sick leave. Representative population data (13 375 men and 16 052 women either on partial or full sick leave in 2007) were drawn from national registers and followed over an average of 18 months. The registers provided information on the study outcomes: diagnoses and days of payment for compensated sick leaves, and the occurrence of disability pension. Survival analysis and multinomial regression were carried out using sociodemographic variables and prior sickness absence as covariates. Approximately 60% of subjects on partial sick leave and 30% of those on full sick leave had at least one recurrent sick leave over the follow up. A larger proportion of those on partial sick leave (16%) compared to those on full sick leave (1%) had their first recurrent sick leave during the first month of follow up. The adjusted risks of the first recurrent sick leave were 1.8 and 1.7 for men and women, respectively, when subjects on partial sick leave were compared with those on full sick leave. There was no increased risk when those with their first recurrent sick leave in the first month were excluded from the analyses. The risks of a full disability pension were smaller and risks of a partial disability pension approximately two-fold among men and women initially on partial sick leave, compared to subjects on full sick leave. This is the first follow up study of the newly adopted partial sickness benefit in

  12. Associations between partial sickness benefit and disability pensions: initial findings of a Finnish nationwide register study

    Directory of Open Access Journals (Sweden)

    Luukkonen Ritva

    2010-06-01

    Full Text Available Abstract Background Timely return to work after longterm sickness absence and the increased use of flexible work arrangements together with partial health-related benefits are tools intended to increase participation in work life. Although partial sickness benefit and partial disability pension are used in many countries, prospective studies on their use are largely lacking. Partial sickness benefit was introduced in Finland in 2007. This register study aimed to investigate the use of health-related benefits by subjects with prolonged sickness absence, initially on either partial or full sick leave. Methods Representative population data (13 375 men and 16 052 women either on partial or full sick leave in 2007) were drawn from national registers and followed over an average of 18 months. The registers provided information on the study outcomes: diagnoses and days of payment for compensated sick leaves, and the occurrence of disability pension. Survival analysis and multinomial regression were carried out using sociodemographic variables and prior sickness absence as covariates. Results Approximately 60% of subjects on partial sick leave and 30% of those on full sick leave had at least one recurrent sick leave over the follow up. A larger proportion of those on partial sick leave (16%) compared to those on full sick leave (1%) had their first recurrent sick leave during the first month of follow up. The adjusted risks of the first recurrent sick leave were 1.8 and 1.7 for men and women, respectively, when subjects on partial sick leave were compared with those on full sick leave. There was no increased risk when those with their first recurrent sick leave in the first month were excluded from the analyses. The risks of a full disability pension were smaller and risks of a partial disability pension approximately two-fold among men and women initially on partial sick leave, compared to subjects on full sick leave.
Conclusions This is the first follow

  13. Minimax Regression Quantiles

    DEFF Research Database (Denmark)

    Bache, Stefan Holst

    A new and alternative quantile regression estimator is developed and it is shown that the estimator is root n-consistent and asymptotically normal. The estimator is based on a minimax ‘deviance function’ and has asymptotically equivalent properties to the usual quantile regression estimator. It is, however, a different and therefore new estimator. It allows for both linear- and nonlinear model specifications. A simple algorithm for computing the estimates is proposed. It seems to work quite well in practice but whether it has theoretical justification is still an open question....

  14. Anatomic partial nephrectomy: technique evolution.

    Science.gov (United States)

    Azhar, Raed A; Metcalfe, Charles; Gill, Inderbir S

    2015-03-01

    Partial nephrectomy provides equivalent long-term oncologic and superior functional outcomes as radical nephrectomy for T1a renal masses. Herein, we review the various vascular clamping techniques employed during minimally invasive partial nephrectomy, describe the evolution of our partial nephrectomy technique and provide an update on contemporary thinking about the impact of ischemia on renal function. Recently, partial nephrectomy surgical technique has shifted away from main artery clamping and towards minimizing/eliminating global renal ischemia during partial nephrectomy. Supported by high-fidelity three-dimensional imaging, novel anatomic-based partial nephrectomy techniques have recently been developed, wherein partial nephrectomy can now be performed with segmental, minimal or zero global ischemia to the renal remnant. Sequential innovations have included early unclamping, segmental clamping, super-selective clamping and now culminating in anatomic zero-ischemia surgery. By eliminating 'under-the-gun' time pressure of ischemia for the surgeon, these techniques allow an unhurried, tightly contoured tumour excision with point-specific sutured haemostasis. Recent data indicate that zero-ischemia partial nephrectomy may provide better functional outcomes by minimizing/eliminating global ischemia and preserving greater vascularized kidney volume. Contemporary partial nephrectomy includes a spectrum of surgical techniques ranging from conventional-clamped to novel zero-ischemia approaches. Technique selection should be tailored to each individual case on the basis of tumour characteristics, surgical feasibility, surgeon experience, patient demographics and baseline renal function.

  15. Measurement and ANN prediction of pH-dependent solubility of nitrogen-heterocyclic compounds.

    Science.gov (United States)

    Sun, Feifei; Yu, Qingni; Zhu, Jingke; Lei, Lecheng; Li, Zhongjian; Zhang, Xingwang

    2015-09-01

    Based on the solubility of 25 nitrogen-heterocyclic compounds (NHCs) measured by saturation shake-flask method, artificial neural network (ANN) was employed to the study of the quantitative relationship between the structure and pH-dependent solubility of NHCs. With genetic algorithm-multivariate linear regression (GA-MLR) approach, five out of the 1497 molecular descriptors computed by Dragon software were selected to describe the molecular structures of NHCs. Using the five selected molecular descriptors as well as pH and the partial charge on the nitrogen atom of NHCs (QN) as inputs of ANN, a quantitative structure-property relationship (QSPR) model without using Henderson-Hasselbalch (HH) equation was successfully developed to predict the aqueous solubility of NHCs in different pH water solutions. The prediction model performed well on the 25 model NHCs with an absolute average relative deviation (AARD) of 5.9%, while HH approach gave an AARD of 36.9% for the same model NHCs. It was found that QN played a very important role in the description of NHCs and, with QN, ANN became a potential tool for the prediction of pH-dependent solubility of NHCs. Copyright © 2015 Elsevier Ltd. All rights reserved.
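The AARD figures quoted above (5.9% for the ANN model versus 36.9% for the HH approach) can be reproduced in form with a short sketch. The abstract does not spell out the formula, so the usual definition is assumed here: the mean of |predicted − measured| / |measured|, expressed in percent. The function name is hypothetical.

```python
def aard_percent(measured, predicted):
    """Absolute average relative deviation in percent (assumed definition)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)
```

Because each deviation is scaled by the measured value, AARD weights errors on sparingly soluble compounds as heavily as errors on highly soluble ones.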

  16. Doubly robust estimation of generalized partial linear models for longitudinal data with dropouts.

    Science.gov (United States)

    Lin, Huiming; Fu, Bo; Qin, Guoyou; Zhu, Zhongyi

    2017-12-01

    We develop a doubly robust estimation of generalized partial linear models for longitudinal data with dropouts. Our method extends the highly efficient aggregate unbiased estimating function approach proposed in Qu et al. (2010) to a doubly robust one in the sense that under missing at random (MAR), our estimator is consistent when either the linear conditional mean condition is satisfied or a model for the dropout process is correctly specified. We begin with a generalized linear model for the marginal mean, and then move forward to a generalized partial linear model, allowing for nonparametric covariate effect by using the regression spline smoothing approximation. We establish the asymptotic theory for the proposed method and use simulation studies to compare its finite sample performance with that of Qu's method, the complete-case generalized estimating equation (GEE) and the inverse-probability weighted GEE. The proposed method is finally illustrated using data from a longitudinal cohort study. © 2017, The International Biometric Society.

  17. Regression with Sparse Approximations of Data

    DEFF Research Database (Denmark)

    Noorzad, Pardis; Sturm, Bob L.

    2012-01-01

    We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR), and more generally, local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...

  18. Neuro-fuzzy decoding of sensory information from ensembles of simultaneously recorded dorsal root ganglion neurons for functional electrical stimulation applications

    Science.gov (United States)

    Rigosa, J.; Weber, D. J.; Prochazka, A.; Stein, R. B.; Micera, S.

    2011-08-01

    Functional electrical stimulation (FES) is used to improve motor function after injury to the central nervous system. Some FES systems use artificial sensors to switch between finite control states. To optimize FES control of the complex behavior of the musculo-skeletal system in activities of daily life, it is highly desirable to implement feedback control. In theory, sensory neural signals could provide the required control signals. Recent studies have demonstrated the feasibility of deriving limb-state estimates from the firing rates of primary afferent neurons recorded in dorsal root ganglia (DRG). These studies used multiple linear regression (MLR) methods to generate estimates of limb position and velocity based on a weighted sum of firing rates in an ensemble of simultaneously recorded DRG neurons. The aim of this study was to test whether the use of a neuro-fuzzy (NF) algorithm (the generalized dynamic fuzzy neural networks (GD-FNN)) could improve the performance, robustness and ability to generalize from training to test sets compared to the MLR technique. NF and MLR decoding methods were applied to ensemble DRG recordings obtained during passive and active limb movements in anesthetized and freely moving cats. The GD-FNN model provided more accurate estimates of limb state and generalized better to novel movement patterns. Future efforts will focus on implementing these neural recording and decoding methods in real time to provide closed-loop control of FES using the information extracted from sensory neurons.

  19. Empirical Estimation of Total Nitrogen and Total Phosphorus Concentration of Urban Water Bodies in China Using High Resolution IKONOS Multispectral Imagery

    Directory of Open Access Journals (Sweden)

    Jiaming Liu

    2015-11-01

    Full Text Available Measuring total nitrogen (TN) and total phosphorus (TP) is important in managing heavily polluted urban waters in China. This study uses high spatial resolution IKONOS imagery with four multispectral bands, which roughly correspond to Landsat/TM bands 1–4, to determine TN and TP in small urban rivers and lakes in China. By using Lake Cihu and the lower reaches of the Wen-Rui Tang (WRT) River as examples, this paper develops both multiple linear regression (MLR) and artificial neural network (ANN) models to estimate TN and TP concentrations from high spatial resolution remote sensing imagery and in situ water samples collected concurrently with the overpassing satellite. The measured and estimated values of both MLR and ANN models are in good agreement (R² > 0.85 and RMSE < 2.50). The empirical equations selected by MLR are more straightforward, whereas the estimation accuracy of the ANN model is better (R² > 0.86 and RMSE < 0.89). Results validate the potential of using high resolution IKONOS multispectral imagery to study the chemical states of small-sized urban water bodies. The spatial distribution maps of TN and TP concentrations generated by the ANN model can inform the decision makers of variations in water quality in Lake Cihu and the lower reaches of the WRT River. The approaches and equations developed in this study could be applied to other urban water bodies for water quality monitoring.

  20. Analytical study of friction coefficients of pomegranate seed as essential parameters in design of post-harvest equipment

    Directory of Open Access Journals (Sweden)

    S.M. Shafaei

    2016-09-01

    Full Text Available Friction coefficients (static friction coefficient (SFC) and dynamic friction coefficient (DFC)) of pomegranate seed on different structural surfaces (glass, aluminum, plywood, galvanized steel and rubber) as affected by moisture content (4–21.9% (d.b.)) and sliding velocity (1.4–16 cm/s) were investigated. Analysis of variance (ANOVA) was performed to determine the effect of main treatments and their interactions on SFC and DFC. Significance of single or multiple effect of the main treatments with five levels was assessed using Duncan’s multiple range test (DMRT). To predict SFC and DFC, the multiple linear regression (MLR) modeling technique was applied for each type of structural surface. The goodness of fit of each MLR model was evaluated using statistical parameters: coefficient of determination, root mean square error and mean relative deviation modulus. Results showed that the minimum and maximum SFC or DFC were in minimum and maximum moisture content on glass and rubber surface, respectively. The ANOVA table indicated the significant effect of main treatments and their interactions on SFC and DFC at a significance level of 1% (P < 0.01). According to DMRT results, SFC increased linearly as moisture content increased, and DFC also increased linearly with individual or simultaneous increases in moisture content and sliding velocity, for all experimental conditions. According to the obtained statistical parameters, both SFC and DFC were properly predicted by means of the MLR modeling technique.

  1. Factors Influencing Intraocular Pressure Changes after Laser In Situ Keratomileusis with Flaps Created by Femtosecond Laser or Mechanical Microkeratome.

    Directory of Open Access Journals (Sweden)

    Meng-Yin Lin

    Full Text Available The aim of this study is to describe factors that influence the measured intraocular pressure (IOP) change and to develop a predictive model after myopic laser in situ keratomileusis (LASIK) with a femtosecond (FS) laser or a microkeratome (MK). We retrospectively reviewed preoperative, intraoperative, and 12-month postoperative medical records in 2485 eyes of 1309 patients who underwent LASIK with an FS laser or an MK for myopia and myopic astigmatism. Data were extracted, such as preoperative age, sex, IOP, manifest spherical equivalent (MSE), central corneal keratometry (CCK), central corneal thickness (CCT), and intended flap thickness and postoperative IOP (postIOP) at 1, 6 and 12 months. Linear mixed model (LMM) and multivariate linear regression (MLR) methods were used for data analysis. In both models, the preoperative CCT and ablation depth had significant effects on predicting IOP changes in the FS and MK groups. The intended flap thickness was a significant predictor only in the FS laser group (P < .0001 in both models). In the FS group, LMM and MLR could respectively explain 47.00% and 18.91% of the variation of postoperative IOP underestimation (R² = 0.47 and R² = 0.1891). In the MK group, LMM and MLR could explain 37.79% and 19.13% of the variation of IOP underestimation (R² = 0.3779 and 0.1913, respectively). The best-fit model for prediction of IOP changes was the LMM in LASIK with an FS laser.

  2. Neuro-fuzzy decoding of sensory information from ensembles of simultaneously recorded dorsal root ganglion neurons for functional electrical stimulation applications.

    Science.gov (United States)

    Rigosa, J; Weber, D J; Prochazka, A; Stein, R B; Micera, S

    2011-08-01

    Functional electrical stimulation (FES) is used to improve motor function after injury to the central nervous system. Some FES systems use artificial sensors to switch between finite control states. To optimize FES control of the complex behavior of the musculo-skeletal system in activities of daily life, it is highly desirable to implement feedback control. In theory, sensory neural signals could provide the required control signals. Recent studies have demonstrated the feasibility of deriving limb-state estimates from the firing rates of primary afferent neurons recorded in dorsal root ganglia (DRG). These studies used multiple linear regression (MLR) methods to generate estimates of limb position and velocity based on a weighted sum of firing rates in an ensemble of simultaneously recorded DRG neurons. The aim of this study was to test whether the use of a neuro-fuzzy (NF) algorithm (the generalized dynamic fuzzy neural networks (GD-FNN)) could improve the performance, robustness and ability to generalize from training to test sets compared to the MLR technique. NF and MLR decoding methods were applied to ensemble DRG recordings obtained during passive and active limb movements in anesthetized and freely moving cats. The GD-FNN model provided more accurate estimates of limb state and generalized better to novel movement patterns. Future efforts will focus on implementing these neural recording and decoding methods in real time to provide closed-loop control of FES using the information extracted from sensory neurons.

  3. Comparison of artificial intelligence techniques for prediction of soil temperatures in Turkey

    Science.gov (United States)

    Citakoglu, Hatice

    2017-10-01

    Soil temperature is a meteorological variable that directly affects the formation and development of plants of all kinds. Soil temperatures are usually estimated with various models including the artificial neural networks (ANNs), adaptive neuro-fuzzy inference system (ANFIS), and multiple linear regression (MLR) models. Soil temperatures along with other climate data are recorded by the Turkish State Meteorological Service (MGM) at specific locations all over Turkey. Soil temperatures are commonly measured at 5-, 10-, 20-, 50-, and 100-cm depths below the soil surface. In this study, the soil temperature data in monthly units measured at 261 stations in Turkey having records of at least 20 years were used to develop relevant models. Different input combinations were tested in the ANN and ANFIS models to estimate soil temperatures, and the best combination of significant explanatory variables turns out to be monthly minimum and maximum air temperatures, calendar month number, depth of soil, and monthly precipitation. Next, three standard error terms (mean absolute error (MAE, °C), root mean squared error (RMSE, °C), and determination coefficient (R²)) were employed to check the reliability of the test data results obtained through the ANN, ANFIS, and MLR models. ANFIS (RMSE 1.99; MAE 1.09; R² 0.98) is found to outperform both ANN and MLR (RMSE 5.80, 8.89; MAE 1.89, 2.36; R² 0.93, 0.91) in estimating soil temperature in Turkey.
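The three error terms used to compare the ANN, ANFIS and MLR models can be sketched as below. The abstract does not say how the determination coefficient was computed, so the form 1 − SS_res / SS_tot is an assumption (a squared correlation coefficient is the other common choice). The function name is hypothetical.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition of R^2
    return mae, rmse, r2
```

MAE and RMSE are in the units of the target (here °C), while R² is unitless, which is why the abstract reports all three side by side.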

  4. Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

    Science.gov (United States)

    Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

    2018-05-12

    Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
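The evaluation design described here, a random 70/30 split followed by sensitivity and specificity on the holdout set, can be sketched as follows. This is an illustration only: the seed, the function names and the 0/1 label coding are assumptions, and the study's own splitting procedure is not given in the abstract.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of items and split it into training and holdout sets."""
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Labels are assumed to be coded 1 (at risk) and 0 (not at risk).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

Applied to 678 subjects, the split above yields 474 training and 204 holdout records, matching the counts reported in the abstract.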

  5. Logistic regression applied to natural hazards: rare event logistic regression with replications

    Directory of Open Access Journals (Sweden)

    M. Guns

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  6. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    Science.gov (United States)

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
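
The construction described for logistic regression can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the two group probabilities are found by bisection, and the final step uses a common textbook two-proportion sample size formula, which is an assumption and may differ from the formula used in the paper.

```python
import math
from statistics import NormalDist


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def equivalent_two_sample(slope, sd_x, overall_prob):
    """Find two group probabilities whose log odds differ by slope * 2 * sd_x
    and whose average equals the overall response probability."""
    log_odds_gap = 2.0 * slope * sd_x
    lo, hi = -50.0, 50.0
    for _ in range(200):  # bisection on the log odds of the first group
        mid = (lo + hi) / 2.0
        mean_prob = (expit(mid) + expit(mid + log_odds_gap)) / 2.0
        if mean_prob < overall_prob:
            lo = mid
        else:
            hi = mid
    base = (lo + hi) / 2.0
    return expit(base), expit(base + log_odds_gap)


def total_sample_size(slope, sd_x, overall_prob, alpha=0.05, power=0.8):
    """Total sample size from the equivalent two-sample problem.
    Assumes a nonzero slope and a two-sided test at level alpha."""
    p1, p2 = equivalent_two_sample(slope, sd_x, overall_prob)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    n_per_group = numerator / (p1 - p2) ** 2
    return 2 * math.ceil(n_per_group)


print(total_sample_size(slope=0.5, sd_x=1.0, overall_prob=0.3))
```

Fixing the average of the two probabilities at the overall response probability is what keeps the expected number of events unchanged when the two groups are equally sized.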

  7. Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

    OpenAIRE

    Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

    2011-01-01

    Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) wh...

  8. Tutorial on Online Partial Evaluation

    Directory of Open Access Journals (Sweden)

    William R. Cook

    2011-09-01

    This paper is a short tutorial introduction to online partial evaluation. We show how to write a simple online partial evaluator for a simple, pure, first-order, functional programming language. In particular, we show that the partial evaluator can be derived as a variation on a compositionally defined interpreter. We demonstrate the use of the resulting partial evaluator for program optimization in the context of model-driven development.
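
The idea of deriving a partial evaluator from an interpreter can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tuple-based expression encoding, the function name `peval`, and the restriction to arithmetic and conditionals (the tutorial's language also has first-order functions, which are omitted here).

```python
def peval(expr, env):
    """Partially evaluate expr given known variables in env.

    Expressions are tuples: ("num", n), ("var", name), ("add", a, b),
    ("mul", a, b), ("if", cond, then, else). The result is either a
    ("num", n) constant or a residual expression over unknown variables.
    """
    tag = expr[0]
    if tag == "num":
        return expr
    if tag == "var":
        name = expr[1]
        # Known variables become constants; unknown ones stay in the residual.
        return ("num", env[name]) if name in env else expr
    if tag in ("add", "mul"):
        left = peval(expr[1], env)
        right = peval(expr[2], env)
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if tag == "add" else left[1] * right[1]
            return ("num", value)
        return (tag, left, right)
    if tag == "if":
        cond = peval(expr[1], env)
        if cond[0] == "num":
            # Decided online: the condition is known, so keep only one branch.
            return peval(expr[2] if cond[1] != 0 else expr[3], env)
        return ("if", cond, peval(expr[2], env), peval(expr[3], env))
    raise ValueError("unknown expression tag: %r" % (tag,))


program = ("add", ("mul", ("var", "x"), ("num", 2)), ("var", "y"))
print(peval(program, {"x": 3}))
```

The structure mirrors an ordinary interpreter: each case either computes a value when its inputs are known or rebuilds the same node from partially evaluated parts. When every variable is supplied, the function behaves exactly like an interpreter.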

  9. Constructing high-accuracy intermolecular potential energy surface with multi-dimension Morse/Long-Range model

    Science.gov (United States)

    Zhai, Yu; Li, Hui; Le Roy, Robert J.

    2018-04-01

    Spectroscopically accurate Potential Energy Surfaces (PESs) are fundamental for explaining and making predictions of the infrared and microwave spectra of van der Waals (vdW) complexes, and the model used for the potential energy function is critically important for providing accurate, robust and portable analytical PESs. The Morse/Long-Range (MLR) model has proved to be one of the most general, flexible and accurate one-dimensional (1D) model potentials, as it has physically meaningful parameters, is flexible, smooth and differentiable everywhere, to all orders and extrapolates sensibly at both long and short ranges. The Multi-Dimensional Morse/Long-Range (mdMLR) potential energy model described herein is based on that 1D MLR model, and has proved to be effective and accurate in the potentiology of various types of vdW complexes. In this paper, we review the current status of development of the mdMLR model and its application to vdW complexes. The future of the mdMLR model is also discussed. This review can serve as a tutorial for the construction of an mdMLR PES.

  10. Use of multivariate regression in spectrophotometric evaluation of chemical oxygen demand in samples of environmental relevance

    Directory of Open Access Journals (Sweden)

    Patricio Peralta-Zamora

    2005-10-01

    In this work, a partial least squares regression routine was used to develop a multivariate calibration model to predict the chemical oxygen demand (COD) in substrates of environmental relevance (paper effluents and landfill leachates) from UV-Vis spectral data. The calibration models permit fast determination of the COD, with typical relative errors below 10% with respect to the conventional methodology.

  11. Eslicarbazepine acetate add-on for drug-resistant partial epilepsy.

    Science.gov (United States)

    Chang, Xian-Chao; Yuan, Hai; Wang, Yi; Xu, Hui-Qin; Hong, Wen-Ke; Zheng, Rong-Yuan

    2017-10-25

    This is an updated version of the Cochrane Review published in the Cochrane Library 2011, Issue 12. The majority of people with epilepsy have a good prognosis, but up to 30% of people continue to have seizures despite several regimens of antiepileptic drugs. In this review, we summarized the current evidence regarding eslicarbazepine acetate (ESL) when used as an add-on treatment for drug-resistant partial epilepsy. Our objective was to evaluate the efficacy and tolerability of ESL when used as an add-on treatment for people with drug-resistant partial epilepsy. The searches for the original review were run in November 2011. Subsequently, we searched the Cochrane Epilepsy Group Specialized Register (6 December 2016), the Cochrane Central Register of Controlled Trials (CENTRAL 2016, Issue 11) and MEDLINE (1946 to 6 December 2016). There were no language restrictions. We reviewed the reference lists of retrieved studies to search for additional reports of relevant studies. We also contacted the manufacturers of ESL and experts in the field for information about any unpublished or ongoing studies. We selected randomized, placebo-controlled, double-blind add-on trials of ESL in people with drug-resistant partial epilepsy. Two review authors independently selected trials for inclusion and extracted data. Outcomes investigated included 50% or greater reduction in seizure frequency, seizure freedom, treatment withdrawal, adverse effects, and drug interactions. Primary analyses were by intention to treat (ITT). The dose-response relationship was evaluated in regression models. We included five trials (1799 participants) rated at low risk of bias; all studies were funded by BIAL. The overall risk ratio (RR) with 95% confidence interval (CI) for 50% or greater reduction in seizure frequency was 1.71 (95% CI 1.42 to 2.05). Dose regression analysis showed evidence that ESL reduced seizure frequency, with efficacy increasing with the dose of ESL. ESL was significantly associated with seizure freedom

  12. Post-processing through linear regression

    Science.gov (United States)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
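
The simplest of the schemes listed, OLS post-processing, can be sketched for a single predictor as below. This is an illustrative sketch under stated assumptions: the function names are hypothetical, only one predictor (the raw forecast) is used, and the other schemes (TDTR, total least squares, GM, EVMOS) are not shown.

```python
def fit_ols(forecasts, observations):
    """Fit observation = intercept + slope * forecast by ordinary least squares."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    covariance = sum((f - mean_f) * (o - mean_o)
                     for f, o in zip(forecasts, observations))
    variance = sum((f - mean_f) ** 2 for f in forecasts)
    slope = covariance / variance
    intercept = mean_o - slope * mean_f
    return intercept, slope


def correct_forecast(forecast, intercept, slope):
    """Apply the fitted linear correction to a new raw forecast."""
    return intercept + slope * forecast


# Made-up training pairs of past forecasts and matching observations.
past_forecasts = [1.0, 2.0, 3.0, 4.0]
past_observations = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_ols(past_forecasts, past_observations)
print(correct_forecast(5.0, intercept, slope))
```

The coefficients are fitted on past forecast/observation pairs at a given lead time and then applied to new forecasts at that same lead time.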

  13. Determinants of Childhood Obesity in Representative Sample of Children in North East of Iran

    Directory of Open Access Journals (Sweden)

    Fereshteh Baygi

    2012-01-01

    Childhood obesity has become a global public health problem, and epidemiological studies are important to identify its determinants in different populations. This study aimed to investigate factors associated with obesity in a representative sample of children in Neishabour, Iran. This study was conducted among 1500 randomly selected 6–12-year-old students from urban areas of Neishabour, northeast of Iran. Then, through a case-control study, 114 obese (BMI≥95th percentile of Iranian reference) children were selected as the case group and were compared with 102 controls (15th≤BMI<85th percentile). Factors suggested to be associated with weight status were investigated, for example, parental obesity, child physical activity levels, socio-economic status (SES), and so forth. The analysis was conducted using univariate and multivariate logistic regression (MLR) in SPSS version 16. In the univariate logistic regression model, birth weight, birth order, family extension, TV watching, sleep duration, physical activity, parents’ job, parents’ education, parental obesity history, and SES were significantly associated with children’s obesity. After MLR analysis, physical activity and parental obesity history remained statistically significant in the model. Our findings showed that physical activity and parental obesity history are the most important determinants for childhood obesity in our population. This finding should be considered in the implementation of preventive interventions.

  14. Regression modeling methods, theory, and computation with SAS

    CERN Document Server

    Panik, Michael

    2009-01-01

    Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,

  15. Better Autologistic Regression

    Directory of Open Access Journals (Sweden)

    Mark A. Wolters

    2017-11-01

    Autologistic regression is an important probability model for dichotomous random variables observed along with covariate information. It has been used in various fields for analyzing binary data possessing spatial or network structure. The model can be viewed as an extension of the autologistic model (also known as the Ising model, quadratic exponential binary distribution, or Boltzmann machine) to include covariates. It can also be viewed as an extension of logistic regression to handle responses that are not independent. Not all authors use exactly the same form of the autologistic regression model. Variations of the model differ in two respects. First, the variable coding (the two numbers used to represent the two possible states of the variables) might differ. Common coding choices are (zero, one) and (minus one, plus one). Second, the model might appear in either of two algebraic forms: a standard form, or a recently proposed centered form. Little attention has been paid to the effect of these differences, and the literature shows ambiguity about their importance. It is shown here that changes to either coding or centering in fact produce distinct, non-nested probability models. Theoretical results, numerical studies, and analysis of an ecological data set all show that the differences among the models can be large and practically significant. Understanding the nature of the differences and making appropriate modeling choices can lead to significantly improved autologistic regression analyses. The results strongly suggest that the standard model with plus/minus coding, which we call the symmetric autologistic model, is the most natural choice among the autologistic variants.

  16. Use of generalized ordered logistic regression for the analysis of multidrug resistance data.

    Science.gov (United States)

    Agga, Getahun E; Scott, H Morgan

    2015-10-01

    Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.

  17. Semiparametric regression during 2003–2007

    KAUST Repository

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2009-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application.

  18. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoreti...

  19. Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

    Science.gov (United States)

    Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

    2014-12-01

    Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.

  20. Interpretation of commonly used statistical regression models.

    Science.gov (United States)

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  1. Linear regression

    CERN Document Server

    Olive, David J

    2017-01-01

    This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...

  2. Regression modeling of ground-water flow

    Science.gov (United States)

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  3. Partially Observed Mixtures of IRT Models: An Extension of the Generalized Partial-Credit Model

    Science.gov (United States)

    Von Davier, Matthias; Yamamoto, Kentaro

    2004-01-01

    The generalized partial-credit model (GPCM) is used frequently in educational testing and in large-scale assessments for analyzing polytomous data. Special cases of the generalized partial-credit model are the partial-credit model--or Rasch model for ordinal data--and the two-parameter logistic (2PL) model. This article extends the GPCM to the…
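
The category probabilities of the GPCM and its two special cases can be sketched as follows. The parameterization is an assumption (a commonly used form with one discrimination parameter and one threshold per step), the function name is hypothetical, and the mixture extension proposed in the article is not shown.

```python
import math


def gpcm_probabilities(theta, discrimination, thresholds):
    """Category probabilities for one item under an assumed GPCM form.

    Category k (k = 0..m) has exponent sum_{v<=k} a * (theta - b_v),
    with the empty sum for k = 0 equal to zero.
    """
    exponents = [0.0]
    for b in thresholds:
        exponents.append(exponents[-1] + discrimination * (theta - b))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Three-category item with made-up parameters.
print(gpcm_probabilities(theta=0.2, discrimination=1.3, thresholds=[-0.5, 0.8]))
```

Setting the discrimination to 1 for every item gives the partial-credit model, and an item with a single threshold reduces to the 2PL model, matching the special cases named in the abstract.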

  4. Intermittent Metronomic Drug Schedule Is Essential for Activating Antitumor Innate Immunity and Tumor Xenograft Regression

    Directory of Open Access Journals (Sweden)

    Chong-Sheng Chen

    2014-01-01

    Metronomic chemotherapy using cyclophosphamide (CPA) is widely associated with antiangiogenesis; however, recent studies implicate other immune-based mechanisms, including antitumor innate immunity, which can induce major tumor regression in implanted brain tumor models. This study demonstrates the critical importance of drug schedule: CPA induced a potent antitumor innate immune response and tumor regression when administered intermittently on a 6-day repeating metronomic schedule but not with the same total exposure to activated CPA administered on an every 3-day schedule or using a daily oral regimen that serves as the basis for many clinical trials of metronomic chemotherapy. Notably, the more frequent metronomic CPA schedules abrogated the antitumor innate immune and therapeutic responses. Further, the innate immune response and antitumor activity both displayed an unusually steep dose-response curve and were not accompanied by antiangiogenesis. The strong recruitment of innate immune cells by the 6-day repeating CPA schedule was not sustained, and tumor regression was abolished, by a moderate (25%) reduction in CPA dose. Moreover, an ~20% increase in CPA dose eliminated the partial tumor regression and weak innate immune cell recruitment seen in a subset of the every 6-day treated tumors. Thus, metronomic drug treatment must be at a sufficiently high dose but also sufficiently well spaced in time to induce strong sustained antitumor immune cell recruitment. Many current clinical metronomic chemotherapeutic protocols employ oral daily low-dose schedules that do not meet these requirements, suggesting that they may benefit from optimization designed to maximize antitumor immune responses.

  5. Partial lesions of the intratemporal segment of the facial nerve: graft versus partial reconstruction.

    Science.gov (United States)

    Bento, Ricardo F; Salomone, Raquel; Brito, Rubens; Tsuji, Robinson K; Hausen, Mariana

    2008-09-01

    In cases of partial lesions of the intratemporal segment of the facial nerve, should the surgeon perform an intraoperative partial reconstruction, or partially remove the injured segment and place a graft? We present results from partial lesion reconstruction on the intratemporal segment of the facial nerve. A retrospective study on 42 patients who presented partial lesions on the intratemporal segment of the facial nerve was performed between 1988 and 2005. The patients were divided into 3 groups based on the procedure used: interposition of the partial graft on the injured area of the nerve (group 1; 12 patients); keeping the preserved part and performing tubulization (group 2; 8 patients); and dividing the parts of the injured nerve (proximal and distal) and placing a total graft of the sural nerve (group 3; 22 patients). Fracture of the temporal bone was the most frequent cause of the lesion in all groups, followed by iatrogenic causes (p lesion of the facial nerve is still questionable. Among these 42 patients, the best results were those from the total graft of the facial nerve.

  6. Partial twisting for scalar mesons

    International Nuclear Information System (INIS)

    Agadjanov, Dimitri; Meißner, Ulf-G.; Rusetsky, Akaki

    2014-01-01

    The possibility of imposing partially twisted boundary conditions is investigated for the scalar sector of lattice QCD. According to the commonly shared belief, the presence of quark-antiquark annihilation diagrams in the intermediate state generally hinders the use of partial twisting. Using effective field theory techniques in a finite volume, and studying the scalar sector of QCD with total isospin I=1, we demonstrate, however, that partial twisting can still be performed despite the presence of annihilation diagrams. The reason is a set of delicate cancellations, which emerge due to the graded symmetry in partially quenched QCD with valence, sea and ghost quarks. The modified Lüscher equation in the case of partial twisting is given.

  7. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis

    Directory of Open Access Journals (Sweden)

    BUDIMAN

    2012-01-01

    Budiman, Arisoesilaningsih E. 2012. Predictive model of Amorphophallus muelleri growth in some agroforestry in East Java by multiple regression analysis. Biodiversitas 13: 18-22. The aim of this research was to determine the multiple regression models of vegetative and corm growth of Amorphophallus muelleri Blume across age variations and habitat conditions of agroforestry in East Java. Descriptive exploratory research was conducted by systematic random sampling at five agroforestries on four plantations in East Java: Saradan, Bojonegoro, Nganjuk and Blitar. In each agroforestry, we observed A. muelleri vegetative and corm growth at four growing ages (1, 2, 3 and 4 years old, respectively) as well as environmental variables such as altitude, vegetation, climate and soil conditions. Data were analyzed using descriptive statistics to compare A. muelleri habitat in the five agroforestries. Meanwhile, the influence and contribution of each environmental variable to the vegetative and corm growth of A. muelleri were determined using multiple regression analysis in SPSS 17.0. The multiple regression models of A. muelleri vegetative and corm growth, generated from agroforestry characteristics and age, showed high validity with R² = 88-99%. The regression models showed that age, monthly temperatures, percentage of radiation and soil calcium (Ca) content, either simultaneously or partially, determined the vegetative and corm growth of A. muelleri. Based on these models, the A. muelleri corm reached optimal growth after four years of cultivation, at which point it is ready to be harvested. Additionally, the soil Ca content should reach 25.3 me.hg-1, as at the Sugihwaras agroforestry, with a maximal radiation of 60%.

  8. Post-processing through linear regression

    Directory of Open Access Journals (Sweden)

    B. Van Schaeybroeck

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

    These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.

  9. Logistic regression applied to natural hazards: rare event logistic regression with replications

    OpenAIRE

    Guns, M.; Vanacker, Veerle

    2012-01-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logisti...

  10. Partial order infinitary term rewriting

    DEFF Research Database (Denmark)

    Bahr, Patrick

    2014-01-01

    We study an alternative model of infinitary term rewriting. Instead of a metric on terms, a partial order on partial terms is employed to formalise convergence of reductions. We consider both a weak and a strong notion of convergence and show that the metric model of convergence coincides with th...... to the metric setting -- orthogonal systems are both infinitarily confluent and infinitarily normalising in the partial order setting. The unique infinitary normal forms that the partial order model admits are Böhm trees....

  11. Beginning partial differential equations

    CERN Document Server

    O'Neil, Peter V

    2011-01-01

    A rigorous, yet accessible, introduction to partial differential equations-updated in a valuable new edition Beginning Partial Differential Equations, Second Edition provides a comprehensive introduction to partial differential equations (PDEs) with a special focus on the significance of characteristics, solutions by Fourier series, integrals and transforms, properties and physical interpretations of solutions, and a transition to the modern function space approach to PDEs. With its breadth of coverage, this new edition continues to present a broad introduction to the field, while also addres

  12. Do fibrin sealants impact negative outcomes after robot-assisted partial nephrectomy?

    Science.gov (United States)

    Cohen, Jason; Jayram, Gautam; Mullins, Jeffrey K; Ball, Mark W; Allaf, Mohamad E

    2013-10-01

    Contemporary rates of postoperative hemorrhage after partial nephrectomy (PN) are low. Commercially available hemostatic agents are commonly used during this surgery to reduce this risk despite a paucity of data supporting the practice. We assessed the impact of fibrin sealant hemostatic agents, a costly addition to surgeries, during robot-assisted partial nephrectomy (RAPN). Between 2007 and 2011, 114 consecutive patients underwent RAPN by a single surgeon (MEA). Evicel fibrin sealant was used in the first 74 patients during renorrhaphy. The last 40 patients had renorrhaphy performed without the use of any hemostatic agents. Clinicopathologic, operative, and complication data were compared between groups. Multivariate and univariate logistic regression analysis was performed to test the association between the use of fibrin sealants and operative outcomes. Patient demographic data and clinical tumor characteristics were similar between groups. The use of fibrin sealant did not increase operative time (166.3 vs 176.1 minutes, P=0.28), warm ischemia time (WIT) (14.4 vs 16.1 minutes, P=0.18), or length of hospital stay (2.6 vs 2.4 days, P=0.35). The omission of these agents did not increase estimated blood loss (116.6 vs 176.1 mL, P=0.8) or postoperative blood transfusion (0% vs 2.5%, P=0.17). Univariate analysis demonstrated no association between use of fibrin sealants and increased complications (P>0.05). Multivariable logistic regression showed no statistically significant predictive value of omission of hemostatic agents for perioperative outcomes (P>0.05). Perioperative hemorrhage and other major complications after contemporary RAPN are rare in experienced hands. In our study, the use of fibrin sealants during RAPN does not decrease the rate of complications, blood loss, or hospital stay. Furthermore, no impact is seen on operative time, WIT, or other negative outcomes. Omitting these agents during RAPN could be a safe, effective, cost-saving measure.

  13. A Seemingly Unrelated Poisson Regression Model

    OpenAIRE

    King, Gary

    1989-01-01

    This article introduces a new estimator for the analysis of two contemporaneously correlated endogenous event count variables. This seemingly unrelated Poisson regression model (SUPREME) estimator combines the efficiencies created by single equation Poisson regression model estimators and insights from "seemingly unrelated" linear regression models.

  14. A case of succinic semialdehyde dehydrogenase deficiency with status epilepticus and rapid regression.

    Science.gov (United States)

    Horino, Asako; Kawawaki, Hisashi; Fukuoka, Masataka; Tsuji, Hitomi; Hattori, Yuka; Inoue, Takeshi; Nukui, Megumi; Kuki, Ichiro; Okazaki, Shin; Tomiwa, Kiyotaka; Hirose, Shinichi

    2016-10-01

    Clinical phenotypic expression of SSADH deficiency is highly heterogeneous, and some infants may develop refractory secondary generalized seizures. A 9-month-old boy manifested partial seizures, developing severe status epilepticus, and conventional antiepileptic drugs were ineffective. Use of ketamine contributed to the control of status epilepticus, achieving a reduction in frequency of partial seizures, and improving EEG findings without apparent complications. Diffusion-weighted images showed hyperintensities in the bilateral basal ganglia and fornix, and multiple T2 hyperintensity lesions were detected. (123)I-iomazenil (IMZ) SPECT revealed a decrease in binding of (123)I-iomazenil predominantly in the left temporal region by the 18th day of hospitalization. However, repeated IMZ-SPECT on the 46th day of hospitalization demonstrated almost no accumulation across a broad region, sparing the left temporal region. The patient showed rapid regression, refractory myoclonus, and severe progressive brain atrophy. IMZ-SPECT findings demonstrated reduced benzodiazepine receptor binding and its dynamic changes in an SSADH-deficient patient. Considering the down regulation of the GABAA receptor, ketamine should be included in pharmacotherapeutic strategies for treatment of refractory status epilepticus in SSADH-deficient patients. Copyright © 2016 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.

  15. Recursive Algorithm For Linear Regression

    Science.gov (United States)

    Varanasi, S. V.

    1988-01-01

    Order of model determined easily. Linear-regression algorithm includes recursive equations for coefficients of model of increased order. Algorithm eliminates duplicative calculations, facilitates search for minimum order of linear-regression model fitting set of data satisfactorily.

  16. Use of Artificial Neural Network Models to Predict Indicator Organism Concentrations in an Urban Watershed

    Science.gov (United States)

    Mas, D. M.; Ahlfeld, D. P.

    2004-05-01

    Forecasting stream water quality is important for numerous aspects of resource protection and management. Fecal coliform and enterococcus are primary indicator organisms used to assess potential pathogen contamination. Consequently, modeling the occurrence and concentration of fecal coliform and enterococcus is an important tool in watershed management. In addition, analyzing the relationship between model input and predicted indicator organisms is useful for elucidating possible sources of contamination and mechanisms of transport. While many process-based, statistical, and empirical models exist for water quality prediction, artificial neural network (ANN) models are increasingly being used for forecasting of water resources variables because ANNs are often capable of modeling complex systems for which behavioral rules are either unknown or difficult to simulate. The performance of ANNs compared to more established modeling approaches such as multiple linear regression (MLR) remains an important research question. Data collected by the U.S. Geological Survey in the lower Charles River in Massachusetts, USA in 1999-2000 were examined to determine correlation between various water quality constituents and indicator organisms and to explore the relationship between rainfall characteristics and indicator organism concentrations. Using the results of the statistical analysis to guide the selection of explanatory variables, MLR was performed to develop predictive equations for wet weather and dry weather conditions. The results show that the best-performing predictor variables are generally consistent for both indicator organisms considered. In addition, the regression equations show increasing indicator organism concentrations as a function of suspended sediment concentrations and length of time since last precipitation event, suggesting accumulation and wash off as a key mechanism of pathogen transport under wet weather conditions. This research also presents the

  17. Solubility Temperature Dependence Predicted from 2D Structure

    Directory of Open Access Journals (Sweden)

    Alex Avdeef

    2015-12-01

    Full Text Available The objective of the study was to find a computational procedure to normalize solubility data determined at various temperatures (e.g., 10 – 50 °C) to values at a “reference” temperature (e.g., 25 °C). A simple procedure was devised to predict enthalpies of solution, ΔHsol, from which the temperature dependence of intrinsic (uncharged form) solubility, log S0, could be calculated. As dependent variables, values of ΔHsol at 25 °C were subjected to multiple linear regression (MLR) analysis, using melting points (mp) and Abraham solvation descriptors. Also, the enthalpy data were subjected to random forest regression (RFR) and recursive partition tree (RPT) analyses. A total of 626 molecules were examined, drawing on 2040 published solubility values measured at various temperatures, along with 77 direct calorimetric measurements. The three different prediction methods (RFR, RPT, MLR) all indicated that the estimated standard deviations in the enthalpy data are 11-15 kJ mol⁻¹, which is concordant with the 10 kJ mol⁻¹ propagation error estimated from solubility measurements (assuming 0.05 log S errors), and consistent with the 7 kJ mol⁻¹ average reproducibility in enthalpy values from interlaboratory replicates. According to the MLR model, higher values of mp, H-bond acidity, polarizability/dipolarity, and dispersion forces relate to more positive (endothermic) enthalpy values. However, molecules that are large and have high H-bond basicity are likely to possess negative (exothermic) enthalpies of solution. With log S0 values normalized to 25 °C, it was shown that the interlaboratory average standard deviations in solubility measurement are reduced to 0.06-0.17 log unit, with higher errors for the least-soluble druglike molecules. Such improvements in data mining are expected to contribute to more reliable in silico prediction models of solubility for use in drug discovery.
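
    The temperature normalization described above can be sketched with the integrated van 't Hoff relation, log S(T_ref) = log S(T) + ΔHsol / (R ln 10) · (1/T − 1/T_ref). This is a minimal illustration under the assumption that ΔHsol is constant over the temperature interval; the function name `normalize_log_s` and the example values are hypothetical and are not taken from the study.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1

def normalize_log_s(log_s, temp_c, dh_sol_kj, ref_temp_c=25.0):
    """Shift a log10 solubility measured at temp_c to ref_temp_c.

    Uses the integrated van 't Hoff relation and assumes the enthalpy of
    solution (dh_sol_kj, in kJ/mol) does not change with temperature.
    """
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return log_s + dh_sol_kj / (math.log(10) * R_KJ) * (1.0 / t - 1.0 / t_ref)

# Hypothetical example: log S0 = -3.00 measured at 37 °C, endothermic
# enthalpy of solution of 30 kJ/mol. Cooling to 25 °C lowers the solubility.
log_s_25 = normalize_log_s(-3.00, 37.0, 30.0)
print(round(log_s_25, 2))
```

    For an endothermic compound the value normalized to 25 °C is lower than the value measured at 37 °C, and an error in the predicted enthalpy propagates linearly into the normalized log S.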

  18. Models for predicting objective function weights in prostate cancer IMRT

    International Nuclear Information System (INIS)

    Boutilier, Justin J.; Lee, Taewoo; Craig, Tim; Sharpe, Michael B.; Chan, Timothy C. Y.

    2015-01-01

    Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. 
Conclusions: The authors demonstrated that the KNN and MLR

  19. Models for predicting objective function weights in prostate cancer IMRT

    Energy Technology Data Exchange (ETDEWEB)

    Boutilier, Justin J., E-mail: j.boutilier@mail.utoronto.ca; Lee, Taewoo [Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, Ontario M5S 3G8 (Canada); Craig, Tim [Radiation Medicine Program, UHN Princess Margaret Cancer Centre, 610 University of Avenue, Toronto, Ontario M5T 2M9, Canada and Department of Radiation Oncology, University of Toronto, 148 - 150 College Street, Toronto, Ontario M5S 3S2 (Canada); Sharpe, Michael B. [Radiation Medicine Program, UHN Princess Margaret Cancer Centre, 610 University of Avenue, Toronto, Ontario M5T 2M9 (Canada); Department of Radiation Oncology, University of Toronto, 148 - 150 College Street, Toronto, Ontario M5S 3S2 (Canada); Techna Institute for the Advancement of Technology for Health, 124 - 100 College Street, Toronto, Ontario M5G 1P5 (Canada); Chan, Timothy C. Y. [Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, Ontario M5S 3G8, Canada and Techna Institute for the Advancement of Technology for Health, 124 - 100 College Street, Toronto, Ontario M5G 1P5 (Canada)

    2015-04-15

    Purpose: To develop and evaluate the clinical applicability of advanced machine learning models that simultaneously predict multiple optimization objective function weights from patient geometry for intensity-modulated radiation therapy of prostate cancer. Methods: A previously developed inverse optimization method was applied retrospectively to determine optimal objective function weights for 315 treated patients. The authors used an overlap volume ratio (OV) of bladder and rectum for different PTV expansions and overlap volume histogram slopes (OVSR and OVSB for the rectum and bladder, respectively) as explanatory variables that quantify patient geometry. Using the optimal weights as ground truth, the authors trained and applied three prediction models: logistic regression (LR), multinomial logistic regression (MLR), and weighted K-nearest neighbor (KNN). The population average of the optimal objective function weights was also calculated. Results: The OV at 0.4 cm and OVSR at 0.1 cm features were found to be the most predictive of the weights. The authors observed comparable performance (i.e., no statistically significant difference) between LR, MLR, and KNN methodologies, with LR appearing to perform the best. All three machine learning models outperformed the population average by a statistically significant amount over a range of clinical metrics including bladder/rectum V53Gy, bladder/rectum V70Gy, and dose to the bladder, rectum, CTV, and PTV. When comparing the weights directly, the LR model predicted bladder and rectum weights that had, on average, a 73% and 74% relative improvement over the population average weights, respectively. The treatment plans resulting from the LR weights had, on average, a rectum V70Gy that was 35% closer to the clinical plan and a bladder V70Gy that was 29% closer, compared to the population average weights. Similar results were observed for all other clinical metrics. 
Conclusions: The authors demonstrated that the KNN and MLR

  20. Hyperbolic partial differential equations

    CERN Document Server

    Witten, Matthew

    1986-01-01

    Hyperbolic Partial Differential Equations III is a refereed journal issue that explores the applications, theory, and/or applied methods related to hyperbolic partial differential equations, or problems arising out of hyperbolic partial differential equations, in any area of research. This journal issue is interested in all types of articles in terms of review, mini-monograph, standard study, or short communication. Some studies presented in this journal include discretization of ideal fluid dynamics in the Eulerian representation; a Riemann problem in gas dynamics with bifurcation; periodic M

  1. Applied regression analysis a research tool

    CERN Document Server

    Pantula, Sastry; Dickey, David

    1998-01-01

    Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...

  2. Determination and importance of temperature dependence of retention coefficient (RPHPLC) in QSAR model of nitrazepams' partition coefficient in bile acid micelles.

    Science.gov (United States)

    Posa, Mihalj; Pilipović, Ana; Lalić, Mladena; Popović, Jovan

    2011-02-15

    A linear dependence between temperature (t) and the retention coefficient (k, reversed-phase HPLC) of bile acids is obtained. The parameters (a, intercept, and b, slope) of the linear function k=f(t) correlate highly with the bile acids' structures. The investigated bile acids form linear congeneric groups on a principal component (calculated from k=f(t)) score plot that are in accordance with conformations of the hydroxyl and oxo groups in the bile acid steroid skeleton. The partition coefficient (K(p)) of nitrazepam in bile acid micelles is investigated. Nitrazepam molecules incorporated in micelles show modified bioavailability (depot effect, higher permeability, etc.). Using the multiple linear regression method, QSAR models of nitrazepam's partition coefficient, K(p), are derived at temperatures of 25°C and 37°C. For deriving the linear regression models at both temperatures, experimentally obtained lipophilicity parameters (PC1 from data k=f(t)) and in silico descriptors of the shape of the molecule are included, while at the higher temperature molecular polarisation is introduced. This indicates that the incorporation mechanism of nitrazepam in BA micelles changes at the higher temperature. QSAR models are derived using the partial least squares method as well. Experimental parameters k=f(t) are shown to be significant predictive variables. Both QSAR models are validated using cross validation and internal validation methods. PLS models have slightly higher predictive capability than MLR models. Copyright © 2010 Elsevier B.V. All rights reserved.

  3. Standards for Standardized Logistic Regression Coefficients

    Science.gov (United States)

    Menard, Scott

    2011-01-01

    Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…

  4. [Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

    Science.gov (United States)

    Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

    2011-11-01

    To explore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to the increase of injury frequency. 2917 primary and secondary school students were selected from Hefei by cluster random sampling method and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors associated with an increase in unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model existed over-dispersion (P Poisson regression and negative binomial regression model, was fitted better. respectively. Both showed that male gender, younger age, father working outside of the hometown, a guardian educated above junior high school level and smoking might be associated with higher injury frequencies. For injury event count data that tend to cluster, both the modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better and this model could give a more accurate interpretation of relevant factors affecting the frequency of injury.
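
    Over-dispersion, the reason for moving from a plain Poisson model to the alternatives above, can be checked with a simple variance-to-mean ratio. The sketch below uses simulated counts (an assumption, since the survey data are not available here) and the hypothetical helper `dispersion_index`; it is not the test used in the study.

```python
import numpy as np

def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson counts."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(42)
n = 20000

# Poisson counts with mean 2: variance equals the mean.
poisson_counts = rng.poisson(lam=2.0, size=n)

# Negative binomial counts with mean 2 and variance 4 (n=2, p=0.5):
# same mean as above but twice the variance, i.e. over-dispersed.
negbin_counts = rng.negative_binomial(2, 0.5, size=n)

poisson_ratio = dispersion_index(poisson_counts)
negbin_ratio = dispersion_index(negbin_counts)
print("Poisson dispersion index          :", poisson_ratio)
print("Negative binomial dispersion index:", negbin_ratio)
```

    A ratio well above 1 suggests that Poisson standard errors would be too small, which is the situation that a negative binomial model or a robust-variance (modified) Poisson model is meant to handle.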

  5. Partial differential equations

    CERN Document Server

    Evans, Lawrence C

    2010-01-01

    This text gives a comprehensive survey of modern techniques in the theoretical study of partial differential equations (PDEs) with particular emphasis on nonlinear equations. The exposition is divided into three parts: representation formulas for solutions; theory for linear partial differential equations; and theory for nonlinear partial differential equations. Included are complete treatments of the method of characteristics; energy methods within Sobolev spaces; regularity for second-order elliptic, parabolic, and hyperbolic equations; maximum principles; the multidimensional calculus of variations; viscosity solutions of Hamilton-Jacobi equations; shock waves and entropy criteria for conservation laws; and, much more.The author summarizes the relevant mathematics required to understand current research in PDEs, especially nonlinear PDEs. While he has reworked and simplified much of the classical theory (particularly the method of characteristics), he primarily emphasizes the modern interplay between funct...

  6. Separation of anthropogenic CO2 in the North Atlantic - methodological developments and measurements; Separation von anthropogenem CO2 im Nordatlantik - Methodische Entwicklungen und Messungen

    Energy Technology Data Exchange (ETDEWEB)

    Friis, K.

    2001-07-01

    The foci of this thesis were: (1) the development of a fully automated pH-system, and (2) the identification of anthropogenic CO2 in the subpolar North Atlantic based on measurements using this system. A spectrophotometric pH-system for discrete sea water sample analysis was developed. For the detection of the temporal increase in anthropogenic CO2, the statistical method of Wallace (1995) was tested for its applicability in the subpolar gyre. The original method is based on a comparison of historical and recent data sets. For one of the data sets a predictive equation for C_T is derived by multiple linear regression (MLR) based on several independent chemical and hydrographic parameters. The difference between a C_T value measured at a later or earlier time and the C_T value predicted using the MLR-equation can potentially give a measure of the anthropogenic CO2-increase between the two sampling periods, independent of hydrographic or biologically-mediated changes within the water column.
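
    The MLR difference approach described above can be sketched as follows: fit a regression for C_T on one data set, predict C_T for the other, and read the mean residual as the change between sampling periods. Everything in this example is an assumption made for illustration: the data are synthetic, the three predictor columns are stand-ins for unspecified hydrographic and chemical parameters, and the imposed offset of 10 units is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_cruise(n, anthropogenic_offset):
    """Synthetic cruise: three stand-in predictors and a C_T value per sample."""
    X = rng.normal(size=(n, 3))
    natural = 2100.0 + X @ np.array([-8.0, 5.0, -3.0])  # natural variability
    c_t = natural + anthropogenic_offset + rng.normal(scale=2.0, size=n)
    return X, c_t

X_old, ct_old = make_cruise(400, 0.0)    # historical data set
X_new, ct_new = make_cruise(400, 10.0)   # recent data set with added signal

# Fit the predictive equation on the historical data (intercept + 3 slopes).
A_old = np.column_stack([np.ones(len(X_old)), X_old])
coef, *_ = np.linalg.lstsq(A_old, ct_old, rcond=None)

# Apply the historical equation to the recent predictors and take the
# difference between measured and predicted C_T.
A_new = np.column_stack([np.ones(len(X_new)), X_new])
delta = ct_new - A_new @ coef
estimate = delta.mean()
print("Estimated increase:", estimate)
```

    Because the predictors absorb the natural variability, the mean residual recovers the imposed offset to within the noise; with real data the result depends on how well the regression form holds across both sampling periods.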

  7. Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

    Science.gov (United States)

    Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

    2015-01-01

    This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zone. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375

  8. Robust modelling of solubility in supercritical carbon dioxide using Bayesian methods.

    Science.gov (United States)

    Tarasova, Anna; Burden, Frank; Gasteiger, Johann; Winkler, David A

    2010-04-01

    Two sparse Bayesian methods were used to derive predictive models of solubility of organic dyes and polycyclic aromatic compounds in supercritical carbon dioxide (scCO(2)), over a wide range of temperatures (285.9-423.2K) and pressures (60-1400 bar): a multiple linear regression employing an expectation maximization algorithm and a sparse prior (MLREM) method and a non-linear Bayesian Regularized Artificial Neural Network with a Laplacian Prior (BRANNLP). A randomly selected test set was used to estimate the predictive ability of the models. The MLREM method resulted in a model of similar predictivity to the less sparse MLR method, while the non-linear BRANNLP method created models of substantially better predictivity than either the MLREM or MLR based models. The BRANNLP method simultaneously generated context-relevant subsets of descriptors and a robust, non-linear quantitative structure-property relationship (QSPR) model for the compound solubility in scCO(2). The differences between linear and non-linear descriptor selection methods are discussed. (c) 2009 Elsevier Inc. All rights reserved.

  9. Lifestyle and oral facial disorders associated with sleep bruxism in children.

    Science.gov (United States)

    Alencar, Nashalie Andrade de; Fernandes, Alline Birra Nolasco; Souza, Margareth Maria Gomes de; Luiz, Ronir Raggio; Fonseca-Gonçalves, Andréa; Maia, Lucianne Cople

    2017-05-01

    The aim of the study was to investigate the routine, sleep history, and orofacial disorders associated with children aged 3-7 years with nocturnal bruxism. Children (n = 66) were divided into groups of parent reported nocturnal bruxism (n = 34) and those without the disorder (n = 32). Data about the child's routine during the day, during sleep and awakening, headache frequency, temporomandibular joint (TMJ), and hearing impairments were obtained through interviews with parents/caregivers. Electromyography examination was used to assess the activity of facial muscles. Multiple logistic regression (MLR), chi-square test, and t-test analyses were performed. MLR revealed association of nightmares (p = 0.002; OR = 18.09) and snoring (p = 0.013; OR = 0.14) with bruxism. Variables related to awakening revealed an association with bruxism (p bruxism) reported more complaints of orofacial pain, facial appearance, and headache occurrence (p  0.05). Nightmares and snoring are associated with nocturnal bruxism in children. Bruxism in children elicits consequences such as headache, orofacial pain, and pain related to awakening.

  10. Using machine learning and quantum chemistry descriptors to predict the toxicity of ionic liquids.

    Science.gov (United States)

    Cao, Lingdi; Zhu, Peng; Zhao, Yongsheng; Zhao, Jihong

    2018-06-15

    Large-scale application of ionic liquids (ILs) hinges on the advancement of designable and eco-friendly nature. Research of the potential toxicity of ILs towards different organisms and trophic levels is insufficient. A quantitative structure-activity relationship (QSAR) model is applied to evaluate the toxicity of ILs towards the leukemia rat cell line (ICP-81). The structures of 57 cations and 21 anions were optimized by quantum chemistry. The electrostatic potential surface area (S_EP) and charge distribution area (S_σ-profile) descriptors are calculated and used to predict the toxicity of ILs. The performance and predictive aptitude of the extreme learning machine (ELM) model are analyzed and compared with those of multiple linear regression (MLR) and support vector machine (SVM) models. The highest R² and the lowest AARD% and RMSE of the training set, test set and total set are observed for the ELM, which validates the superior performance of the ELM over the MLR and SVM. The applicability domain of the model is assessed by the Williams plot. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. Logistic regression for dichotomized counts.

    Science.gov (United States)

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  12. Partial Cooperative Equilibria: Existence and Characterization

    Directory of Open Access Journals (Sweden)

    Amandine Ghintran

    2010-09-01

    Full Text Available We study the solution concepts of partial cooperative Cournot-Nash equilibria and partial cooperative Stackelberg equilibria. The partial cooperative Cournot-Nash equilibrium is axiomatically characterized by using notions of rationality, consistency and converse consistency with regard to reduced games. We also establish sufficient conditions for which partial cooperative Cournot-Nash equilibria and partial cooperative Stackelberg equilibria exist in supermodular games. Finally, we provide an application to strategic network formation where such solution concepts may be useful.

  13. Bayesian ARTMAP for regression.

    Science.gov (United States)

    Sasu, L M; Andonie, R

    2013-10-01

    Bayesian ARTMAP (BA) is a recently introduced neural architecture which uses a combination of Fuzzy ARTMAP competitive learning and Bayesian learning. Training is generally performed online, in a single-epoch. During training, BA creates input data clusters as Gaussian categories, and also infers the conditional probabilities between input patterns and categories, and between categories and classes. During prediction, BA uses Bayesian posterior probability estimation. So far, BA was used only for classification. The goal of this paper is to analyze the efficiency of BA for regression problems. Our contributions are: (i) we generalize the BA algorithm using the clustering functionality of both ART modules, and name it BA for Regression (BAR); (ii) we prove that BAR is a universal approximator with the best approximation property. In other words, BAR approximates arbitrarily well any continuous function (universal approximation) and, for every given continuous function, there is one in the set of BAR approximators situated at minimum distance (best approximation); (iii) we experimentally compare the online trained BAR with several neural models, on the following standard regression benchmarks: CPU Computer Hardware, Boston Housing, Wisconsin Breast Cancer, and Communities and Crime. Our results show that BAR is an appropriate tool for regression tasks, both for theoretical and practical reasons. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Mechanisms of neuroblastoma regression

    Science.gov (United States)

    Brodeur, Garrett M.; Bagatell, Rochelle

    2014-01-01

    Recent genomic and biological studies of neuroblastoma have shed light on the dramatic heterogeneity in the clinical behaviour of this disease, which spans from spontaneous regression or differentiation in some patients, to relentless disease progression in others, despite intensive multimodality therapy. This evidence also suggests several possible mechanisms to explain the phenomena of spontaneous regression in neuroblastomas, including neurotrophin deprivation, humoral or cellular immunity, loss of telomerase activity and alterations in epigenetic regulation. A better understanding of the mechanisms of spontaneous regression might help to identify optimal therapeutic approaches for patients with these tumours. Currently, the most druggable mechanism is the delayed activation of developmentally programmed cell death regulated by the tropomyosin receptor kinase A pathway. Indeed, targeted therapy aimed at inhibiting neurotrophin receptors might be used in lieu of conventional chemotherapy or radiation in infants with biologically favourable tumours that require treatment. Alternative approaches consist of breaking immune tolerance to tumour antigens or activating neurotrophin receptor pathways to induce neuronal differentiation. These approaches are likely to be most effective against biologically favourable tumours, but they might also provide insights into treatment of biologically unfavourable tumours. We describe the different mechanisms of spontaneous neuroblastoma regression and the consequent therapeutic approaches. PMID:25331179

  15. Successful removable partial dentures.

    Science.gov (United States)

    Lynch, Christopher D

    2012-03-01

    Removable partial dentures (RPDs) remain a mainstay of prosthodontic care for partially dentate patients. Appropriately designed, they can restore masticatory efficiency, improve aesthetics and speech, and help secure overall oral health. However, challenges remain in providing such treatments, including maintaining adequate plaque control, achieving adequate retention, and facilitating patient tolerance. The aim of this paper is to review the successful provision of RPDs. Removable partial dentures are a successful form of treatment for replacing missing teeth, and can be successfully provided with appropriate design and fabrication concepts in mind.

  16. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    Science.gov (United States)

    Gorgees, HazimMansoor; Mahdi, FatimahAssim

    2018-05-01

    This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).

  17. Multicollinearity and Regression Analysis

    Science.gov (United States)

    Daoud, Jamal I.

    2017-12-01

    In regression analysis, correlation between the response and the predictor(s) is expected, but correlation among the predictors is undesirable. The number of predictors included in the regression model depends on many factors, among them historical data, experience, etc. In the end, the selection of the most important predictors is up to the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; if this happens, the standard errors of the coefficients will increase [8]. Increased standard errors mean that the coefficients for some or all independent variables may be found not to be significantly different from zero. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes, and its consequences for the reliability of the regression model.
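
    The inflation of standard errors described in this abstract is commonly summarized by the variance inflation factor (VIF). The sketch below is not taken from the paper; it assumes the simplest case of exactly two predictors, where the VIF of either predictor reduces to 1 / (1 - r²), with r the correlation between them. The function names and the sample values are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model.

    Assumption: only two predictors, so R^2 of x1 regressed on x2 is r^2.
    The coefficient standard error is inflated by sqrt(VIF).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Made-up illustrative values (not data from the paper).
x1 = [1, 2, 3, 4, 5, 6]
x2_weak = [2, 1, 2, 1, 2, 1]                 # weakly related to x1
x2_strong = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]   # almost a copy of x1

print("VIF, weakly correlated predictors:  ", vif_two_predictors(x1, x2_weak))
print("VIF, strongly correlated predictors:", vif_two_predictors(x1, x2_strong))
```

    A VIF close to 1 indicates little inflation, while a large VIF signals that the predictors carry nearly the same information.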

  18. Panel Smooth Transition Regression Models

    DEFF Research Database (Denmark)

    González, Andrés; Terasvirta, Timo; Dijk, Dick van

    We introduce the panel smooth transition regression model. This new model is intended for characterizing heterogeneous panels, allowing the regression coefficients to vary both across individuals and over time. Specifically, heterogeneity is allowed for by assuming that these coefficients are bou...

  19. 32 CFR 751.13 - Partial payments.

    Science.gov (United States)

    2010-07-01

    ... voucher and all other information related to the partial payment shall be placed in the claim file. Action... 32 National Defense 5 2010-07-01 2010-07-01 false Partial payments. 751.13 Section 751.13 National... Claims Against the United States § 751.13 Partial payments. (a) Partial payments when hardship exists...

  20. [Acrylic resin removable partial dentures].

    Science.gov (United States)

    de Baat, C; Witter, D J; Creugers, N H J

    2011-01-01

    An acrylic resin removable partial denture is distinguished from other types of removable partial dentures by an all-acrylic resin base which is, in principle, solely supported by the edentulous regions of the tooth arch and in the maxilla also by the hard palate. When compared to the other types of removable partial dentures, the acrylic resin removable partial denture has 3 favourable aspects: the economic aspect, its aesthetic quality and the ease with which it can be extended and adjusted. Disadvantages are an increased risk of caries developing, gingivitis, periodontal disease, denture stomatitis, alveolar bone reduction, tooth migration, triggering of the gag reflex and damage to the acrylic resin base. Present-day indications are of a temporary or palliative nature or are motivated by economic factors. Special varieties of the acrylic resin removable partial denture are the spoon denture, the flexible denture fabricated of non-rigid acrylic resin, and the two-piece sectional denture. Furthermore, acrylic resin removable partial dentures can be supplied with clasps or reinforced by fibers or metal wires.

  1. Credit Scoring Problem Based on Regression Analysis

    OpenAIRE

    Khassawneh, Bashar Suhil Jad Allah

    2014-01-01

    ABSTRACT: This thesis provides an explanatory introduction to the regression models of data mining and contains basic definitions of key terms in the linear, multiple and logistic regression models. Meanwhile, the aim of this study is to illustrate fitting models for the credit scoring problem using simple linear, multiple linear and logistic regression models and also to analyze the found model functions by statistical tools. Keywords: Data mining, linear regression, logistic regression....

  2. Unbalanced Regressions and the Predictive Equation

    DEFF Research Database (Denmark)

    Osterrieder, Daniela; Ventosa-Santaulària, Daniel; Vera-Valdés, J. Eduardo

    Predictive return regressions with persistent regressors are typically plagued by (asymptotically) biased/inconsistent estimates of the slope, non-standard or potentially even spurious statistical inference, and regression unbalancedness. We alleviate the problem of unbalancedness in the theoretical predictive equation by suggesting a data generating process, where returns are generated as linear functions of a lagged latent I(0) risk process. The observed predictor is a function of this latent I(0) process, but it is corrupted by a fractionally integrated noise. Such a process may arise due to aggregation or unexpected level shifts. In this setup, the practitioner estimates a misspecified, unbalanced, and endogenous predictive regression. We show that the OLS estimate of this regression is inconsistent, but standard inference is possible. To obtain a consistent slope estimate, we then suggest...

  3. [From clinical judgment to linear regression model].

    Science.gov (United States)

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we tend to think that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and normally distributed. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line Y = a + bX, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
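
    The quantities named in this abstract (intercept "a", slope "b", and the coefficient of determination) can be computed directly by ordinary least squares. The following is a minimal sketch for one predictor; the function name and the sample values are assumptions made for illustration, not material from the article.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of Y = a + b*X; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # intercept: value of Y when X equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Made-up illustrative measurements (hypothetical, not clinical data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2 = simple_linear_regression(x, y)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, R^2 = {r2:.3f}")
```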

  4. Autistic Regression

    Science.gov (United States)

    Matson, Johnny L.; Kozlowski, Alison M.

    2010-01-01

    Autistic regression is one of the many mysteries in the developmental course of autism and pervasive developmental disorders not otherwise specified (PDD-NOS). Various definitions of this phenomenon have been used, further clouding the study of the topic. Despite this problem, some efforts at establishing prevalence have been made. The purpose of…

  5. Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation

    Directory of Open Access Journals (Sweden)

    Sharad Damodar Gore

    2009-10-01

    Full Text Available Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived. Results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
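
    The abstract does not give formulas, so the sketch below illustrates only the well-known relation between ORR and OLS that it alludes to: the ridge estimate (X'X + kI)^-1 X'y equals [I - k(X'X + kI)^-1] applied to the OLS estimate. It does not implement MUR or URR. The two-predictor setup, the function names, the ridge constant k and the data are all assumptions chosen to keep the example dependency-free.

```python
def gram(X):
    """X'X for a list of (x1, x2) rows (predictors assumed centered)."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    return [[a, b], [b, d]]


def solve2(A, v):
    """Solve the 2x2 system A z = v by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * v[0] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]


def ridge(X, y, k):
    """Ordinary ridge estimate (X'X + kI)^-1 X'y; k = 0 gives OLS."""
    G = gram(X)
    G[0][0] += k
    G[1][1] += k
    Xty = [sum(r[0] * yi for r, yi in zip(X, y)),
           sum(r[1] * yi for r, yi in zip(X, y))]
    return solve2(G, Xty)


def ridge_from_ols(X, y, k):
    """Same ridge estimate, written as a shrinkage of the OLS estimate."""
    ols = ridge(X, y, 0.0)
    G = gram(X)
    G[0][0] += k
    G[1][1] += k
    correction = solve2(G, ols)   # (X'X + kI)^-1 beta_OLS
    return [ols[0] - k * correction[0], ols[1] - k * correction[1]]


# Made-up, centered, nearly collinear predictors (hypothetical data).
X = [(1, 1.1), (2, 1.9), (3, 3.2), (-1, -0.9), (-2, -2.1), (-3, -3.2)]
y = [2.0, 4.1, 6.2, -1.9, -4.0, -6.4]
print("OLS:            ", ridge(X, y, 0.0))
print("ORR (k = 0.5):  ", ridge(X, y, 0.5))
print("ORR via OLS:    ", ridge_from_ols(X, y, 0.5))
```

    The last two printed vectors should agree up to rounding, which is the sense in which ORR is "obtained from" OLS.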

  6. Discriminative Elastic-Net Regularized Linear Regression.

    Science.gov (United States)

    Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

    2017-03-01

    In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations to make the final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.

  7. Simultaneous spectrophotometric determination of crystal violet and malachite green in water samples using partial least squares regression and central composite design after preconcentration by dispersive solid-phase extraction.

    Science.gov (United States)

    Razi-Asrami, Mahboobeh; Ghasemi, Jahan B; Amiri, Nayereh; Sadeghi, Seyed Jamal

    2017-04-01

    In this paper, a simple, fast, and inexpensive method is introduced for the simultaneous spectrophotometric determination of crystal violet (CV) and malachite green (MG) contents in aquatic samples using partial least squares regression (PLS) as a multivariate calibration technique after preconcentration by graphene oxide (GO). The method was based on the sorption and desorption of analytes onto GO and direct determination by ultraviolet-visible spectrophotometric techniques. GO was synthesized according to Hummers method. To characterize the shape and structure of GO, FT-IR, SEM, and XRD were used. The effective factors on the extraction efficiency such as pH, extraction time, and the amount of adsorbent were optimized using central composite design. The optimum values of these factors were 6, 15 min, and 12 mg, respectively. The maximum capacity of GO for the adsorption of CV and MG was 63.17 and 77.02 mg g⁻¹, respectively. The preconcentration factors and extraction recoveries were 19.6 and 98% for CV and 20 and 100% for MG, respectively. The LOD and linear dynamic range were 0.009 and 0.03-0.3 μg mL⁻¹ for CV and 0.015 and 0.05-0.5 μg mL⁻¹ for MG, respectively. The intra-day and inter-day relative standard deviations were 1.99 and 0.58 for CV and 1.69 and 3.13 for MG at the concentration level of 50 ng mL⁻¹, respectively. Finally, the proposed DSPE/PLS method was successfully applied for the simultaneous determination of trace amounts of CV and MG in real water samples.

  8. Estimating Forest Aboveground Biomass by Combining ALOS PALSAR and WorldView-2 Data: A Case Study at Purple Mountain National Park, Nanjing, China

    Directory of Open Access Journals (Sweden)

    Songqiu Deng

    2014-08-01

    Full Text Available Enhanced methods are required for mapping the forest aboveground biomass (AGB) over a large area in Chinese forests. This study attempted to develop an improved approach to retrieving biomass by combining PALSAR (Phased Array type L-band Synthetic Aperture Radar) and WorldView-2 data. A total of 33 variables with potential correlations with forest biomass were extracted from the above data. However, these parameters had poor fits to the observed biomass. Accordingly, the synergies of several variables were explored to identify improved relationships with the AGB. Using principal component analysis and multivariate linear regression (MLR), the accuracies of the biomass estimates obtained using PALSAR and WorldView-2 data were improved to approximately 65% to 71%. In addition, using the additional dataset developed from the fusion of FBD (fine beam dual-polarization) and WorldView-2 data improved the performance to 79% with an RMSE (root mean square error) of 35.13 Mg/ha when using the MLR method. Moreover, a further improvement (R2 = 0.89, relative RMSE = 17.08%) was obtained by combining all the variables mentioned above. For the purpose of comparison with MLR, a neural network approach was also used to estimate the biomass. However, this approach did not produce significant improvements in the AGB estimates. Consequently, the final MLR model was recommended to map the AGB of the study area. Finally, analyses of estimated error in distinguishing forest types and vertical structures suggested that the RMSE decreases gradually from broad-leaved to coniferous to mixed forest. In terms of different vertical structures (VS), VS3 has a high error because the forest lacks undergrowth trees, while VS4 forest, which has approximately the same amounts of stems in each of the three DBH (diameter at breast height) classes (DBH > 20, 10 ≤ DBH ≤ 20, and DBH < 10 cm), has the lowest RMSE. This study demonstrates that the combination of PALSAR and WorldView-2 data

  9. A field survey of the partially edentate elderly: Investigation of factors related to the usage rate of removable partial dentures.

    Science.gov (United States)

    Murai, S; Matsuda, K; Ikebe, K; Enoki, K; Hatta, K; Fujiwara, K; Maeda, Y

    2015-11-01

    Although the shortened dental arch (SDA) concept is known all over the world, its acceptance as an oral health standard can be questionable from the patients' point of view, even if it is biologically reasonable. Furthermore, because the health insurance system covers removable partial dentures (RPDs) for all citizens in Japan, SDA patients seem to prefer to receive prosthetic treatment to replace the missing teeth. However, there have been few field surveys investigating the usage rate of RPDs in Japan. The purpose of this study was to determine the usage rate of RPDs in older Japanese subjects and to investigate the factors related to the usage of RPDs. Partially edentate participants (n = 390) were included in this study. Oral examinations were conducted to record several indices. The Cochran-Armitage trend test was used to evaluate the relationship between the number of missing teeth and the usage rate of RPDs. Chi-squared tests and logistic regression analysis were conducted to evaluate the factors related to the usage rate of RPDs. Usage of RPDs had a significantly positive association with the number of missing distal extension teeth and bilaterally missing teeth. The usage rate of RPDs increased as the number of missing distal extension teeth increased (P for trend < 0·001). The conclusion of this study was that participants with missing distal extension teeth had higher usage rates of RPDs than other participants, and the usage rate increased as the number of missing distal extension teeth increased. © 2015 John Wiley & Sons Ltd.

  10. Partial Actions and Power Sets

    Directory of Open Access Journals (Sweden)

    Jesús Ávila

    2013-01-01

    Full Text Available We consider a partial action (X,α) with enveloping action (T,β). In this work we extend α to a partial action on the ring (P(X),Δ,∩) and find its enveloping action (E,β). Finally, we introduce the concept of partial action of finite type to investigate the relationship between (E,β) and (P(T),β).

  11. Simultaneous determination of penicillin G salts by infrared spectroscopy: Evaluation of combining orthogonal signal correction with radial basis function-partial least squares regression

    Science.gov (United States)

    Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem

    2010-09-01

    In this study, a new method for the simultaneous determination of penicillin G salts in a pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to the similar analytical characteristics of its components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relations between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct unexpected information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both RBF-PLS and PLS calibration models. Also, in the case of some nonlinear relation between spectra and component, the OSC-RBF-PLS model gave more satisfactory results than the OSC-PLS model, which indicated that OSC was helpful in removing extrinsic deviations from linearity without eliminating nonlinear information related to the component. The chemometric models were tested on an external dataset and finally applied to the analysis of a commercialized injection product of penicillin G salts.

  12. Categorical regression dose-response modeling

    Science.gov (United States)

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression software (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  13. Abstract Expression Grammar Symbolic Regression

    Science.gov (United States)

    Korns, Michael F.

    This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution are used to produce an improved symbolic regression system. Nine base test cases, from the literature, are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial strength symbolic regression systems.

  14. DFT study on oxidation of HS(CH2)mSH (m = 1-8) in oxidative desulfurization

    Science.gov (United States)

    Song, Y. Z.; Song, J. J.; Zhao, T. T.; Chen, C. Y.; He, M.; Du, J.

    2016-06-01

    Density functional theory calculations were carried out for HS(CH2)mSH (m = 1-8) and its derivatives using the B3LYP method at the 6-31++G(d,p) level. Using the LUMO and HOMO eigenvalues of HS(CH2)mSH, the standard electrode potentials were estimated by stepwise multiple linear regression (MLR), giving E° = 1.500 + 7.167 × 10⁻³ HOMO − 0.229 LUMO, with a high correlation coefficient of 0.973 and an F value of 43.973.

  15. Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

    OpenAIRE

    KELEŞ, Taliha; ALTUN, Murat

    2016-01-01

    Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
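
    The comparison described in this abstract can be sketched for a single predictor. The code below is an illustration, not the authors' worked example: it fits the classical least-squares line and the orthogonal regression line (using the standard closed-form slope, which assumes a nonzero covariance between x and y) and evaluates the sum of squared perpendicular distances for each. The function names and the data are assumptions.

```python
import math


def _moments(x, y):
    """Means and centered sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def classical_fit(x, y):
    """Classical linear regression (minimizes vertical distances)."""
    mean_x, mean_y, sxx, _, sxy = _moments(x, y)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def orthogonal_fit(x, y):
    """Orthogonal regression (minimizes perpendicular distances).

    Assumes sxy != 0; the slope takes the same sign as sxy.
    """
    mean_x, mean_y, sxx, syy, sxy = _moments(x, y)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return mean_y - slope * mean_x, slope


def perpendicular_ss(x, y, intercept, slope):
    """Sum of squared perpendicular distances to y = intercept + slope*x."""
    vertical_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return vertical_ss / (1.0 + slope ** 2)


# Made-up illustrative data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
print("CLR perpendicular SS:", perpendicular_ss(x, y, *classical_fit(x, y)))
print("OR  perpendicular SS:", perpendicular_ss(x, y, *orthogonal_fit(x, y)))
```

    Because orthogonal regression minimizes exactly this criterion, its value can never exceed that of the classical line on the same data.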

  16. Generalized least squares and empirical Bayes estimation in regional partial duration series index-flood modeling

    DEFF Research Database (Denmark)

    Madsen, Henrik; Rosbjerg, Dan

    1997-01-01

    A regional estimation procedure that combines the index-flood concept with an empirical Bayes method for inferring regional information is introduced. The model is based on the partial duration series approach with generalized Pareto (GP) distributed exceedances. The prior information of the model parameters is inferred from regional data using generalized least squares (GLS) regression. Two different Bayesian T-year event estimators are introduced: a linear estimator that requires only some moments of the prior distributions to be specified and a parametric estimator that is based on specified...

  17. Relationship Between Preoperative Extrusion of the Medial Meniscus and Surgical Outcomes After Partial Meniscectomy.

    Science.gov (United States)

    Kim, Sung-Jae; Choi, Chong Hyuk; Chun, Yong-Min; Kim, Sung-Hwan; Lee, Su-Keon; Jang, Jinyoung; Jeong, Howon; Jung, Min

    2017-07-01

    No previous study has examined arthritic change after meniscectomy with regard to extrusion of the medial meniscus. (1) To determine the factors related to preoperative meniscal extrusion; (2) to investigate the relationship between medial meniscal extrusion and postoperative outcomes of partial meniscectomy, and to identify a cutoff point of meniscal extrusion that contributes to arthritic change after partial meniscectomy in nonosteoarthritic knees. Cohort study; Level of evidence, 3. A total of 208 patients who underwent partial meniscectomy of the medial meniscus between January 2000 and September 2006 were retrospectively reviewed. The extent of extrusion and severity of degeneration of the medial meniscus as shown on preoperative MRI were evaluated. The minimum follow-up duration was 7 years. Clinical function was assessed with the Lysholm knee scoring scale, the International Knee Documentation Committee (IKDC) subjective knee evaluation form, and the Tapper and Hoover grading system. Radiological evaluation was conducted by use of the IKDC radiographic assessment scale. Regression analysis was performed to identify factors affecting preoperative extrusion of the medial meniscus and factors influencing follow-up results after partial meniscectomy. Receiver operating characteristic curve was used to identify a cutoff point for the extent of meniscal extrusion that was associated with arthritic change. The mean ± SD preoperative Lysholm knee score was 65.0 ± 6.3 and the mean IKDC subjective score was 60.1 ± 7.5. The mean follow-up functional scores were 93.2 ± 5.1 ( P meniscus showed a tendency to increase as the extent of intrameniscal degeneration increased, and the medial meniscus was extruded more in patients with horizontal, horizontal flap, and complex tears. The preoperative extent of meniscal extrusion had a statistically significant correlation with follow-up Lysholm knee score (coefficient = -0.10, P = .002), IKDC subjective score (coefficient

  18. Pathological assessment of liver fibrosis regression

    Directory of Open Access Journals (Sweden)

    WANG Bingqiong

    2017-03-01

    Full Text Available Hepatic fibrosis is the common pathological outcome of chronic hepatic diseases. An accurate assessment of fibrosis degree provides an important reference for a definite diagnosis of diseases, treatment decision-making, treatment outcome monitoring, and prognostic evaluation. At present, many clinical studies have proven that regression of hepatic fibrosis and early-stage liver cirrhosis can be achieved by effective treatment, and a correct evaluation of fibrosis regression has become a hot topic in clinical research. Liver biopsy has long been regarded as the gold standard for the assessment of hepatic fibrosis, and thus it plays an important role in the evaluation of fibrosis regression. This article reviews the clinical application of current pathological staging systems in the evaluation of fibrosis regression from the perspectives of semi-quantitative scoring system, quantitative approach, and qualitative approach, in order to propose a better pathological evaluation system for the assessment of fibrosis regression.

  19. Patient satisfaction with laser-sintered removable partial dentures: A crossover pilot clinical trial.

    Science.gov (United States)

    Almufleh, Balqees; Emami, Elham; Alageel, Omar; de Melo, Fabiana; Seng, Francois; Caron, Eric; Nader, Samer Abi; Al-Hashedi, Ashwaq; Albuquerque, Rubens; Feine, Jocelyne; Tamimi, Faleh

    2018-04-01

    Clinical data regarding newly introduced laser-sintered removable partial dentures (RPDs) are needed before this technique can be recommended. Currently, only a few clinical reports have been published, with no clinical studies. This clinical trial compared short-term satisfaction in patients wearing RPDs fabricated with conventional or computer-aided design and computer-aided manufacturing (CAD-CAM) laser-sintering technology. Twelve participants with partial edentulism were enrolled in this pilot crossover double-blinded clinical trial. Participants were randomly assigned to wear cast or CAD-CAM laser-sintered RPDs for alternate periods of 30 days. The outcome of interest was patient satisfaction as measured using the McGill Denture Satisfaction Instrument. Assessments were conducted at 1, 2, and 4 weeks. The participant's preference in regard to the type of prosthesis was assessed at the final evaluation. Linear mixed effects regression models for repeated measures were used to analyze the data, using the intention-to-treat principle. To assess robustness to potential incomplete adherence, sensitivity analyses were conducted. Statistically significant differences were found in patients' satisfaction between the 2 methods of RPD fabrication. Participants were significantly more satisfied with laser-sintered prostheses than cast prostheses in regard to general satisfaction, ability to speak, ability to clean, comfort, ability to masticate, masticatory efficiency, and oral condition (Premovable partial dentures may lead to better outcomes in terms of patient satisfaction in the short term. The conclusion from this pilot study requires confirmation by a larger randomized controlled trial. ClinicalTrials.gov. A study about patient satisfaction with laser-sintered removable partial dentures; NCT02769715. Copyright © 2017 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.

  20. Estimating carbonate parameters from hydrographic data for the intermediate and deep waters of the Southern Hemisphere oceans

    Science.gov (United States)

    Bostock, H. C.; Mikaloff Fletcher, S. E.; Williams, M. J. M.

    2013-10-01

    Using ocean carbon data from global datasets, we have developed several multiple linear regression (MLR) algorithms to estimate alkalinity and dissolved inorganic carbon (DIC) in the intermediate and deep waters of the Southern Hemisphere (south of 25° S) from only hydrographic data (temperature, salinity and dissolved oxygen). A Monte Carlo experiment was used to identify a potential density (σθ) of 27.5 as an optimal break point between the two regimes with different MLR algorithms. The algorithms provide a good estimate of DIC (R2=0.98) and alkalinity (R2=0.91), and excellent agreement for aragonite and calcite saturation states (R2=0.99). Combining the algorithms with the CSIRO Atlas of Regional Seas (CARS), we have mapped the calcite saturation horizon (CSH) and aragonite saturation horizon (ASH) for the Southern Ocean at a spatial resolution of 0.5°. These maps are more detailed and more consistent with the oceanography than the previously gridded GLODAP data. The high-resolution ASH map reveals a dramatic circumpolar shoaling at the polar front. North of 40° S the CSH is deepest in the Atlantic (~ 4000 m) and shallower in the Pacific Ocean (~ 2750 m), while the CSH sits between 3200 and 3400 m in the Indian Ocean. The uptake of anthropogenic carbon by the ocean will alter the relationships between DIC and hydrographic data in the intermediate and deep waters over time. Thus continued sampling will be required, and the MLR algorithms will need to be adjusted in the future to account for these changes.
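
    The two-regime MLR idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function names, the standardized synthetic predictors, and the coefficient values are all assumptions made for the example; only the break point of 27.5 and the use of separate regressions on either side of it come from the abstract.

```python
import random

def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_mlr(rows, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def fit_two_regimes(rows, y, density, break_point=27.5):
    """Fit one MLR for samples lighter than the break point and one for the rest."""
    light = [k for k, s in enumerate(density) if s < break_point]
    dense = [k for k, s in enumerate(density) if s >= break_point]
    return (fit_mlr([rows[k] for k in light], [y[k] for k in light]),
            fit_mlr([rows[k] for k in dense], [y[k] for k in dense]))

def linear(coef, row):
    """Evaluate b0 + b1*x1 + ... for one sample."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))

# Synthetic, noise-free data (NOT real ocean measurements): three standardized
# stand-ins for temperature, salinity and oxygen, plus a potential density.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
density = [rng.uniform(27.0, 28.0) for _ in range(60)]
true_light = [2100.0, -5.0, 3.0, -0.5]   # hypothetical coefficients
true_dense = [2250.0, -2.0, 1.0, -0.2]   # hypothetical coefficients
y = [linear(true_light if s < 27.5 else true_dense, r) for r, s in zip(rows, density)]

coef_light, coef_dense = fit_two_regimes(rows, y, density)
```

    Because the synthetic data are noise-free, each fit should recover the coefficients used to generate its regime; with real data the break point itself would have to be chosen, for example by the kind of Monte Carlo experiment the abstract mentions.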

  1. Estimating carbonate parameters from hydrographic data for the intermediate and deep waters of the Southern Hemisphere oceans

    Directory of Open Access Journals (Sweden)

    H. C. Bostock

    2013-10-01

    Using ocean carbon data from global datasets, we have developed several multiple linear regression (MLR) algorithms to estimate alkalinity and dissolved inorganic carbon (DIC) in the intermediate and deep waters of the Southern Hemisphere (south of 25° S) from only hydrographic data (temperature, salinity and dissolved oxygen). A Monte Carlo experiment was used to identify a potential density (σθ) of 27.5 as an optimal break point between the two regimes with different MLR algorithms. The algorithms provide a good estimate of DIC (R2=0.98) and alkalinity (R2=0.91), and excellent agreement for aragonite and calcite saturation states (R2=0.99). Combining the algorithms with the CSIRO Atlas of Regional Seas (CARS), we have mapped the calcite saturation horizon (CSH) and aragonite saturation horizon (ASH) for the Southern Ocean at a spatial resolution of 0.5°. These maps are more detailed and more consistent with the oceanography than the previously gridded GLODAP data. The high-resolution ASH map reveals a dramatic circumpolar shoaling at the polar front. North of 40° S the CSH is deepest in the Atlantic (~ 4000 m) and shallower in the Pacific Ocean (~ 2750 m), while the CSH sits between 3200 and 3400 m in the Indian Ocean. The uptake of anthropogenic carbon by the ocean will alter the relationships between DIC and hydrographic data in the intermediate and deep waters over time. Thus continued sampling will be required, and the MLR algorithms will need to be adjusted in the future to account for these changes.

  2. The Analysis of Constitutions of Traditional Chinese Medicine in Relation to Cerebral Infarction in a Chinese Sample.

    Science.gov (United States)

    Liu, Jiaqi; Xu, Fei; Mohammadtursun, Nabijan; Lv, Yubao; Tang, Zihui; Dong, Jingcheng

    2018-05-01

    To investigate the relationships between the constitutions of Traditional Chinese Medicine (TCM) and patients with cerebral infarction (CI) in a Chinese sample. A total of 3748 participants with complete data were available for data analysis. All study subjects underwent complete clinical baseline characteristics' evaluation, including a physical examination and response to a structured, nurse-assisted, self-administered questionnaire. A population of 2010 neutral participants was used as the control group. Multiple variable regression (MLR) was employed to estimate the relationship between constitutions of TCM and the outcome. A cross-sectional study was conducted to evaluate the association of body constitution of TCM and CI. Communications and healthcare centers in Shanghai. The prevalence of CI was 2.84% and 4.66% in neutral participants and yang-deficient participants (p = 0.012), respectively. Univariate analysis demonstrated a positive correlation between yang deficiency and CI. After adjustment for relevant potential confounding factors, the MLR detected significant associations between yang deficiency and CI (odds ratio = 1.44, p = 0.093). A yang-deficient constitution was significantly and independently associated with CI. A higher prevalence of CI was found in yang-deficient participants as compared with neutral participants.

  3. [Transfer characteristic and source identification of soil heavy metals from water-level-fluctuating zone along Xiangxi River, Three Gorges Reservoir area].

    Science.gov (United States)

    Xu, Tao; Wang, Fei; Guo, Qiang; Nie, Xiao-Qian; Huang, Ying-Ping; Chen, Jun

    2014-04-01

    Transfer characteristics of heavy metals and their potential risk were studied by determining heavy metal concentrations in soils from the water-level-fluctuating zone (altitude: 145-175 m) and the bank (altitude: 175-185 m) along the Xiangxi River, Three Gorges Reservoir area. Factor analysis-multiple linear regression (FA-MLR) was employed for heavy metal source identification and source apportionment. Results demonstrate that, during the exposure season, soil heavy metal concentrations in the water-level-fluctuating zone and the bank varied: in the water-level-fluctuating zone they decreased in shallow soil but increased in deep soil, whereas at the bank they decreased in both shallow and deep soil during the same period. According to the geoaccumulation index, the pollution extent of heavy metals followed the order Cd > Pb > Cu > Cr, with Cd the primary pollutant. FA and FA-MLR reveal that in soils from the water-level-fluctuating zone, 75.60% of Pb originates from traffic, 62.03% of Cd is from agriculture, and 64.71% of Cu and 75.36% of Cr are from natural rock. In soils from the bank, 82.26% of Pb originates from traffic, 68.63% of Cd is from agriculture, and 65.72% of Cu and 69.33% of Cr are from natural rock. In conclusion, FA-MLR can successfully identify the sources of heavy metals and compute their source apportionment, while also revealing the transfer characteristics. All this information can serve as a reference for heavy metal pollution control.
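
    The abstract above ranks the metals by geoaccumulation index without stating the formula. A minimal sketch follows, assuming the commonly used Müller definition I_geo = log2(C_n / (1.5 B_n)), where C_n is the measured concentration and B_n the geochemical background. The concentrations and background values below are hypothetical placeholders, not data from the study.

```python
import math

def geoaccumulation_index(concentration, background, factor=1.5):
    """I_geo = log2(C_n / (factor * B_n)).

    The factor 1.5 is the conventional allowance for natural variation in
    the background value; it is assumed here, not taken from the abstract.
    """
    if concentration <= 0 or background <= 0:
        raise ValueError("concentration and background must be positive")
    return math.log2(concentration / (factor * background))

# Hypothetical (concentration, background) pairs in mg/kg, for illustration only.
samples = {"Cd": (0.9, 0.15), "Pb": (60.0, 25.0), "Cu": (40.0, 25.0), "Cr": (80.0, 75.0)}

# Rank metals from most to least enriched relative to background.
ranking = sorted(samples, key=lambda m: geoaccumulation_index(*samples[m]), reverse=True)
print(ranking)
```

    A higher index means stronger enrichment over background, so sorting in descending order yields the kind of pollution ordering the abstract reports.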

  4. Atom-type-based AI topological descriptors: application in structure-boiling point correlations of oxo organic compounds.

    Science.gov (United States)

    Ren, Biye

    2003-01-01

    Structure-boiling point relationships are studied for a series of oxo organic compounds by means of multiple linear regression (MLR) analysis. Excellent MLR models based on the recently introduced Xu index and the atom-type-based AI indices are obtained for the two subsets containing respectively 77 ethers and 107 carbonyl compounds and a combined set of 184 oxo compounds. The best models are tested using the leave-one-out cross-validation and an external test set, respectively. The MLR model produces a correlation coefficient of r = 0.9977 and a standard error of s = 3.99 degrees C for the training set of 184 compounds, and r(cv) = 0.9974 and s(cv) = 4.16 degrees C for the cross-validation set, and r(pred) = 0.9949 and s(pred) = 4.38 degrees C for the prediction set of 21 compounds. For the two subsets containing respectively 77 ethers and 107 carbonyl compounds, the quality of the models is further improved. The standard errors are reduced to 3.30 and 3.02 degrees C, respectively. Furthermore, the results obtained from this study indicate that the boiling points of the studied oxo compounds depend predominantly on molecular size and also on individual atom types, especially oxygen heteroatoms, owing to strong polar interactions between molecules. These excellent structure-boiling point models not only provide profound insights into the role of structural features in a molecule but also illustrate the usefulness of these indices in QSPR/QSAR modeling of complex compounds.

  5. Lack of autologous mixed lymphocyte reaction in patients with chronic lymphocytic leukemia: evidence for autoreactive T-cell dysfunction not correlated with phenotype, karyotype, or clinical status

    International Nuclear Information System (INIS)

    Han, T.; Bloom, M.L.; Dadey, B.; Bennett, G.; Minowada, J.; Sandberg, A.A.; Ozer, H.

    1982-01-01

    In the present study, there was a complete lack of autologous MLR between responding T cells or T subsets and unirradiated or irradiated leukemic B cells or monocytes in all 20 patients with CLL, regardless of disease status, stage, phenotype, or karyotype of the disease. The stimulating capacity of unirradiated CLL B cells and CLL monocytes or irradiated CLL B cells was significantly depressed as compared to that of respective normal B cells and monocytes in allogeneic MLR. The responding capacity of CLL T cells was also variably lower than that of normal T cells against unirradiated or irradiated normal allogeneic B cells and monocytes. The depressed allogeneic MLR between CLL B cells or CLL monocytes and normal T cells described in the present study could be explained on the basis of a defect in the stimulating antigens of leukemic B cells or monocytes. The decreased allogeneic MLR of CLL T cells might simply be explained by a defect in the responsiveness of T lymphocytes from patients with CLL. However, these speculations do not adequately explain the complete lack of autologous MLR in these patients. When irradiated CLL B cells or irradiated CLL T cells were cocultured with normal T cells and irradiated normal B cells, it was found that there was no suppressor cell activity of CLL B cells or CLL T cells on normal autologous MLR. Our data suggest that the absence or dysfunction of autoreactive T cells within the Tnon-gamma subset accounts for the lack of autologous MLR in patients with CLL. The possible significance of the autologous MLR, its relationship to in vivo immunoregulatory mechanisms, and the possible role of breakdown of autoimmunoregulation in the oncogenic process of certain lymphoproliferative and autoimmune diseases in man are discussed.

  6. Association between response rates and survival outcomes in patients with newly diagnosed multiple myeloma. A systematic review and meta-regression analysis.

    Science.gov (United States)

    Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos

    2017-06-01

    We performed a systematic review and meta-regression analysis of randomized controlled trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: 0.02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-0.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (0.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  7. Inhibition of human dendritic cell activation by hydroethanolic but not lipophilic extracts of turmeric (Curcuma longa).

    Science.gov (United States)

    Krasovsky, Joseph; Chang, David H; Deng, Gary; Yeung, Simon; Lee, Mavis; Leung, Ping Chung; Cunningham-Rundles, Susanna; Cassileth, Barrie; Dhodapkar, Madhav V

    2009-03-01

    Turmeric has been extensively utilized in Indian and Chinese medicine for its immune-modulatory properties. Dendritic cells (DCs) are antigen-presenting cells specialized to initiate and regulate immunity. The ability of DCs to initiate immunity is linked to their activation status. The effects of turmeric on human DCs have not been studied. Here we show that hydroethanolic (HEE) but not lipophilic "supercritical" extraction (SCE) of turmeric inhibits the activation of human DCs in response to inflammatory cytokines. Treatment of DCs with HEE also inhibits the ability of DCs to stimulate the mixed lymphocyte reaction (MLR). Importantly, the lipophilic fraction does not synergize with the hydroethanolic fraction for the ability of inhibiting DC maturation. Rather, culturing of DCs with the combination of HEE and SCE leads to partial abrogation of the effects of HEE on the MLR initiated by DCs. These data provide a mechanism for the anti-inflammatory properties of turmeric. However, they suggest that these extracts are not synergistic and may contain components with mutually antagonistic effects on human DCs. Harnessing the immune effects of turmeric may benefit from specifically targeting the active fractions.

  8. Logistic Regression: Concept and Application

    Science.gov (United States)

    Cokluk, Omay

    2010-01-01

    The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…

  9. Predictors of course in obsessive-compulsive disorder: logistic regression versus Cox regression for recurrent events.

    Science.gov (United States)

    Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M

    2007-09-01

    Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
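
    The remission criterion stated in the abstract above can be written as a small rule. This is a sketch under one stated assumption: "a reduction of seven points" is read as "at least seven points below baseline". The function names and the example scores are hypothetical.

```python
def is_remission(baseline, score, cutoff=13, min_reduction=7):
    """Remission: Y-BOCS score below the cutoff AND a drop from baseline
    of at least min_reduction points (the 'at least' is an assumption)."""
    return score < cutoff and (baseline - score) >= min_reduction

def remission_flags(baseline, follow_up_scores):
    """Apply the rule at every measurement so remissions and relapses both show up."""
    return [is_remission(baseline, s) for s in follow_up_scores]

# Hypothetical patient: baseline 24, then four later measurements.
flags = remission_flags(24, [20, 12, 15, 10])
print(flags)
```

    Evaluating the rule at every measurement, rather than at a single end point, is what lets a recurrent-events analysis use the whole sequence of remissions and relapses.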

  10. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha

    2014-12-08

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  11. Sparse reduced-rank regression with covariance estimation

    KAUST Repository

    Chen, Lisha; Huang, Jianhua Z.

    2014-01-01

    Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.

  12. Comparison of Treatment Outcomes in Partially Edentulous Patients with Implant-Supported Fixed Prostheses and Removable Partial Dentures.

    Science.gov (United States)

    Nogawa, Toshifumi; Takayama, Yoshiyuki; Ishida, Keita; Yokoyama, Atsuro

    The aim of this study was to compare masticatory performance, occlusal force, and oral health-related quality of life (OHRQoL) in patients with mandibular distal-extension edentulism between those with implant-supported fixed prostheses (ISFPs) and those with removable partial dentures (RPDs), and to evaluate relationships among them. Subjects were recruited from patients using ISFPs or RPDs for mandibular distal-extension edentulism. Masticatory performance was evaluated based on the glucose extracted from chewed gummy jelly. Occlusal force was measured with a pressure-sensitive sheet, and data were subjected to computer analysis. The Japanese version of the Oral Health Impact Profile (OHIP-J) was used to evaluate OHRQoL. The masticatory performance, occlusal force, and OHIP-J scores of the ISFP and RPD groups were compared using the Wilcoxon rank-sum test. The relationships among the variables were analyzed using the Spearman rank correlation coefficient test. Multivariate logistic regression analysis was employed with the OHIP-J score as a dependent variable. Nineteen patients with ISFPs and 25 patients with RPDs participated in this study. No significant difference was observed between the two groups with regard to masticatory performance and occlusal force. The OHIP-J score was significantly lower in the ISFP group than in the RPD group. The OHIP-J score had no significant correlation with masticatory performance, but was significantly correlated with occlusal force and the prosthetic method. Multivariate logistic regression analysis showed that younger age, RPDs, and lower occlusal force were significantly associated with a higher OHIP-J summary score. The present results suggest that the difference in masticatory performance and occlusal force between ISFPs and RPDs is small, but ISFPs are superior to RPDs with regard to OHRQoL in patients with mandibular distal-extension edentulism. In addition, there appears to be a slight correlation between the OHIP

  13. Regression models of reactor diagnostic signals

    International Nuclear Information System (INIS)

    Vavrin, J.

    1989-01-01

    The application is described of an autoregression model as the simplest regression model of diagnostic signals in experimental analysis of diagnostic systems, in in-service monitoring of normal and anomalous conditions and their diagnostics. The method of diagnostics is described using a regression type diagnostic data base and regression spectral diagnostics. The diagnostics is described of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor. (author)

  14. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    Science.gov (United States)

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  15. Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk From Its SLS Features

    DEFF Research Database (Denmark)

    Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann

    2012-01-01

    Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods for a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From...... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to the inconsistent measurement conditions, Locally Weighted Scatter plot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying different methods show...

  16. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin

    2017-01-19

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  17. Testing discontinuities in nonparametric regression

    KAUST Repository

    Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

    2017-01-01

    In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100

  18. Artificial neural network and particle swarm optimization for removal of methyl orange by gold nanoparticles loaded on activated carbon and Tamarisk

    Science.gov (United States)

    Ghaedi, M.; Ghaedi, A. M.; Ansari, A.; Mohammadi, F.; Vafaei, A.

    2014-11-01

    The influence of variables, namely initial dye concentration, adsorbent dosage (g), stirrer speed (rpm) and contact time (min), on the removal of methyl orange (MO) by gold nanoparticles loaded on activated carbon (Au-NP-AC) and Tamarisk was investigated using multiple linear regression (MLR) and an artificial neural network (ANN), and the variables were optimized by particle swarm optimization (PSO). Comparison of the results achieved using the proposed models showed that the ANN model was better than the MLR model for prediction of methyl orange removal using Au-NP-AC and Tamarisk. Using the optimal ANN model, the coefficients of determination (R2) for the test data set were 0.958 and 0.989, and the mean squared error (MSE) values were 0.00082 and 0.0006, for the Au-NP-AC and Tamarisk adsorbents, respectively. In this study a novel and green approach was reported for the synthesis of gold nanoparticles and activated carbon by Tamarisk. This material was characterized using different techniques such as SEM, TEM, XRD and BET. The usability of Au-NP-AC and activated carbon (AC) Tamarisk for the removal of methyl orange from aqueous solutions was investigated. The effects of variables such as pH, initial dye concentration, adsorbent dosage (g) and contact time (min) on methyl orange removal were studied. Fitting the experimental equilibrium data to various isotherm models such as the Langmuir, Freundlich, Tempkin and Dubinin-Radushkevich models shows the suitability and applicability of the Langmuir model. Kinetic models such as the pseudo-first order, pseudo-second order, Elovich and intraparticle diffusion models indicate that the second-order equation and intraparticle diffusion models control the kinetics of the adsorption process. Small amounts of the proposed Au-NP-AC and activated carbon (0.015 g and 0.75 g) are sufficient for successful removal of methyl orange (>98%) in a short time (20 min for Au-NP-AC and 45 min for Tamarisk-AC), with high adsorption capacities of 161 mg g-1 for Au-NP-AC and 3.84 mg g-1 for Tamarisk-AC.
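
    The abstract above compares models by coefficient of determination (R2) and mean squared error (MSE) on a test set. A minimal sketch of the two metrics follows, assuming the usual definition R2 = 1 - SS_res / SS_tot; the study may have used a different variant (for example the squared correlation), and the observed and predicted values below are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical normalized removal fractions for a small test set.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]

mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(mse, r2)
```

    Computing both metrics on held-out data, as the abstract does, guards against a model that merely memorizes the training set.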

  19. Algorithms over partially ordered sets

    DEFF Research Database (Denmark)

    Baer, Robert M.; Østerby, Ole

    1969-01-01

    in partially ordered sets, answer the combinatorial question of how many maximal chains might exist in a partially ordered set with n elements, and we give an algorithm for enumerating all maximal chains. We give (in § 3) algorithms which decide whether a partially ordered set is a (lower or upper) semi......-lattice, and whether a lattice has distributive, modular, and Boolean properties. Finally (in § 4) we give Algol realizations of the various algorithms....
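
    Enumerating maximal chains, as mentioned in the abstract above, can be sketched in a few lines. This is not the authors' Algol algorithm, which the abstract does not reproduce; it is an illustrative approach that builds the cover relation (Hasse diagram) and walks every path from a minimal element to a maximal element. The function names and the divisibility example are assumptions.

```python
def covers(elements, less):
    """Map each element a to the elements b that cover it:
    a < b with no c strictly between them."""
    result = {}
    for a in elements:
        above = [b for b in elements if less(a, b)]
        result[a] = [b for b in above
                     if not any(less(a, c) and less(c, b) for c in above)]
    return result

def maximal_chains(elements, less):
    """Yield every maximal chain of a finite poset as a list, bottom to top."""
    up = covers(elements, less)
    minimal = [a for a in elements if not any(less(b, a) for b in elements)]

    def extend(chain):
        successors = up[chain[-1]]
        if not successors:          # reached a maximal element
            yield list(chain)
        for b in successors:
            yield from extend(chain + [b])

    for a in minimal:
        yield from extend([a])

# Example poset: the divisors of 12 ordered by divisibility.
divisors = [1, 2, 3, 4, 6, 12]
strictly_divides = lambda a, b: a != b and b % a == 0
chains = list(maximal_chains(divisors, strictly_divides))
print(chains)
```

    The `less` argument must be a strict partial order (irreflexive and transitive); the sketch does not check this.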

  20. On Solving Lq-Penalized Regressions

    Directory of Open Access Journals (Sweden)

    Tracy Zhou Wu

    2007-01-01

    Lq-penalized regression arises in multidimensional statistical modelling where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except for the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on space transformation and thus efficient optimization algorithms. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be polynomial time solvable. Numerical studies show promise of our approach.
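
    For orientation, the generic form of an Lq-penalized least-squares problem can be written as below. The notation is an assumption, since the abstract defines no symbols: y is the response vector, X the design matrix, β the coefficient vector, λ ≥ 0 the penalty weight, and P the index set of penalized coefficients. The paper's exact formulation may differ.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
\; \lVert y - X\beta \rVert_2^{2}
\;+\; \lambda \sum_{j \in \mathcal{P}} \lvert \beta_j \rvert^{q},
\qquad \lambda \ge 0,\; q > 0
```

    Setting q = 1 gives the LASSO penalty, whose absolute value makes the objective nonsmooth at zero, while q = 2 gives the quadratic (ridge) penalty that the abstract singles out as the computationally easy case.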