regression technique r-pls: Topics by WorldWideScience.org

Sample records for regression technique r-pls

Variable and subset selection in PLS regression

DEFF Research Database (Denmark)

Høskuldsson, Agnar

2001-01-01

The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
Kinetic microplate bioassays for relative potency of antibiotics improved by partial Least Square (PLS) regression.

Science.gov (United States)

Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello

2016-05-01

Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays over other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we propose using partial least squares (PLS) regression to overcome these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there were significant deviations from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from agar diffusion official methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression

Directory of Open Access Journals (Sweden)

Chunxiao Zhang

2012-01-01

The prediction of aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of the engine exhaust temperature (EGT) time series, which comes from actual airborne ACARS data, is reconstructed by selecting suitable nearby points. The partial least squares (PLS) method based on the cubic spline function or the kernel function transformation is adopted to obtain the chaotic predictive function of the EGT series. The experimental results indicate that the proposed PLS chaotic prediction algorithm based on the biweight kernel function transformation has a significant advantage in overcoming multicollinearity of the independent variables and in improving the stability of the regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear ordinary least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of the nonlinear PLS algorithm through bootstrap test screening.
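As a rough illustration of the phase-space reconstruction step mentioned in this record, the sketch below builds time-delay embedding vectors from a scalar series. The abstract does not state the embedding dimension, the delay, or the rule used to select nearby points, so the function name `delay_embed`, the parameters `dim` and `tau`, and the sample numbers are all assumptions made for illustration; this is not the authors' procedure.

```python
def delay_embed(series, dim, tau):
    """Return delay vectors [x[t], x[t+tau], ..., x[t+(dim-1)*tau]].

    dim (embedding dimension) and tau (delay) are hypothetical choices;
    the record above does not report the values that were used.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # samples covered by one delay vector, minus one
    return [
        [series[t + k * tau] for k in range(dim)]
        for t in range(len(series) - span)
    ]


# Made-up numbers standing in for an EGT time series.
egt = [600.0, 602.5, 601.0, 605.0, 607.5, 606.0]
vectors = delay_embed(egt, dim=3, tau=2)
```

Each row of `vectors` would then serve as one observation of the predictor matrix in a regression such as PLS, with a later value of the series as the response.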
Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

Science.gov (United States)

Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

2017-12-01

The multivariate image analysis descriptors used in quantitative structure-activity relationships are direct representations of chemical structures as they are simply numerical decodifications of pixels forming the 2D chemical images. These MDs have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components e.g. Partial Least Squares (PLS) have been generally used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.
Analysis of designed experiments by stabilised PLS Regression and jack-knifing

DEFF Research Database (Denmark)

Martens, Harald; Høy, M.; Westad, F.

2001-01-01

Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi....... An Introduction, Wiley, Chichester, UK, 2001]....
Application of NIRS coupled with PLS regression as a rapid, non-destructive alternative method for quantification of KBA in Boswellia sacra

Science.gov (United States)

Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif

2017-09-01

In the present study, for the first time, NIR spectroscopy coupled with PLS regression was developed as a rapid alternative method to quantify the amount of keto-β-boswellic acid (KBA) in different plant parts of Boswellia sacra and in the resin exudates of the trunk. NIR spectroscopy was used to measure KBA standards and B. sacra samples in absorption mode in the wavelength range 700-2500 nm. The PLS regression model was built from the obtained spectral data using 70% of the KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The resulting PLS regression model had an R-square value of 98% with a correlation of 0.99, and showed good prediction with an RMSEP value of 3.2 and a correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of the resin has the highest concentration of KBA (0.6%), followed by the essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to obtain various sub-fractions at different polarities of organic solvents. The sub-fraction at 4% MeOH/CHCl3 (4.1% of KBA) was found to contain the highest percentage of KBA, followed by another sub-fraction at 2% MeOH/CHCl3 (2.2% of KBA). The present results also indicated that KBA is only present in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis, and therefore it is concluded that NIRS coupled with PLS regression is a rapid alternative method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation.
Senate Bill (PLS) No. 200, of 2015, analysis versus the Principle of the Prohibition of Social Regression

Directory of Open Access Journals (Sweden)

Glaucia Ribeiro Lima

2016-12-01

The Senate Bill (PLS) number 200, of 2015, proposes the enactment of a law governing the conduct of clinical trials involving human subjects. This study aimed to perform a critical analysis of PLS 200/2015, based on the Principle of the Prohibition of Social Regression. Thus, a descriptive, documentary and normative research was conducted, with a survey of the ethical and sanitary standards related to clinical research and of findings related to PLS 200/2015. PLS 200/2015 and the related information were also consulted on the website of the Senate. The regulation of the matter by law did not prove to be a problem in the research. The main conflicts were related to the creation of an Independent Ethics Committee (IEC), which does not link the ethics review to a State Agency; the use of placebo, where flexibility runs contrary to all efforts to ensure that participants have the best treatment options; and post-study access, where restriction is contrary to the existing regulations that mandate free and unlimited access. The analysis of the main provisions of PLS 200/2015 did not identify social or scientific improvements. The Principle of the Prohibition of Social Regression can thus be used to safeguard the constitutional provisions already undertaken and accomplished, mainly the right to health, human dignity and the inviolability of the right to life.
Impact of multicollinearity on small sample hydrologic regression models

Science.gov (United States)

Kroll, Charles N.; Song, Peter

2013-06-01

Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
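To make the variance inflation factor mentioned in this record concrete, here is a minimal sketch for the simplest case of two explanatory variables, where the VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The function name `vif_two_predictors` and the sample values are assumptions for illustration; the abstract does not say which screening threshold or implementation the study used.

```python
def vif_two_predictors(x1, x2):
    """VIF shared by two predictors: 1 / (1 - r^2), r = Pearson correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))  # cross products
    s11 = sum((a - m1) ** 2 for a in x1)                    # sum of squares of x1
    s22 = sum((b - m2) ** 2 for b in x2)                    # sum of squares of x2
    r_squared = s12 * s12 / (s11 * s22)
    if r_squared >= 1.0:
        raise ValueError("predictors are perfectly collinear; VIF is unbounded")
    return 1.0 / (1.0 - r_squared)


# Made-up, strongly correlated explanatory variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.0]
vif = vif_two_predictors(x1, x2)
```

With more than two predictors, r² would be replaced by the R² from regressing each predictor on all the others; that general case is not shown here.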
COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

Directory of Open Access Journals (Sweden)

ELİF BULUT

2013-06-01

Partial Least Squares Regression (PLSR) is a multivariate statistical method that combines partial least squares and multiple linear regression analysis. Explanatory variables, X, that exhibit multicollinearity are reduced to components that explain a large amount of the covariance between the explanatory and response variables. These components are few in number and do not suffer from multicollinearity. Multiple linear regression analysis is then applied to those components to model the response variable Y. There are various PLSR algorithms. In this study, the NIPALS and PLS-Kernel algorithms are studied and illustrated on a real data set.
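The component-then-regress idea described in this record can be sketched with a textbook NIPALS loop for a single response (PLS1). This is a plain-Python illustration under stated assumptions, not the implementation compared in the paper: the names `nipals_pls1` and `predict`, the tolerance `1e-12`, and the toy data are all hypothetical.

```python
from math import sqrt


def nipals_pls1(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


# Toy data: the second column is exactly twice the first, so X'X is singular.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]
model = nipals_pls1(X, y, n_components=1)
prediction = predict(model, [5.0, 10.0])
```

In this toy case ordinary least squares has no unique solution because the two columns are perfectly collinear, while a single PLS component is enough to reproduce the linear relationship.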
Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

KAUST Repository

Harrou, Fouzi

2015-12-07

Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.
Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

KAUST Repository

Harrou, Fouzi; Sun, Ying

2015-01-01

Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.
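For readers unfamiliar with the monitoring index named in this record, the sketch below evaluates a symmetrized Kullback-Leibler divergence between two univariate Gaussian distributions, defined here as the sum of the two directed divergences. Both the Gaussian assumption for the residuals and that particular definition of "symmetrized" are assumptions on my part; the abstract does not spell out either, and the residual values are made up.

```python
from statistics import mean, variance


def symmetrized_kld_gaussian(mu1, var1, mu2, var2):
    """KL(p||q) + KL(q||p) for p = N(mu1, var1) and q = N(mu2, var2).

    The log terms of the two directed divergences cancel in the sum.
    """
    if var1 <= 0 or var2 <= 0:
        raise ValueError("variances must be positive")
    d2 = (mu1 - mu2) ** 2
    return (var1 + d2) / (2.0 * var2) + (var2 + d2) / (2.0 * var1) - 1.0


# Made-up residuals: a fault-free reference window and a current window.
reference = [-1.0, 0.0, 1.0]
current = [0.0, 1.0, 2.0]
kld = symmetrized_kld_gaussian(
    mean(reference), variance(reference),
    mean(current), variance(current),
)
```

The index is zero when the two windows have the same mean and variance and grows as the current residual distribution drifts away from the reference one.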
The effect of PLS regression in PLS path model estimation when multicollinearity is present

DEFF Research Database (Denmark)

Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob

PLS path modelling has previously been found to be robust to multicollinearity both between latent variables and between manifest variables of a common latent variable (see e.g. Cassel et al. (1999), Kristensen, Eskildsen (2005), Westlund et al. (2008)). However, most of the studies investigate...... models with relatively few variables and very simple dependence structures compared to the models that are often estimated in practical settings. A recent study by Nielsen et al. (2009) found that when model structure is more complex, PLS path modelling is not as robust to multicollinearity between...... latent variables as previously assumed. A difference in the standard error of path coefficients of as much as 83% was found between moderate and severe levels of multicollinearity. Large differences were found not only for large path coefficients, but also for small path coefficients and in some cases...
Assessment of bitter taste of pharmaceuticals with multisensor system employing 3 way PLS regression

International Nuclear Information System (INIS)

Rudnitskaya, Alisa; Kirsanov, Dmitry; Blinova, Yulia; Legin, Evgeny; Seleznev, Boris; Clapham, David; Ives, Robert S.; Saunders, Kenneth A.; Legin, Andrey

2013-01-01

Highlights: ► Chemically diverse APIs are studied with potentiometric “electronic tongue”. ► Bitter taste of APIs can be predicted with 3wayPLS regression from ET data. ► High correlation of ET assessment with human panel and rat in vivo model. -- Abstract: The application of the potentiometric multisensor system (electronic tongue, ET) for quantification of the bitter taste of structurally diverse active pharmaceutical ingredients (API) is reported. The measurements were performed using a set of bitter substances that had been assessed by a professional human sensory panel and the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight substances, both inorganic and organic – azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. With the aim of enhancing the response of the sensors to the studied APIs, measurements were carried out at different pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a 3 way data array (samples × sensors × pH levels) from which 3wayPLS regression models were constructed with both human panel and rat model reference data. These models revealed that artificial assessment of bitter taste with ET in the chosen set of API's is possible with average relative errors of 16% in terms of human panel bitterness score and 25% in terms of inhibition values from in vivo rat model data. Furthermore, these 3wayPLS models were applied for prediction of the bitterness in blind test samples of a further set of API's. The results of the prediction were compared with the inhibition values obtained from the in vivo rat model
Assessment of bitter taste of pharmaceuticals with multisensor system employing 3 way PLS regression

Energy Technology Data Exchange (ETDEWEB)

Rudnitskaya, Alisa [CESAM and Chemistry Department, University of Aveiro, Aveiro (Portugal); Kirsanov, Dmitry, E-mail: d.kirsanov@gmail.com [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Blinova, Yulia [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Legin, Evgeny [Sensor Systems LLC, St. Petersburg (Russian Federation); Seleznev, Boris [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation); Clapham, David; Ives, Robert S.; Saunders, Kenneth A. [GlaxoSmithKline Pharmaceuticals, Gunnels Wood Road, Stevenage (United Kingdom); Legin, Andrey [Chemistry Department, St. Petersburg University, St. Petersburg (Russian Federation)

2013-04-03

Highlights: ► Chemically diverse APIs are studied with potentiometric “electronic tongue”. ► Bitter taste of APIs can be predicted with 3wayPLS regression from ET data. ► High correlation of ET assessment with human panel and rat in vivo model. -- Abstract: The application of the potentiometric multisensor system (electronic tongue, ET) for quantification of the bitter taste of structurally diverse active pharmaceutical ingredients (API) is reported. The measurements were performed using a set of bitter substances that had been assessed by a professional human sensory panel and the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight substances, both inorganic and organic – azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. With the aim of enhancing the response of the sensors to the studied APIs, measurements were carried out at different pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a 3 way data array (samples × sensors × pH levels) from which 3wayPLS regression models were constructed with both human panel and rat model reference data. These models revealed that artificial assessment of bitter taste with ET in the chosen set of API's is possible with average relative errors of 16% in terms of human panel bitterness score and 25% in terms of inhibition values from in vivo rat model data. Furthermore, these 3wayPLS models were applied for prediction of the bitterness in blind test samples of a further set of API's. The results of the prediction were compared with the inhibition values obtained from the in vivo rat model.
semPLS: Structural Equation Modeling Using Partial Least Squares

Directory of Open Access Journals (Sweden)

Armin Monecke

2012-05-01

Structural equation models (SEM) are very popular in many disciplines. The partial least squares (PLS) approach to SEM offers an alternative to covariance-based SEM, which is especially suited for situations when data are not normally distributed. PLS path modelling is referred to as a soft-modeling technique with minimum demands regarding measurement scales, sample sizes and residual distributions. The semPLS package provides the capability to estimate PLS path models within the R programming environment. Different setups for the estimation of factor scores can be used. Furthermore, it contains modular methods for computation of bootstrap confidence intervals, model parameters and several quality indices. Various plot functions help to evaluate the model. The well-known mobile phone dataset from marketing research is used to demonstrate the features of the package.
Application of sequential and orthogonalised-partial least squares (SO-PLS) regression to predict sensory properties of Cabernet Sauvignon wines from grape chemical composition.

Science.gov (United States)

Niimi, Jun; Tomic, Oliver; Næs, Tormod; Jeffery, David W; Bastian, Susan E P; Boss, Paul K

2018-08-01

The current study determined the applicability of sequential and orthogonalised-partial least squares (SO-PLS) regression to relate Cabernet Sauvignon grape chemical composition to the sensory perception of the corresponding wines. Grape samples (n = 25) were harvested at a similar maturity and vinified identically in 2013. Twelve measures using various (bio)chemical methods were made on grapes. Wines were evaluated using descriptive analysis with a trained panel (n = 10) for sensory profiling. Data was analysed globally using SO-PLS for the entire sensory profiles (SO-PLS2), as well as for single sensory attributes (SO-PLS1). SO-PLS1 models were superior in validated explained variances than SO-PLS2. SO-PLS provided a structured approach in the selection of predictor chemical data sets that best contributed to the correlation of important sensory attributes. This new approach presents great potential for application in other explorative metabolomics studies of food and beverages to address factors such as quality and regional influences. Copyright © 2018 Elsevier Ltd. All rights reserved.
Measurement of process variables in solid-state fermentation of wheat straw using FT-NIR spectroscopy and synergy interval PLS algorithm

Science.gov (United States)

Jiang, Hui; Liu, Guohai; Mei, Congli; Yu, Shuang; Xiao, Xiahong; Ding, Yuhan

2012-11-01

The feasibility of rapid determination of the process variables (i.e. pH and moisture content) in solid-state fermentation (SSF) of wheat straw using Fourier transform near infrared (FT-NIR) spectroscopy was studied. Synergy interval partial least squares (siPLS) algorithm was implemented to calibrate regression model. The number of PLS factors and the number of subintervals were optimized simultaneously by cross-validation. The performance of the prediction model was evaluated according to the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the correlation coefficient (R). The measurement results of the optimal model were obtained as follows: RMSECV = 0.0776, Rc = 0.9777, RMSEP = 0.0963, and Rp = 0.9686 for pH model; RMSECV = 1.3544% w/w, Rc = 0.8871, RMSEP = 1.4946% w/w, and Rp = 0.8684 for moisture content model. Finally, compared with classic PLS and iPLS models, the siPLS model revealed its superior performance. The overall results demonstrate that FT-NIR spectroscopy combined with siPLS algorithm can be used to measure process variables in solid-state fermentation of wheat straw, and NIR spectroscopy technique has a potential to be utilized in SSF industry.
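The figures of merit quoted in this record (RMSECV, RMSEP and the correlation coefficient R) can be computed along the following lines. This is a minimal sketch with hypothetical function names and made-up pH values; whether a given value is an RMSECV or an RMSEP depends only on whether the predictions come from cross-validation or from an independent prediction set, and the exact divisor used in the paper is not stated in the abstract.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error, dividing by the number of samples."""
    n = len(y_true)
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


# Made-up reference pH values and model predictions for a prediction set.
ph_reference = [4.0, 5.0, 6.0, 7.0]
ph_predicted = [4.1, 4.9, 6.2, 6.8]
rmsep = rmse(ph_reference, ph_predicted)
r_p = pearson_r(ph_reference, ph_predicted)
```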
Fusion of neural computing and PLS techniques for load estimation

Energy Technology Data Exchange (ETDEWEB)

Lu, M.; Xue, H.; Cheng, X. [Northwestern Polytechnical Univ., Xi' an (China); Zhang, W. [Xi' an Inst. of Post and Telecommunication, Xi' an (China)

2007-07-01

A method to predict the electric load of a power system in real time was presented. The method is based on neurocomputing and partial least squares (PLS). Short-term load forecasts for power systems are generally determined by conventional statistical methods and Computational Intelligence (CI) techniques such as neural computing. However, statistical modeling methods often require the input of questionable distributional assumptions, and neural computing is weak, particularly in determining topology. In order to overcome the problems associated with conventional techniques, the authors developed a CI hybrid model based on neural computation and PLS techniques. The theoretical foundation for the designed CI hybrid model was presented along with its application in a power system. The hybrid model is suitable for nonlinear modeling and latent structure extracting. It can automatically determine the optimal topology to maximize the generalization. The CI hybrid model provides faster convergence and better prediction results compared to the abductive networks model because it incorporates a load conversion technique as well as new transfer functions. In order to demonstrate the effectiveness of the hybrid model, load forecasting was performed on a data set obtained from the Puget Sound Power and Light Company. Compared with the abductive networks model, the CI hybrid model reduced the forecast error by 32.37 per cent on workday, and by an average of 27.18 per cent on the weekend. It was concluded that the CI hybrid model has a more powerful predictive ability. 7 refs., 1 tab., 3 figs.
Determination of fat content in chicken hamburgers using NIR spectroscopy and the Successive Projections Algorithm for interval selection in PLS regression (iSPA-PLS)

Science.gov (United States)

Krepper, Gabriela; Romeo, Florencia; Fernandes, David Douglas de Sousa; Diniz, Paulo Henrique Gonçalves Dias; de Araújo, Mário César Ugulino; Di Nezio, María Susana; Pistonesi, Marcelo Fabián; Centurión, María Eugenia

2018-01-01

Determining fat content in hamburgers is very important to minimize or control the negative effects of fat on human health, such as cardiovascular diseases and obesity, which are caused by the high consumption of saturated fatty acids and cholesterol. This study proposed an alternative analytical method based on Near Infrared Spectroscopy (NIR) and the Successive Projections Algorithm for interval selection in Partial Least Squares regression (iSPA-PLS) for fat content determination in commercial chicken hamburgers. For this, 70 hamburger samples with a fat content ranging from 14.27 to 32.12 mg kg⁻¹ were prepared based on the upper limit recommended by the Argentinean Food Codex, which is 20% (w w⁻¹). NIR spectra were then recorded and preprocessed by applying different approaches: baseline correction, SNV, MSC, and Savitzky-Golay smoothing. For comparison, full-spectrum PLS and Interval PLS were also used. The best performance for the prediction set was obtained for the first derivative Savitzky-Golay smoothing with a second-order polynomial and window size of 19 points, achieving a coefficient of correlation of 0.94, RMSEP of 1.59 mg kg⁻¹, REP of 7.69% and RPD of 3.02. The proposed methodology represents an excellent alternative to the conventional Soxhlet extraction method, since waste generation is avoided and neither chemical reagents nor solvents are used, which follows the primary principles of Green Chemistry. The new method was successfully applied to chicken hamburger analysis, and the results agreed with the reference values at a 95% confidence level, making it very attractive for routine analysis.
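The preprocessing reported as best in this record, a first-derivative Savitzky-Golay filter with a second-order polynomial and a 19-point window, can be sketched as below. For a first derivative with a polynomial of order 1 or 2 on a symmetric window, the least-squares weight at offset i from the window centre is i divided by the sum of i² over the window. The function names, the `spacing` parameter, and the choice to return interior points only are assumptions for illustration, not the authors' code.

```python
def savgol_first_derivative_weights(window):
    """Centre-point weights for a first-derivative Savitzky-Golay filter.

    Valid for polynomial order 1 or 2 on an odd-length, evenly spaced window.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    half = window // 2
    denom = sum(i * i for i in range(-half, half + 1))
    return [i / denom for i in range(-half, half + 1)]


def savgol_first_derivative(signal, window=19, spacing=1.0):
    """Apply the filter to interior points; the edges are simply dropped."""
    weights = savgol_first_derivative_weights(window)
    half = window // 2
    result = []
    for centre in range(half, len(signal) - half):
        segment = signal[centre - half:centre + half + 1]
        result.append(sum(w * v for w, v in zip(weights, segment)) / spacing)
    return result


# A made-up quadratic "spectrum": its exact derivative at x is 2 * x.
spectrum = [float(x * x) for x in range(25)]
derivative = savgol_first_derivative(spectrum, window=19)
```

On the quadratic test signal the filter recovers the exact derivative at each interior point, which is a quick sanity check of the weights.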
Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

International Nuclear Information System (INIS)

Dyar, M.D.; Carmosino, M.L.; Breves, E.A.; Ozanne, M.V.; Clegg, S.M.; Wiens, R.C.

2012-01-01

A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the

Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

Science.gov (United States)

Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

2008-02-18

The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structural and pharmacological diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that the use of combinations of a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
Evaluation of in-line Raman data for end-point determination of a coating process: Comparison of Science-Based Calibration, PLS-regression and univariate data analysis.

Science.gov (United States)

Barimani, Shirin; Kleinebudde, Peter

2017-10-01

A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80 µm. Raman spectroscopy was used as an in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares regression (PLS), and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R²) higher than 0.99. The coating endpoints could be predicted with root mean square errors of prediction (RMSEP) of less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.
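The two figures of merit quoted in this abstract, R² and RMSEP, can be computed as in the minimal sketch below. The numbers are hypothetical and only illustrate the arithmetic; they are not data from the study, and expressing RMSEP as a percentage of the largest applied amount is an assumption about how the relative error was formed.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference vs. predicted amounts of applied coating suspension (g)
reference = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [102.0, 197.0, 305.0, 396.0, 501.0]

error = rmsep(reference, predicted)
# Assumption: relative error taken against the total (largest) applied amount
relative_error = 100.0 * error / max(reference)
r2 = r_squared(reference, predicted)
```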
Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

Energy Technology Data Exchange (ETDEWEB)

Dyar, M.D., E-mail: mdyar@mtholyoke.edu [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Carmosino, M.L.; Breves, E.A.; Ozanne, M.V. [Dept. of Astronomy, Mount Holyoke College, 50 College St., South Hadley, MA 01075 (United States); Clegg, S.M.; Wiens, R.C. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

2012-04-15

A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the
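The channel-selecting behaviour of the lasso described in this abstract can be illustrated with a small coordinate-descent sketch. This is not the authors' implementation: the objective scaling, the absence of an intercept, the function names and the toy data are all assumptions made for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    No intercept is fitted and columns are assumed to be non-zero.
    """
    n = len(X)
    cols = [list(c) for c in zip(*X)]
    b = [0.0] * len(cols)
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            norm_sq = sum(v * v for v in col) / n
            # Covariance of channel j with the residual that excludes its own term
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm_sq * b[j]
            new_bj = soft_threshold(rho, lam) / norm_sq
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                b[j] = new_bj
    return b

# Toy "spectra": only channel 0 carries the concentration signal (hypothetical)
X = [[1.0, 0.1, 0.0], [2.0, 0.0, 0.1], [3.0, 0.1, 0.0], [4.0, 0.0, 0.1]]
y = [2.0, 4.0, 6.0, 8.0]
coef = lasso_cd(X, y, lam=0.1)
```

In this toy case the penalty drives the coefficients of the two uninformative channels to exactly zero and slightly shrinks the informative one, which is the sense in which the lasso selects specific channels.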
Partial Least Squares Strukturgleichungsmodellierung (PLS-SEM)

DEFF Research Database (Denmark)

Hair, Joseph F.; Hult, G. Tomas M.; Ringle, Christian M.

(PLS-SEM) hat sich in der wirtschafts- und sozialwissenschaftlichen Forschung als geeignetes Verfahren zur Schätzung von Kausalmodellen behauptet. Dank der Anwenderfreundlichkeit des Verfahrens und der vorhandenen Software ist es inzwischen auch in der Praxis etabliert. Dieses Buch liefert eine...... anwendungsorientierte Einführung in die PLS-SEM. Der Fokus liegt auf den Grundlagen des Verfahrens und deren praktischer Umsetzung mit Hilfe der SmartPLS-Software. Das Konzept des Buches setzt dabei auf einfache Erläuterungen statistischer Ansätze und die anschauliche Darstellung zahlreicher Anwendungsbeispiele anhand...... einer einheitlichen Fallstudie. Viele Grafiken, Tabellen und Illustrationen erleichtern das Verständnis der PLS-SEM. Zudem werden dem Leser herunterladbare Datensätze, Aufgaben und weitere Fachartikel zur Vertiefung angeboten. Damit eignet sich das Buch hervorragend für Studierende, Forscher und...
Simultaneous determination of ash content and protein in wheat flour using infrared reflection techniques and partial least-squares regression (PLS) [Determinação simultânea dos teores de cinza e proteína em farinha de trigo empregando NIRR-PLS e DRIFT-PLS]

Directory of Open Access Journals (Sweden)

Marco Flôres Ferrão

2004-09-01

Full Text Available Near infrared reflection spectroscopy (NIRRS) and diffuse reflectance mid-infrared Fourier transform spectroscopy (DRIFTS) were used with partial least squares (PLS) multivariate regression for the simultaneous determination of protein and ash content in commercial wheat flour samples of the variety Triticum aestivum L. Infrared spectra of 100 samples were collected in duplicate using diffuse reflectance accessories. The protein (8.85-13.23%) and ash (0.330-1.287%) contents used as reference values were determined by the Kjeldahl method and a gravimetric method, respectively. The spectral data were used in log(1/R) format, as well as their first- and second-order derivatives, and were pre-processed by mean centering (MC), variance scaling (VS), or both. Fifty-five samples were used for calibration and 45 for validation of the models, with the minimum values of the standard error of calibration (SEC) and the standard error of validation (SEV) adopted as the model-building criterion. These values were below 0.33% for protein and 0.07% for ash. The methods developed are environmentally benign and allow a direct, simultaneous, rapid and non-destructive determination of protein and ash contents in wheat flour samples.
Comparison of FTIR-ATR and Raman spectroscopy in determination of VLDL triglycerides in blood serum with PLS regression

Science.gov (United States)

Oleszko, Adam; Hartwich, Jadwiga; Wójtowicz, Anna; Gąsior-Głogowska, Marlena; Huras, Hubert; Komorowska, Małgorzata

2017-08-01

Hypertriglyceridemia, associated with plasma triglyceride (TG) levels above 1.7 mmol/L, is one of the cardiovascular risk factors. Very low density lipoproteins (VLDL) are the main TG carriers. Despite being time consuming and demanding well-qualified staff and expensive instrumentation, ultracentrifugation remains the gold standard for VLDL isolation. Therefore a faster and simpler method of VLDL-TG determination is needed. Vibrational spectroscopy, including FT-IR and Raman, is a widely used technique in lipid and protein research. The aim of this study was to assess Raman and FT-IR spectroscopy for the determination of VLDL-TG directly in serum, with the isolation step omitted. TG concentrations in serum and in ultracentrifuged VLDL fractions from 32 patients were measured with a reference colorimetric method. FT-IR and Raman spectra of VLDL and serum samples were acquired. Partial least squares (PLS) regression was used for calibration and leave-one-out cross validation. Our results confirmed the possibility of reagent-free determination of VLDL-TG directly in serum with both Raman and FT-IR spectroscopy. Quantitative VLDL testing by FT-IR and/or Raman spectroscopy applied directly to maternal serum seems to be a promising screening test to identify women with increased risk of adverse pregnancy outcomes, and a patient-friendly method of choice based on ease of performance, accuracy and efficiency.
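Leave-one-out cross validation, as mentioned in this abstract, refits the calibration once per sample with that sample held out. The sketch below shows the loop only; as a stated simplification it uses a one-predictor least-squares line as a stand-in for the PLS model, and the intensities and concentrations are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def loo_rmsecv(x, y):
    """Leave-one-out CV: refit without sample i, then predict sample i."""
    sq_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        sq_errors.append((y[i] - (a + b * x[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical band intensities and reference TG concentrations (mmol/L)
intensity = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
tg = [0.52, 0.98, 1.55, 2.01, 2.47, 3.05]

rmsecv = loo_rmsecv(intensity, tg)
```

With only 32 patients, as in the study, leave-one-out is a common choice because every sample is used for both calibration and validation without ever predicting a sample the model was fitted on.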
PLS2 regression as a tool for selection of optimal analytical modality

DEFF Research Database (Denmark)

Madsen, Michael; Esbensen, Kim

Intelligent use of modern process analysers allows process technicians and engineers to look deep into the dynamic behaviour of production systems. This opens up for a plurality of new possibilities with respect to process optimisation. Oftentimes, several instruments representing different...... technologies and price classes are able to decipher relevant process information simultaneously. The question then is: how to choose between available technologies without compromising the quality and usability of the data. We apply PLS2 modelling to quantify the relative merits of competing, or complementing......, analytical modalities. We here present results from a feasibility study, where Fourier Transform Near InfraRed (FT-NIR), Fourier Transform Mid InfraRed (FT-MIR), and Raman laser spectroscopy were applied on the same set of samples obtained from a pilot-scale beer brewing process. Quantitative PLS1 models...
Quantitative analysis of glycated albumin in serum based on ATR-FTIR spectrum combined with SiPLS and SVM.

Science.gov (United States)

Li, Yuanpeng; Li, Fucui; Yang, Xinhao; Guo, Liu; Huang, Furong; Chen, Zhenqiang; Chen, Xingdan; Zheng, Shifu

2018-08-05

A rapid quantitative analysis model for determining the glycated albumin (GA) content, based on attenuated total reflectance (ATR)-Fourier transform infrared spectroscopy (FTIR) combined with linear SiPLS and nonlinear SVM, has been developed. Firstly, the actual GA content in human serum was determined by the GA enzymatic method, while the ATR-FTIR spectra of serum samples from a health-examination population were obtained. The spectral data of the whole mid-infrared region (4000-600 cm⁻¹) and GA's characteristic region (1800-800 cm⁻¹) were used for quantitative analysis. Secondly, several preprocessing steps, including first derivative, second derivative, variable standardization and spectral normalization, were performed. Lastly, quantitative regression models were established using SiPLS and SVM respectively. The SiPLS modeling results are as follows: root mean square error of cross validation (RMSECV_T) = 0.523 g/L, calibration coefficient (R_C) = 0.937, root mean square error of prediction (RMSEP_T) = 0.787 g/L, and prediction coefficient (R_P) = 0.938. The SVM modeling results are as follows: RMSECV_T = 0.0048 g/L, R_C = 0.998, RMSEP_T = 0.442 g/L, and R_P = 0.916. The results indicated that the model performance was improved significantly after preprocessing and optimization of characteristic regions, and that the modeling performance of nonlinear SVM was considerably better than that of linear SiPLS. Hence, the quantitative analysis model for GA in human serum based on ATR-FTIR combined with SiPLS and SVM is effective. It requires no sample preprocessing and is characterized by simple operation and high time efficiency, providing a rapid and accurate method for GA content determination. Copyright © 2018 Elsevier B.V. All rights reserved.
Comparing the analytical performances of Micro-NIR and FT-NIR spectrometers in the evaluation of acerola fruit quality, using PLS and SVM regression algorithms.

Science.gov (United States)

Malegori, Cristina; Nascimento Marques, Emanuel José; de Freitas, Sergio Tonetto; Pimentel, Maria Fernanda; Pasquini, Celio; Casiraghi, Ernestina

2017-04-01

The main goal of this study was to investigate the analytical performances of a state-of-the-art device, one of the smallest dispersion NIR spectrometers on the market (MicroNIR 1700), making a critical comparison with a benchtop FT-NIR spectrometer in the evaluation of the prediction accuracy. In particular, the aim of this study was to estimate, in a non-destructive manner, titratable acidity and ascorbic acid content in acerola fruit during ripening, with a view to direct in-field applicability of this new miniaturised handheld device. Acerola (Malpighia emarginata DC.) is a super-fruit characterised by a considerable amount of ascorbic acid, ranging from 1.0% to 4.5%. However, during ripening, acerola colour changes and the fruit may lose as much as half of its ascorbic acid content. Because the variability of chemical parameters followed a non-strictly linear profile, two different regression algorithms were compared: PLS and SVM. Regression models obtained with Micro-NIR spectra give better results using the SVM algorithm, for both ascorbic acid and titratable acidity estimation. FT-NIR data give comparable results using both SVM and PLS algorithms, with lower errors for SVM regression. The prediction ability of the two instruments was statistically compared using the Passing-Bablok regression algorithm; the outcomes are critically discussed together with the regression models, showing the suitability of the portable Micro-NIR for in-field monitoring of chemical parameters of interest in acerola fruits. Copyright © 2016 Elsevier B.V. All rights reserved.
Group-wise partial least square regression

NARCIS (Netherlands)

Camacho, José; Saccenti, Edoardo

2018-01-01

This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are
Hybrid ANN–PLS approach to scroll compressor thermodynamic performance prediction

International Nuclear Information System (INIS)

Tian, Z.; Gu, B.; Yang, L.; Lu, Y.

2015-01-01

In this paper, scroll compressor thermodynamic performance was predicted by applying a hybrid ANN–PLS model. Firstly, an experimental platform with a second-refrigeration calorimeter was set up and steady-state scroll compressor data sets were collected from experiments. A total of 148 data sets were then used to train the ANN–PLS model and verify its validity for predicting scroll compressor parameters such as volumetric efficiency, refrigerant mass flow rate, discharge temperature and power consumption. The ANN–PLS model was determined with 5 hidden neurons and 7 latent variables through the training process. Ultimately, the ANN–PLS model showed better performance than the ANN model and the PLS model working separately. ANN–PLS predictions agree well with the experimental values, with mean relative errors (MREs) in the range of 0.34–1.96%, correlation coefficients (R²) in the range of 0.9703–0.9999 and very low root mean square errors (RMSEs).
Highlights:
• Hybrid ANN–PLS is utilized to predict the thermodynamic performance of a scroll compressor.
• The ANN–PLS model is determined with 5 hidden neurons and 7 latent variables.
• The ANN–PLS model demonstrates better performance than ANN and PLS working separately.
• The values of MRE and R² are in the range of 0.34–1.96% and 0.9703–0.9999, respectively.
On-line monitoring the extract process of Fu-fang Shuanghua oral solution using near infrared spectroscopy and different PLS algorithms

Science.gov (United States)

Kang, Qian; Ru, Qingguo; Liu, Yan; Xu, Lingyan; Liu, Jia; Wang, Yifei; Zhang, Yewen; Li, Hui; Zhang, Qing; Wu, Qing

2016-01-01

An on-line near infrared (NIR) spectroscopy monitoring method with an appropriate multivariate calibration method was developed for the extraction process of Fu-fang Shuanghua oral solution (FSOS). On-line NIR spectra were collected through two fiber optic probes, which were designed to transmit NIR radiation by a 2 mm flange. Partial least squares (PLS), interval PLS (iPLS) and synergy interval PLS (siPLS) algorithms were compared for building the calibration regression models. During the extraction process, NIR spectroscopy was employed to determine the chlorogenic acid (CA) content, total phenolic acids content (TPC), total flavonoids content (TFC) and soluble solid content (SSC). High performance liquid chromatography (HPLC), ultraviolet spectrophotometry (UV) and loss on drying were employed as reference methods. Experimental results showed that the siPLS model performed best compared with PLS and iPLS. The calibration models for CA, TPC, TFC and SSC had high determination coefficients (R²) (0.9948, 0.9992, 0.9950 and 0.9832) and low root mean square errors of cross validation (RMSECV) (0.0113, 0.0341, 0.1787 and 1.2158), which indicate a good correlation between reference values and NIR predicted values. The overall results show that the on-line detection method could be feasible in real applications and would be of great value for monitoring the mixed decoction process of FSOS and other Chinese patent medicines.
Helium Leak Test for the PLS Storage Ring Chamber

International Nuclear Information System (INIS)

Choi, M. H.; Kim, H. J.; Choi, W. C.

1993-01-01

The storage ring vacuum system for the Pohang Light Source (PLS) has been designed to maintain a vacuum pressure of 10⁻¹⁰ Torr, which requires UHV welding with a helium leak rate of less than 1×10⁻¹⁰ Torr·L/sec. In order to develop a new technique (the PLS welding technique), a prototype vacuum chamber was welded using the Tungsten Inert Gas welding method, and all the welded joints were tested with a non-destructive method, so-called helium leak detection, to investigate the vacuum tightness of the weld joints. The test was performed with a detection limit of 1×10⁻¹⁰ Torr·L/sec for helium, and no detectable leaks were found in any of the welded joints. Thus the performance of the welding technique is proven to meet the helium leak rate criteria required for the PLS Storage Ring. Both the principle and the procedure of helium leak detection are also discussed.
Development of a partial least squares-artificial neural network (PLS-ANN) hybrid model for the prediction of consumer liking scores of ready-to-drink green tea beverages.

Science.gov (United States)

Yu, Peigen; Low, Mei Yin; Zhou, Weibiao

2018-01-01

In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R²) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R² (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.
Structured Additive Regression Models: An R Interface to BayesX

Directory of Open Access Journals (Sweden)

Nikolaus Umlauf

2015-02-01

Full Text Available Structured additive regression (STAR) models provide a flexible framework for modeling possible nonlinear effects of covariates: they contain the well established frameworks of generalized linear models and generalized additive models as special cases but also allow a wider class of effects, e.g., for geographical or spatio-temporal data, allowing for specification of complex and realistic models. BayesX is a standalone software package for fitting a general class of STAR models. Based on a comprehensive open-source regression toolbox written in C++, BayesX uses Bayesian inference for estimating STAR models based on Markov chain Monte Carlo simulation techniques, a mixed model representation of STAR models, or stepwise regression techniques combining penalized least squares estimation with model selection. BayesX not only covers models for responses from univariate exponential families, but also models from less-standard regression situations such as models for multi-categorical responses with either ordered or unordered categories, continuous time survival data, or continuous time multi-state models. This paper presents a new fully interactive R interface to BayesX: the R package R2BayesX. With the new package, STAR models can be conveniently specified using R's formula language (with some extended terms), fitted using the BayesX binary, represented in R with objects of suitable classes, and finally printed/summarized/plotted. This makes BayesX much more accessible to users familiar with R and adds extensive graphics capabilities for visualizing fitted STAR models. Furthermore, R2BayesX complements the already impressive capabilities for semiparametric regression in R by a comprehensive toolbox comprising in particular more complex response types and alternative inferential procedures such as simulation-based Bayesian inference.
[Influence of Spectral Pre-Processing on PLS Quantitative Model of Detecting Cu in Navel Orange by LIBS].

Science.gov (United States)

Li, Wen-bing; Yao, Lin-tao; Liu, Mu-hua; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; He, Xiu-wen; Yang, Ping; Hu, Hui-qin; Nie, Jiang-hui

2015-05-01

Cu in navel orange was detected rapidly by laser-induced breakdown spectroscopy (LIBS) combined with partial least squares (PLS) for quantitative analysis, and the effect of different spectral data pretreatment methods on the detection accuracy of the model was explored. Spectral data for the 52 Gannan navel orange samples were pretreated by different data smoothing, mean centering and standard normal variate transform methods. The 319-338 nm wavelength section containing characteristic spectral lines of Cu was then selected to build PLS models, and the main evaluation indexes of the models, such as the regression coefficient (r), root mean square error of cross validation (RMSECV) and root mean square error of prediction (RMSEP), were compared and analyzed. For the PLS model after 13-point smoothing and mean centering, the three indicators reached 0.9928, 3.43 and 3.4, respectively, and the average relative error of the prediction model was only 5.55%; in short, this model gave the best calibration and prediction quality. The results show that, by selecting the appropriate data pre-processing method, the prediction accuracy of PLS quantitative models of fruits and vegetables detected by LIBS can be improved effectively, providing a new method for fast and accurate detection of fruits and vegetables by LIBS.
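The three pretreatments named in this abstract (smoothing, mean centering and the standard normal variate transform) can be sketched as follows. The abstract does not say which smoothing filter was used, so the centered moving average here is an assumption, as are the function names and the edge handling.

```python
import statistics

def moving_average(spectrum, window):
    """Smooth with a centered moving average; edges use a truncated window."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        segment = spectrum[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]

def mean_center(spectra):
    """Subtract each channel's mean, computed across all samples."""
    n = len(spectra)
    col_means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, col_means)] for row in spectra]

# Hypothetical usage: 13-point smoothing of each spectrum, then mean centering
raw = [[float((i * k) % 7) for i in range(20)] for k in (1, 2, 3)]
pretreated = mean_center([moving_average(s, 13) for s in raw])
```

Note the difference in direction: SNV works along each spectrum (row-wise), whereas mean centering works along each channel (column-wise), so the two are not interchangeable.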
The Effect of Nonnormality on CB-SEM and PLS-SEM Path Estimates

OpenAIRE

Z. Jannoo; B. W. Yap; N. Auchoybur; M. A. Lazim

2014-01-01

The two common approaches to Structural Equation Modeling (SEM) are the Covariance-Based SEM (CB-SEM) and Partial Least Squares SEM (PLS-SEM). There is much debate on the performance of CB-SEM and PLS-SEM for small sample size and when distributions are nonnormal. This study evaluates the performance of CB-SEM and PLS-SEM under normality and nonnormality conditions via a simulation. Monte Carlo Simulation in R programming language was employed to generate data based on the theoretical model w...
Determination of Trace Amounts of Gold in Environmental Samples by Adsorptive Stripping Voltammetry of Its Complex with Rhodamine Using OSC-PLS

Directory of Open Access Journals (Sweden)

A. Akrami

2012-11-01

Full Text Available The multivariate calibration method was applied for the determination of trace amounts of gold based on a hanging mercury drop electrode (HMDE) in the presence of rhodanine, followed by reduction of adsorbed gold by voltammetric scan using differential pulse modulation. The optimum experimental conditions are: rhodanine concentration of 0.20 mg mL⁻¹, pH 5.0, accumulation potential of -600 mV versus Ag/AgCl, accumulation time of 100 sec, scan rate of 30 mV s⁻¹ and pulse height of 100 mV. The calibration matrix for partial least squares (PLS) regression was designed with 9 samples. Orthogonal signal correction (OSC) is a preprocessing technique used for removing the information unrelated to the target variables based on constrained principal component analysis. OSC is a suitable preprocessing method for PLS calibration with electrochemical methods, without loss of prediction capacity. The RMSEP values for gold determination with PLS and OSC-PLS were 8.51 and 1.94, respectively. This procedure allows the reliable determination of gold in synthetic and real samples.
The feasibility of using explicit method for linear correction of the particle size variation using NIR Spectroscopy combined with PLS2 regression method

Science.gov (United States)

Yulia, M.; Suhandy, D.

2018-03-01

NIR spectra obtained from a spectral data acquisition system contain both chemical information about the samples and physical information, such as particle size and bulk density. Several methods have been established for developing calibration models that can compensate for variations in sample physical information. One common approach is to include the physical information variation in the calibration model, both explicitly and implicitly. The objective of this study was to evaluate the feasibility of using the explicit method to compensate for the influence of different coffee powder particle sizes on NIR calibration model performance. A total of 220 coffee powder samples of two types of coffee (civet and non-civet) and two particle sizes (212 and 500 µm) were prepared. Spectral data were acquired using a NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement. A discrimination method based on PLS-DA was applied and the influence of particle size on the performance of PLS-DA was investigated. In the explicit method, the particle size is added directly as a predicted variable, which results in an X block containing only the NIR spectra and a Y block containing the particle size and type of coffee. The explicit inclusion of the particle size in the calibration model is expected to improve the accuracy of coffee type determination. The results show that, using the explicit method, the quality of the developed calibration model for coffee type determination is slightly superior, with coefficient of determination (R²) = 0.99 and root mean square error of cross-validation (RMSECV) = 0.041. The performance of the PLS2 calibration model for coffee type determination with particle size compensation was quite good, and the model was able to predict the type of coffee at two different particle sizes with relatively high prediction R² values. The prediction also resulted in low bias and RMSEP values.
Nonlinear Regression with R

CERN Document Server

Ritz, Christian; Parmigiani, Giovanni

2009-01-01

R is a rapidly evolving lingua franca of graphical display and statistical analysis of experiments from the applied sciences. This book provides a coherent treatment of nonlinear regression with R by means of examples from a diversity of applied sciences such as biology, chemistry, engineering, medicine and toxicology.

Fibre Morphological Characteristics of Kraft Pulps of Acacia melanoxylon Estimated by NIR-PLS-R Models

Directory of Open Access Journals (Sweden)

Helena Pereira

2015-12-01

Full Text Available In this paper, the morphological properties of fiber length (weighted in length) and of fiber width of unbleached Kraft pulp of Acacia melanoxylon were determined using TECHPAP Morfi® equipment (Techpap SAS, Grenoble, France), and were used in the calibration development of Near Infrared (NIR) partial least squares regression (PLS-R) models based on the spectral data obtained for the wood. It is the first time that fiber length and width of pulp were predicted with NIR spectral data of the initial woodmeal, with high accuracy and precision, and with ratios of performance to deviation (RPD) fulfilling the requirements for screening in breeding programs. The selected models for fiber length and fiber width used the second derivative and first derivative + multiplicative scatter correction (2ndDer and 1stDer + MSC) pre-processed spectra, respectively, in the wavenumber range from 7506 to 5440 cm⁻¹. The statistical parameters of cross-validation (RMSECV, root mean square error of cross-validation) of 0.009 mm and 0.39 μm and validation (RMSEP, root mean square error of prediction) of 0.007 mm and 0.36 μm, with RPDTS (ratios of performance to deviation of test set) values of 3.9 and 3.3, respectively, confirmed that the models are robust and well qualified for prediction. This modeling approach shows a high potential to be used for tree breeding and improvement programs, providing rapid screening for the desired fiber morphological properties of pulp.
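The ratio of performance to deviation quoted in this abstract is commonly taken as the standard deviation of the reference values divided by the RMSEP. The sketch below assumes that definition, with the sample standard deviation of the test set; the fiber lengths are hypothetical and not taken from the paper.

```python
import math
import statistics

def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    n = len(reference)
    rmsep = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return statistics.stdev(reference) / rmsep

# Hypothetical test-set fiber lengths (mm): laboratory values vs. NIR predictions
lab_length = [0.60, 0.65, 0.70, 0.75, 0.80]
nir_length = [0.61, 0.64, 0.71, 0.74, 0.80]

ratio = rpd(lab_length, nir_length)
```

A larger ratio means the prediction error is small relative to the natural spread of the property, which is why the statistic is used to judge whether a model is adequate for screening.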
Statistical process control of cocrystallization processes: A comparison between OPLS and PLS.

Science.gov (United States)

Silva, Ana F T; Sarraguça, Mafalda Cruz; Ribeiro, Paulo R; Santos, Adenilson O; De Beer, Thomas; Lopes, João Almeida

2017-03-30

Orthogonal partial least squares regression (OPLS) is being increasingly adopted as an alternative to partial least squares (PLS) regression due to the better generalization that can be achieved. Particularly in multivariate batch statistical process control (BSPC), the use of OPLS for estimating nominal trajectories is advantageous. In OPLS, the nominal process trajectories are expected to be captured in a single predictive principal component while uncorrelated variations are filtered out to orthogonal principal components. In theory, OPLS will yield a better estimation of the Hotelling's T² statistic and corresponding control limits, thus lowering the number of false positives and false negatives when assessing the process disturbances. Although the advantages of OPLS have been demonstrated in the context of regression, its use in BSPC has seldom been reported. This study proposes an OPLS-based approach for BSPC of a cocrystallization process between hydrochlorothiazide and p-aminobenzoic acid monitored on-line with near infrared spectroscopy and compares the fault detection performance with the same approach based on PLS. A series of cocrystallization batches with imposed disturbances were used to test the ability of the OPLS- and PLS-based BSPC methods to detect abnormal situations. Results demonstrated that OPLS was generally superior in terms of sensitivity and specificity in most situations. In some abnormal batches, it was found that the imposed disturbances were only detected with OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
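The Hotelling's T² statistic mentioned in this abstract summarizes how far a new observation's latent-variable scores lie from the calibration centre. The sketch below assumes mean-centered, mutually uncorrelated scores, as produced by PLS or OPLS on centered data; the scores are hypothetical, and the control limit (which needs an F-distribution quantile) is not computed.

```python
import statistics

def hotelling_t2(calibration_scores, new_scores):
    """T^2 = sum over components of score^2 / calibration variance of that score."""
    variances = [statistics.variance(col) for col in zip(*calibration_scores)]
    return sum(t ** 2 / v for t, v in zip(new_scores, variances))

# Hypothetical scores of three nominal batches on two components
nominal = [[-1.0, 0.5], [0.0, -0.5], [1.0, 0.0]]

# Scores of a new time point projected onto the same two components
t2 = hotelling_t2(nominal, [2.0, 0.5])
```

In a monitoring setting the value would be compared with a control limit derived from the nominal batches; points above the limit are flagged as possible disturbances.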
Deflation in multiblock PLS

NARCIS (Netherlands)

Westerhuis, J. A.; Smilde, A. K.

2001-01-01

This paper describes some of the deflation problems in multiblock PLS. Deflation of X using block scores leads to inferior prediction of Y. Deflation of X using super scores gives the same predictions as standard PLS with all variables in one large X-block, but the information of the separate blocks
Linear feature selection in texture analysis - A PLS based method

DEFF Research Database (Denmark)

Marques, Joselene; Igel, Christian; Lillholm, Martin

2013-01-01

We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature...... and considering all CV groups, the methods selected 36 % of the original features available. The diagnosis evaluation reached a generalization area-under-the-ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis....
Near infrared spectrometric technique for testing fruit quality: optimisation of regression models using genetic algorithms

Science.gov (United States)

Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.

2016-02-01

Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of foodstuffs, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm⁻¹ were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity, were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential of the GA method to improve the speed and accuracy of fruit quality prediction.
Control Point Generated PLS - lines

Data.gov (United States)

Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...
Control Point Generated PLS - polygons

Data.gov (United States)

Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...
R for statistics

CERN Document Server

Cornillon, Pierre-Andre; Husson, Francois; Jegou, Nicolas; Josse, Julie; Kloareg, Maela; Matzner-Lober, Eric; Rouviere, Laurent

2012-01-01

An Overview of R; Main Concepts; Installing R; Work Session; Help; R Objects; Functions; Packages; Exercises; Preparing Data; Reading Data from File; Exporting Results; Manipulating Variables; Manipulating Individuals; Concatenating Data Tables; Cross-Tabulation; Exercises; R Graphics; Conventional Graphical Functions; Graphical Functions with lattice; Exercises; Making Programs with R; Control Flows; Predefined Functions; Creating a Function; Exercises; Statistical Methods; Introduction to the Statistical Methods; A Quick Start with R; Installing R; Opening and Closing R; The Command Prompt; Attribution, Objects, and Function; Selection; Other Rcmdr Package; Importing (or Inputting) Data; Graphs; Statistical Analysis; Hypothesis Test; Confidence Intervals for a Mean; Chi-Square Test of Independence; Comparison of Two Means; Testing Conformity of a Proportion; Comparing Several Proportions; The Power of a Test; Regression; Simple Linear Regression; Multiple Linear Regression; Partial Least Squares (PLS) Regression; Analysis of Variance and Covariance; One-Way Analysis of Variance; Multi-Way Analysis of Varian...
Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

Directory of Open Access Journals (Sweden)

Sutikno Sutikno

2010-08-01

One of the climate models used to predict climatic conditions is the Global Circulation Model (GCM). A GCM is a computer-based model consisting of numerical, deterministic equations that follow the laws of physics. GCMs are a main tool for predicting climate and weather, and are also used as a primary information source for reviewing the effects of climate change. The Statistical Downscaling (SD) technique is used to bridge the large-scale GCM and the small scale of the study area. GCM data are spatial and temporal, so spatial correlation between grid points within a single domain is likely. The resulting multicollinearity makes pre-processing of the predictor data X necessary. Continuum Regression (CR) with Principal Component Analysis (PCA) pre-processing is an alternative for SD modelling. CR is a method developed by Stone and Brooks (1990). It is a generalization of Ordinary Least Squares (OLS), Principal Component Regression (PCR) and Partial Least Squares (PLS), and is used to overcome multicollinearity problems. Results for the stations at Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that, for the 8x8 and 12x12 domains, the CR method gives better RMSEP and predicted R² values than PCR and PLS.
Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

Science.gov (United States)

Yin, Shen; Wang, Guang; Yang, Xu

2014-07-01

In practical industrial applications, the key performance indicator (KPI)-related prediction and diagnosis are quite important for the product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial process. As PLS is totally based on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS to deal with outliers and missing values, simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of the KPI-related prediction and diagnosis on an industrial benchmark of Tennessee Eastman process.
On-line monitoring of extraction process of Flos Lonicerae Japonicae using near infrared spectroscopy combined with synergy interval PLS and genetic algorithm

Science.gov (United States)

Yang, Yue; Wang, Lei; Wu, Yongjiang; Liu, Xuesong; Bi, Yuan; Xiao, Wei; Chen, Yong

2017-07-01

There is a growing need for effective on-line process monitoring during the manufacture of traditional Chinese medicine (TCM) to ensure quality consistency. In this study, the potential of the near infrared (NIR) spectroscopy technique to monitor the extraction process of Flos Lonicerae Japonicae was investigated. A new algorithm of synergy interval PLS with genetic algorithm (Si-GA-PLS) was proposed for modeling. Four different PLS models, namely Full-PLS, Si-PLS, GA-PLS, and Si-GA-PLS, were established, and their performances in predicting two quality parameters (viz. total acid and soluble solid contents) were compared. The Si-GA-PLS model gave the best results because it combines the strengths of Si-PLS and GA. For Si-GA-PLS, the determination coefficient (Rp²) and root-mean-square error for the prediction set (RMSEP) were 0.9561 and 147.6544 μg/ml for total acid, and 0.9062 and 0.1078% for soluble solid contents, respectively. The overall results demonstrated that the NIR spectroscopy technique combined with Si-GA-PLS calibration is a reliable and non-destructive alternative method for on-line monitoring of the extraction process of TCM on the production scale.
Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

Science.gov (United States)

Bry, X; Verron, T; Cazes, P

2009-05-29

In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of minSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.
Directional quantile regression in R

Czech Academy of Sciences Publication Activity Database

Boček, Pavel; Šiman, Miroslav

2017-01-01

Vol. 53, No. 3 (2017), pp. 480-492 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: multivariate quantile * regression quantile * halfspace depth * depth contour Subject RIV: BD - Theory of Information OECD field: Applied mathematics Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2017/SI/bocek-0476587.pdf
Application of Fourier transform infrared spectroscopy and orthogonal projections to latent structures/partial least squares regression for estimation of procyanidins average degree of polymerisation.

Science.gov (United States)

Passos, Cláudia P; Cardoso, Susana M; Barros, António S; Silva, Carlos M; Coimbra, Manuel A

2010-02-28

Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the quick assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. Fractionation by graded methanol/chloroform precipitations yielded 26 samples that were characterised using thiolysis as pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy within the wavenumber region of 1800-700 cm⁻¹ made it possible to build a partial least squares (PLS1) regression model with 8 latent variables (LVs) for the estimation of the DPn, giving an RMSECV of 11.7%, with an R² of 0.91 and an RMSEP of 2.58. The application of orthogonal projection to latent structures (O-PLS1) clarifies the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm⁻¹ to the increase of the DPn due to the higher proportion of substitutions in the aromatic ring of the polymerised procyanidin molecules. Copyright 2009 Elsevier B.V. All rights reserved.
Distributed Monitoring of the R² Statistic for Linear Regression

Science.gov (United States)

Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

2011-01-01

The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large-scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes' data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo, a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R² statistic). When the nodes collectively determine that R² has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
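The quantity being monitored in this record, R² over the union of all nodes' data, can be assembled from small per-node summaries. The sketch below shows only that arithmetic for a fixed one-variable model. It is not the DReMo algorithm (whose communication-efficient protocol is not described in the abstract); the function names, the model and the data are hypothetical.

```python
def local_stats(xs, ys, slope, intercept):
    """Per-node summary for a fixed model y ~ slope * x + intercept."""
    n = len(ys)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return n, sum_y, sum_y2, sse


def global_r2(stats):
    """Combine per-node summaries into R^2 = 1 - SSE / SST over all data."""
    n = sum(s[0] for s in stats)
    sum_y = sum(s[1] for s in stats)
    sum_y2 = sum(s[2] for s in stats)
    sse = sum(s[3] for s in stats)
    sst = sum_y2 - sum_y ** 2 / n  # total sum of squares about the global mean
    return 1.0 - sse / sst


# Two hypothetical nodes sharing the model y = 2x + 1.
node_a = local_stats([0.0, 1.0], [1.0, 3.0], slope=2.0, intercept=1.0)
node_b = local_stats([2.0, 3.0], [5.0, 8.0], slope=2.0, intercept=1.0)
r2 = global_r2([node_a, node_b])
print(r2, "recompute model" if r2 < 0.9 else "model still acceptable")
```

Only four numbers per node need to be exchanged here, whereas a naive approach would ship every observation to a central site.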
PLS-based memory control scheme for enhanced process monitoring

KAUST Repository

Harrou, Fouzi

2017-01-20

Fault detection is important for the safe operation of various modern engineering systems. Partial least squares (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail to detect incipient faults. In this paper, we develop a new PLS-based monitoring chart that combines PLS with a multivariate memory control chart, the multivariate exponentially weighted moving average (MEWMA) monitoring chart. MEWMA charts are sensitive to incipient faults in the process mean, which significantly improves the performance of PLS methods and widens their applicability in practice. Using simulated distillation column data, we demonstrate that the proposed PLS-based MEWMA control chart is more effective in detecting incipient faults in the mean of the multivariate process variables, and outperforms the conventional PLS-based monitoring charts.
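The memory effect that makes a MEWMA chart sensitive to small shifts can be seen in a few lines. The sketch below computes the standard MEWMA statistic for a sequence of mean-centred vectors, assuming the in-control covariance is known. Which vector the authors monitor (PLS scores, residuals or something else) is not stated in the abstract, and the function name, smoothing constant and data are hypothetical.

```python
import numpy as np


def mewma_t2(X, cov, lam=0.2):
    """MEWMA statistic for each row of X (rows assumed mean-centred).

    z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = 0, and
    T2_t = z_t' inv(Cov(z_t)) z_t, using the exact covariance of z_t.
    """
    X = np.asarray(X, dtype=float)
    cov = np.asarray(cov, dtype=float)
    z = np.zeros(X.shape[1])
    stats = []
    for t, x in enumerate(X, start=1):
        z = lam * x + (1.0 - lam) * z
        # Cov(z_t) = lam / (2 - lam) * (1 - (1 - lam)^(2t)) * cov
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        stats.append(float(z @ np.linalg.solve(scale * cov, z)))
    return np.array(stats)


# Hypothetical small, persistent shift in the first variable.
shifted = np.tile([1.0, 0.0], (10, 1))
print(mewma_t2(shifted, np.eye(2)))
```

Because each z accumulates past observations, a shift too small to stand out in any single sample produces a statistic that keeps growing over time.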
Evaluation of syngas production unit cost of bio-gasification facility using regression analysis techniques

Energy Technology Data Exchange (ETDEWEB)

Deng, Yangyang; Parajuli, Prem B.

2011-08-10

Evaluation of the economic feasibility of a bio-gasification facility requires an understanding of its unit cost under different production capacities. The objective of this study was to evaluate the unit cost of syngas production at capacities from 60 through 1800 Nm³/h using an economic model with three regression analysis techniques (simple regression, reciprocal regression, and log-log regression). The preliminary result of this study showed that the reciprocal regression analysis technique gave the best-fit curve between per-unit cost and production capacity, with a sum of error squares (SES) lower than 0.001 and a coefficient of determination (R²) of 0.996. The regression analysis techniques determined the minimum unit cost of syngas production for micro-scale bio-gasification facilities to be $0.052/Nm³, at a capacity of 2,880 Nm³/h. The results of this study suggest that to reduce cost, facilities should run at a high production capacity. In addition, this technique could contribute a new categorical criterion for evaluating micro-scale bio-gasification facilities from the perspective of economic analysis.
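As an illustration of the reciprocal regression named in this record, the sketch below fits unit cost = a + b / capacity by ordinary least squares on the transformed predictor 1/capacity. That functional form is an assumption (the abstract does not give the equation), and the function name and the data points are hypothetical, generated from a = 0.05 and b = 6 rather than taken from the study.

```python
def fit_reciprocal(capacities, unit_costs):
    """Least-squares fit of unit_cost = a + b / capacity (assumed model form)."""
    u = [1.0 / c for c in capacities]  # transformed predictor
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(unit_costs) / n
    sxy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, unit_costs))
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    b = sxy / sxx
    a = mean_y - b * mean_u
    return a, b


# Hypothetical capacities (Nm3/h) and unit costs ($/Nm3).
capacities = [60.0, 120.0, 600.0, 1200.0]
unit_costs = [0.15, 0.10, 0.06, 0.055]
a, b = fit_reciprocal(capacities, unit_costs)
print(a, b)
```

Under this form the cost approaches the constant a as capacity grows, which matches the abstract's conclusion that higher capacity lowers unit cost.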
Involvement of PlsX and the acyl-phosphate dependent sn-glycerol-3-phosphate acyltransferase PlsY in the initial stage of glycerolipid synthesis in Bacillus subtilis.

Science.gov (United States)

Hara, Yoshinori; Seki, Masahide; Matsuoka, Satoshi; Hara, Hiroshi; Yamashita, Atsushi; Matsumoto, Kouji

2008-12-01

The gene responsible for the first acylation of sn-glycerol-3-phosphate (G3P) in Bacillus subtilis has not yet been determined with certainty. The product of this first acylation, lysophosphatidic acid (LPA), is subsequently acylated again to form phosphatidic acid (PA), the primary precursor to membrane glycerolipids. A novel G3P acyltransferase (GPAT), the gene product of plsY, which uses acyl-phosphate formed by the plsX gene product, has recently been found to synthesize LPA in Streptococcus pneumoniae. We found that in B. subtilis growth arrests after repression of either a plsY homologue or a plsX homologue were overcome by expression of E. coli plsB, which encodes an acyl-acyl carrier protein (acyl-ACP)-dependent GPAT, although in the case of plsX repression a high level of plsB expression was required. B. subtilis has, therefore, a capability to use the acyl-ACP dependent GPAT of PlsB. Simultaneous expression of plsY and plsX suppressed the glycerol requirement of a strict glycerol auxotrophic derivative of the E. coli plsB26 mutant, although either one alone did not. Membrane fractions from B. subtilis cells catalyzed palmitoylphosphate-dependent acylation of [14C]-labeled G3P to synthesize [14C]-labeled LPA, whereas those from ΔplsY cells did not. The results indicate unequivocally that PlsY is an acyl-phosphate dependent GPAT. Expression of plsX corrected the glycerol auxotrophy of a ΔygiH (the deleted allele of an E. coli homologue of plsY) derivative of BB26-36 (plsB26 plsX50), suggesting an essential role of plsX other than substrate supply for acyl-phosphate dependent LPA synthesis. Two-hybrid examinations suggested that PlsY is associated with PlsX and that each may exist in multimeric form.
A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

Science.gov (United States)

Delwiche, Stephen R; Reeves, James B

2010-01-01

In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R²) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various
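The Savitzky-Golay preprocess functions named in this record are plain convolutions. The sketch below applies the well-known 5-point quadratic coefficients for smoothing, first derivative and second derivative, assuming unit spacing between spectral points; the helper name and the test signal are hypothetical, and wider windows (up to the 25 points used in the study) would need their own coefficient sets.

```python
# 5-point Savitzky-Golay coefficients for a quadratic polynomial, unit spacing.
SMOOTH_5 = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
DERIV1_5 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]
DERIV2_5 = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]


def convolve_valid(spectrum, coeffs):
    """Apply a centred filter; edge points without a full window are dropped."""
    half = len(coeffs) // 2
    return [
        sum(c * spectrum[i + k - half] for k, c in enumerate(coeffs))
        for i in range(half, len(spectrum) - half)
    ]


# A pure quadratic is reproduced exactly by a quadratic-polynomial filter.
signal = [float(x * x) for x in range(7)]
print(convolve_valid(signal, SMOOTH_5))  # smoothed values at x = 2, 3, 4
print(convolve_valid(signal, DERIV1_5))  # slope 2x at x = 2, 3, 4
print(convolve_valid(signal, DERIV2_5))  # constant curvature
```

Derivative filters remove offsets and sloped baselines but also amplify noise, which is one reason the abstract warns against judging a pretreatment by R² alone.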
Extending the linear model with R generalized linear, mixed effects and nonparametric regression models

CERN Document Server

Faraway, Julian J

2005-01-01

Linear models are central to the practice of statistics and form the foundation of a vast range of statistical methodologies. Julian J. Faraway's critically acclaimed Linear Models with R examined regression and analysis of variance, demonstrated the different methods available, and showed in which situations each one applies. Following in those footsteps, Extending the Linear Model with R surveys the techniques that grow from the regression model, presenting three extensions to that framework: generalized linear models (GLMs), mixed effect models, and nonparametric regression models. The author's treatment is thoroughly modern and covers topics that include GLM diagnostics, generalized linear mixed models, trees, and even the use of neural networks in statistics. To demonstrate the interplay of theory and practice, throughout the book the author weaves the use of the R software environment to analyze the data of real examples, providing all of the R commands necessary to reproduce the analyses. All of the ...

Variable selection methods in PLS regression - a comparison study on metabolomics data

DEFF Research Database (Denmark)

Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when...... integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...
New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

Science.gov (United States)

de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

2018-04-01

A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by the ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models were more predictive than PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results, with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm, and the models based on NIR spectra were considered more predictive for all properties. These models thus provide a simple, rapid and accurate method for determining the antioxidant properties of red cabbage extract, suitable for use in the food industry.
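PLS-1, as referred to in this record, models one response at a time, whereas PLS-2 fits all responses jointly. The sketch below is a minimal PLS-1 fit using the classical NIPALS steps; it is an illustration under stated assumptions, not the authors' implementation, and it includes no OPS or GA variable selection. The function name and the small data set are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS-1 by NIPALS; returns (coef, intercept) for y ~ X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centred residual blocks
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)   # weight vector (direction of max covariance with y)
        t = E @ w                # X-scores
        tt = t @ t
        p = E.T @ t / tt         # X-loadings
        c = f @ t / tt           # y-loading
        E = E - np.outer(t, p)   # deflate X
        f = f - c * t            # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


# Hypothetical noise-free data: y = 1 + 2*x1 - 3*x2.
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [-3, 2, -8, 0, -1]
coef, intercept = pls1_fit(X, y, n_components=2)
print(coef, intercept)
```

With as many components as predictors the fit coincides with ordinary least squares; in spectroscopy far fewer components than wavelengths are kept, and that number is usually chosen by cross-validation.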
Status report on control system development for PLS

International Nuclear Information System (INIS)

Won, S.C.; Chang, S.S.; Huang, J.; Lee, J.W.; Lee, J.; Kim, J.H.

1992-01-01

Emphasizing reliability and flexibility, a hierarchical architecture with distributed computers has been designed into the Pohang Light Source (PLS) computer control system. The PLS control system has four layers of computer systems connected via multiple data communication networks. This paper presents an overview of the PLS control system. (author)
OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

Science.gov (United States)

Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

2012-01-01

The objective of the present study was to assess the comparable applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation in the first week of admission and again six months later. All data were primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Directory of Open Access Journals (Sweden)

Chi-Cheng Huang

2013-01-01

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools were limited to the applications of binary responses. Our aim was to apply partial least squares (PLS) regression for breast cancer intrinsic taxonomy, of which five distinct molecular subtypes were identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies with Han Chinese origin were used for independent validation (n=535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to the much larger number of samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding

Directory of Open Access Journals (Sweden)

Yun Xu

2016-10-01

Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a “pure” regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
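For reference, the classic binary class membership coding that this record contrasts with its hybrid design can be sketched as below. The hybrid Y coding itself is not described in the abstract, so it is not reproduced here; the function name and the labels are hypothetical.

```python
def dummy_code(labels):
    """Binary class-membership coding of a Y block, as commonly used for PLS-DA.

    Returns (classes, Y) where Y[i][j] is 1.0 if sample i belongs to classes[j].
    """
    classes = sorted(set(labels))
    index = {c: j for j, c in enumerate(classes)}
    Y = [
        [1.0 if index[label] == j else 0.0 for j in range(len(classes))]
        for label in labels
    ]
    return classes, Y


# Hypothetical sample labels from a two-group design.
classes, Y = dummy_code(["control", "treated", "control"])
print(classes)
print(Y)
```

This coding treats every class as unrelated to the others, which is the limitation the abstract raises for designs with several interacting factors.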
Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

Science.gov (United States)

Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

2013-10-01

Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) by 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP by 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.
Robust Ultraviolet-Visible (UV-Vis) Partial Least-Squares (PLS) Models for Tannin Quantification in Red Wine.

Science.gov (United States)

Aleixandre-Tudo, José Luis; Nieuwoudt, Helené; Aleixandre, José Luis; Du Toit, Wessel J

2015-02-04

The validation of ultraviolet-visible (UV-vis) spectroscopy combined with partial least-squares (PLS) regression to quantify red wine tannins is reported. The methylcellulose precipitable (MCP) tannin assay and the bovine serum albumin (BSA) tannin assay were used as reference methods. To take the high variability of wine tannins into account when the calibration models were built, a diverse data set was collected from samples of South African red wines that consisted of 18 different cultivars, from regions spanning the wine grape-growing areas of South Africa with their various sites, climates, and soils, ranging in vintage from 2000 to 2012. A total of 240 wine samples were analyzed, and these were divided into a calibration set (n = 120) and a validation set (n = 120) to evaluate the predictive ability of the models. To test the robustness of the PLS calibration models, the predictive ability of the classifying variables cultivar, vintage year, and experimental versus commercial wines was also tested. In general, the statistics obtained when BSA was used as a reference method were slightly better than those obtained with MCP. Despite this, the MCP tannin assay should also be considered as a valid reference method for developing PLS calibrations. The best calibration statistics for the prediction of new samples were coefficient of correlation (R² val) = 0.89, root mean standard error of prediction (RMSEP) = 0.16, and residual predictive deviation (RPD) = 3.49 for MCP and R² val = 0.93, RMSEP = 0.08, and RPD = 4.07 for BSA, when only the UV region (260-310 nm) was selected, which also led to a faster analysis time. In addition, a difference in the results obtained when the predictive ability of the classifying variables vintage, cultivar, or commercial versus experimental wines was studied suggests that tannin composition is highly affected by many factors. This study also discusses the correlations in tannin values between the methylcellulose and protein
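For readers unfamiliar with the validation statistics quoted in this record, the following minimal sketch computes RMSEP and RPD for a validation set. It assumes the common definition of RPD as the standard deviation of the reference values divided by RMSEP; the abstract does not state which variant was used, and the tannin values below are invented for illustration.

```python
import math
from statistics import stdev

def rmsep(reference, predicted):
    """Root mean square error of prediction over a validation set."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

def rpd(reference, predicted):
    """RPD, assumed here to be SD(reference values) / RMSEP."""
    return stdev(reference) / rmsep(reference, predicted)

# Hypothetical reference and predicted tannin values (not data from the study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]

print("RMSEP:", round(rmsep(reference, predicted), 3))
print("RPD:", round(rpd(reference, predicted), 2))
```

A larger RPD means the prediction error is small relative to the natural spread of the reference values.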
PLS beam position measurement and feedback system

International Nuclear Information System (INIS)

Huang, J.Y.; Lee, J.; Park, M.K.; Kim, J.H.; Won, S.C.

1992-01-01

A real-time orbit correction system is proposed for the stabilization of beam orbit and photon beam positions in the Pohang Light Source (PLS). The PLS beam position monitoring system is designed to be VMEbus compatible to fit the real-time digital orbit feedback system. A VMEbus-based subsystem control computer (SCC), a Mil-1553B communication network and 12 BPM/PS machine interface units constitute the digital part of the feedback system. With the super-stable PLS correction magnet power supply, power line frequency noise is almost filtered out and the dominant spectra of beam orbit fluctuations are expected to appear below 15 Hz. Using a DSP board in the SCC for the computation and an appropriate compensation circuit for the phase delay caused by the vacuum chamber, the PLS real-time orbit correction system is realizable without changing the basic structure of the PLS computer control system. (author)
Check-all-that-apply data analysed by Partial Least Squares regression

DEFF Research Database (Denmark)

Rinnan, Åsmund; Giacalone, Davide; Frøst, Michael Bom

2015-01-01

are analysed by multivariate techniques. CATA data can be analysed both by setting the CATA as the X and the Y. The former is the PLS-Discriminant Analysis (PLS-DA) version, while the latter is the ANOVA-PLS (A-PLS) version. We investigated the difference between these two approaches, concluding...
Nuclear magnetic resonance metabonomic profiling using tO2PLS

Energy Technology Data Exchange (ETDEWEB)

Kirwan, Gemma M., E-mail: gemma.kirwan@gmail.com [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hancock, Timothy [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hassell, Kathryn [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Niere, Julie O. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Nugegoda, Dayanthi [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Goto, Susumu [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Adams, Michael J. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia)

2013-06-05

Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a ¹H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg⁻¹ 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets.
Nuclear magnetic resonance metabonomic profiling using tO2PLS

International Nuclear Information System (INIS)

Kirwan, Gemma M.; Hancock, Timothy; Hassell, Kathryn; Niere, Julie O.; Nugegoda, Dayanthi; Goto, Susumu; Adams, Michael J.

2013-01-01

Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a ¹H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg⁻¹ 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets.
Interval ridge regression (iRR) as a fast and robust method for quantitative prediction and variable selection applied to edible oil adulteration.

Science.gov (United States)

Jović, Ozren; Smrečki, Neven; Popović, Zora

2016-04-01

A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for poil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEPoil (R²>0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Exact estimation of biodiesel cetane number (CN) from its fatty acid methyl esters (FAMEs) profile using partial least square (PLS) adapted by artificial neural network (ANN)

International Nuclear Information System (INIS)

Hosseinpour, Soleiman; Aghbashlo, Mortaza; Tabatabaei, Meisam; Khalife, Esmail

2016-01-01

Highlights: • Estimating the biodiesel CN from its FAMEs profile using ANN-based PLS approach. • Comparing the capability of ANN-adapted PLS approach with the standard PLS model. • Exact prediction of biodiesel CN from its FAMEs profile using ANN-based PLS method. • Developing an easy-to-use software using ANN-PLS model for computing the biodiesel CN. - Abstract: Cetane number (CN) is among the most important properties of biodiesel because it quantifies combustion speed or, in other words, ignition quality. Experimental measurement of biodiesel CN is rather laborious and expensive. However, the high proportionality of the biodiesel fatty acid methyl esters (FAMEs) profile with its CN makes it very appealing to develop straightforward and inexpensive computerized tools for biodiesel CN estimation. Unfortunately, correlating the chemical structure of biodiesel to its CN using conventional statistical and mathematical approaches is very difficult. To solve this issue, partial least square (PLS) adapted by artificial neural network (ANN) was introduced and examined herein as an innovative approach for the exact estimation of biodiesel CN from its FAMEs profile. In the proposed approach, the ANN paradigm was used for modeling the inner relation between the input and the output PLS score vectors. In addition, the capability of the developed method in predicting the biodiesel CN was compared with the basal PLS method. The accuracy of the developed approaches for computing the biodiesel CN was assessed using three statistical criteria, i.e., coefficient of determination (R²), mean-squared error (MSE), and percentage error (PE). The ANN-adapted PLS method predicted the biodiesel CN with an R² value higher than 0.99, demonstrating the fidelity of the developed model over the classical PLS method, which had a markedly lower R² value of about 0.85. In order to facilitate the use of the proposed model, an easy-to-use computer program was also developed on the basis of ANN-adapted PLS
A PLS-based extractive spectrophotometric method for simultaneous determination of carbamazepine and carbamazepine-10,11-epoxide in plasma and comparison with HPLC

Science.gov (United States)

Hemmateenejad, Bahram; Rezaei, Zahra; Khabnadideh, Soghra; Saffari, Maryam

2007-11-01

Carbamazepine (CBZ) undergoes enzyme biotransformation through epoxidation with the formation of its metabolite, carbamazepine-10,11-epoxide (CBZE). A simple chemometrics-assisted spectrophotometric method has been proposed for simultaneous determination of CBZ and CBZE in plasma. A liquid extraction procedure was used to separate the analytes from plasma, and the UV absorbance spectra of the resultant solutions were subjected to partial least squares (PLS) regression. The optimum number of PLS latent variables was selected according to the PRESS values of leave-one-out cross-validation. An HPLC method was also employed for comparison. The respective mean recoveries for analysis of CBZ and CBZE in synthetic mixtures were 102.57 (±0.25)% and 103.00 (±0.09)% for PLS and 99.40 (±0.15)% and 102.20 (±0.02)% for HPLC. The concentrations of CBZ and CBZE were also determined in five patients using the PLS and HPLC methods. The results showed that the data obtained by PLS were comparable with those obtained by the HPLC method.
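The selection of the number of latent variables by leave-one-out PRESS, as described in this record, can be sketched as follows. This is a minimal single-response PLS (PLS1, NIPALS-style) written for illustration under stated assumptions, not the authors' procedure: the function names `pls1_fit` and `press_loo`, the Gaussian "pure spectra", and the noise level are all hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return regression vector and centring terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)            # weight vector
        t = E @ w                         # scores
        tt = t @ t
        p = E.T @ t / tt                  # X loadings
        c = f @ t / tt                    # y loading
        E = E - np.outer(t, p)            # deflate X
        f = f - c * t                     # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

def press_loo(X, y, n_components):
    """Predicted residual error sum of squares from leave-one-out CV."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, xm, ym = pls1_fit(X[keep], y[keep], n_components)
        pred = (X[i] - xm) @ beta + ym
        press += (y[i] - pred) ** 2
    return press

# Synthetic two-analyte mixture spectra (hypothetical, not data from the study).
rng = np.random.default_rng(0)
wavelengths = np.arange(30.0)
pure = np.vstack([np.exp(-((wavelengths - 10) / 4) ** 2),
                  np.exp(-((wavelengths - 16) / 4) ** 2)])
conc = rng.uniform(0.0, 1.0, size=(20, 2))
X = conc @ pure + rng.normal(0.0, 0.001, size=(20, 30))
y = conc[:, 0]

press = {a: press_loo(X, y, a) for a in range(1, 5)}
best = min(press, key=press.get)
print("PRESS by number of latent variables:", press)
print("selected number of latent variables:", best)
```

Because the two simulated bands overlap, a single latent variable cannot separate the analytes, so PRESS should fall sharply once a second latent variable is added.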
Novel, customizable scoring functions, parameterized using N-PLS, for structure-based drug discovery.

Science.gov (United States)

Catana, Cornel; Stouten, Pieter F W

2007-01-01

The ability to accurately predict biological affinity on the basis of in silico docking to a protein target remains a challenging goal in the CADD arena. Typically, "standard" scoring functions have been employed that use the calculated docking result and a set of empirical parameters to calculate a predicted binding affinity. To improve on this, we are exploring novel strategies for rapidly developing and tuning "customized" scoring functions tailored to a specific need. In the present work, three such customized scoring functions were developed using a set of 129 high-resolution protein-ligand crystal structures with measured Ki values. The functions were parametrized using N-PLS (N-way partial least squares), a multivariate technique well-known in the 3D quantitative structure-activity relationship field. A modest correlation between observed and calculated pKi values using a standard scoring function (r2 = 0.5) could be improved to 0.8 when a customized scoring function was applied. To mimic a more realistic scenario, a second scoring function was developed, not based on crystal structures but exclusively on several binding poses generated with the Flo+ docking program. Finally, a validation study was conducted by generating a third scoring function with 99 randomly selected complexes from the 129 as a training set and predicting pKi values for a test set that comprised the remaining 30 complexes. Training and test set r2 values were 0.77 and 0.78, respectively. These results indicate that, even without direct structural information, predictive customized scoring functions can be developed using N-PLS, and this approach holds significant potential as a general procedure for predicting binding affinity on the basis of in silico docking.
Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

Directory of Open Access Journals (Sweden)

C. Wu

2018-03-01

Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
Evaluation of linear regression techniques for atmospheric applications: the importance of appropriate weighting

Science.gov (United States)

Wu, Cheng; Zhen Yu, Jian

2018-03-01

Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
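To make the role of the weighting parameter in Deming regression concrete, here is a small sketch of the standard closed-form solution. It assumes `delta` is the ratio of the Y error variance to the X error variance; conventions for this ratio differ between sources, so this may not match the λ defined in the paper. The function name and the data are illustrative only.

```python
import math

def deming(x, y, delta=1.0):
    """Deming regression slope and intercept.

    delta is assumed to be var(error in y) / var(error in x);
    delta = 1 corresponds to orthogonal regression.
    The sample covariance of x and y must be nonzero.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept

# Noise-free check data lying exactly on y = 2x + 1 (hypothetical values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * a + 1.0 for a in x]

slope, intercept = deming(x, y, delta=1.0)
print(slope, intercept)
```

On noisy data the fitted slope changes with `delta`, which is why a mis-specified variance ratio can bias both slope and intercept.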
Olive oil sensory defects classification with data fusion of instrumental techniques and multivariate analysis (PLS-DA).

Science.gov (United States)

Borràs, Eva; Ferré, Joan; Boqué, Ricard; Mestres, Montserrat; Aceña, Laura; Calvo, Angels; Busto, Olga

2016-07-15

Three instrumental techniques, headspace-mass spectrometry (HS-MS), mid-infrared spectroscopy (MIR) and UV-visible spectrophotometry (UV-vis), have been combined to classify virgin olive oil samples based on the presence or absence of sensory defects. The reference sensory values were provided by an official taste panel. Different data fusion strategies were studied to improve the discrimination capability compared to using each instrumental technique individually. A general model was applied to discriminate high-quality non-defective olive oils (extra-virgin) and the lowest-quality olive oils considered non-edible (lampante). A specific identification of key off-flavours, such as musty, winey, fusty and rancid, was also studied. The data fusion of the three techniques improved the classification results in most of the cases. Low-level data fusion was the best strategy to discriminate musty, winey and fusty defects, using HS-MS, MIR and UV-vis, and the rancid defect using only HS-MS and MIR. The mid-level data fusion approach using partial least squares-discriminant analysis (PLS-DA) scores was found to be the best strategy for defective vs non-defective and edible vs non-edible oil discrimination. However, the data fusion did not sufficiently improve the results obtained by a single technique (HS-MS) to classify non-defective classes. These results indicate that instrumental data fusion can be useful for the identification of sensory defects in virgin olive oils. Copyright © 2016 Elsevier Ltd. All rights reserved.
[Study on predicting firmness of watermelon by Vis/NIR diffuse transmittance technique].

Science.gov (United States)

Tian, Hai-Qing; Ying, Yi-Bin; Lu, Hui-Shan; Xu, Hui-Rong; Xie, Li-Juan; Fu, Xia-Ping; Yu, Hai-Yan

2007-06-01

Watermelon is a popular fruit worldwide, and firmness (FM) is one of the major characteristics used for assessing watermelon quality. The objective of the present research was to study the potential of visible/near-infrared (Vis/NIR) diffuse transmittance spectroscopy for the nondestructive measurement of watermelon FM. Statistical models relating the spectra to FM were developed using partial least squares (PLS) and principal component regression (PCR) methods. Model performance was assessed in terms of the correlation coefficient (r) of the validation set and the root mean square error of prediction (RMSEP). Models were established for three mathematical treatments of the spectra (original, first derivative and second derivative). The Savitzky-Golay filter was used to smooth the spectral data. The PLS model of the second-derivative spectra gave the best prediction of FM, with a correlation coefficient (r) of 0.974 and an RMSEP of 0.589 N using Savitzky-Golay smoothing. The results of this study indicate that NIR diffuse transmittance spectroscopy can be used to predict the FM of watermelon. The Vis/NIR diffuse transmittance technique will be valuable for the nondestructive detection of large fruits with thick peel.
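The derivative pretreatment mentioned in this record can be illustrated with a small Savitzky-Golay sketch. The code derives the filter coefficients from a local least-squares polynomial fit and applies them as a sliding window; the window length, polynomial order and test signal are assumptions chosen for the example, and unit spacing between spectral points is assumed.

```python
import math
import numpy as np

def savgol_coeffs(window, polyorder, deriv=0):
    """Savitzky-Golay coefficients for the centre point of an odd-length window."""
    half = window // 2
    pos = np.arange(-half, half + 1, dtype=float)
    A = np.vander(pos, polyorder + 1, increasing=True)  # columns: pos**0, pos**1, ...
    # Row `deriv` of the pseudo-inverse estimates polynomial coefficient a_deriv;
    # multiplying by deriv! turns it into the deriv-th derivative at the centre.
    return np.linalg.pinv(A)[deriv] * math.factorial(deriv)

def savgol_apply(y, window, polyorder, deriv=0):
    """Apply the filter to interior points only (no edge padding)."""
    c = savgol_coeffs(window, polyorder, deriv)
    # np.convolve flips its second argument, so reverse c to get a sliding dot product.
    return np.convolve(y, c[::-1], mode="valid")

# Hypothetical smooth "spectrum": a quadratic whose second derivative is 6 everywhere.
x = np.arange(20.0)
y = 3.0 * x ** 2 + 2.0 * x + 1.0

d2 = savgol_apply(y, window=7, polyorder=2, deriv=2)
print(d2)
```

In practice a library routine such as SciPy's `savgol_filter` would normally be used; the hand-rolled version is shown only to expose how smoothing and differentiation come from the same local polynomial fit.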

Application of GA-PLS and GA-KPLS calculations for the prediction of the retention indices of essential oils

Directory of Open Access Journals (Sweden)

Hadi Noorizadeh

2011-01-01

Genetic algorithm and partial least square (GA-PLS) and kernel PLS (GA-KPLS) techniques were used to investigate the correlation between retention indices (RI) and descriptors for 117 diverse compounds in essential oils from 5 Pimpinella species gathered from central Turkey, which were obtained by gas chromatography and gas chromatography-mass spectrometry. The squared correlation coefficient of leave-group-out cross validation (LGO-CV) (Q²) between experimental and predicted RI for the training set by GA-PLS and GA-KPLS was 0.940 and 0.963, respectively. This indicates that GA-KPLS can be used as an alternative modeling tool for quantitative structure-retention relationship (QSRR) studies.
New PLS analysis approach to wine volatile compounds characterization by near infrared spectroscopy (NIR).

Science.gov (United States)

Genisheva, Z; Quintelas, C; Mesquita, D P; Ferreira, E C; Oliveira, J M; Amaral, A L

2018-04-25

This work aims to explore the potential of near infrared (NIR) spectroscopy to quantify volatile compounds in Vinho Verde wines, commonly determined by gas chromatography. For this purpose, 105 Vinho Verde wine samples were analyzed using Fourier transform near infrared (FT-NIR) transmission spectroscopy in the range of 5435 cm⁻¹ to 6357 cm⁻¹. Boxplot and principal components analysis (PCA) were performed for cluster identification and outlier removal. A partial least square (PLS) regression was then applied to develop the calibration models, by a new iterative approach. The predictive ability of the models was confirmed by an external validation procedure with an independent sample set. The obtained results could be considered as quite good with coefficients of determination (R²) varying from 0.94 to 0.97. The current methodology, using NIR spectroscopy and chemometrics, can be seen as a promising rapid tool to determine volatile compounds in Vinho Verde wines. Copyright © 2017 Elsevier Ltd. All rights reserved.
PLS-based memory control scheme for enhanced process monitoring

KAUST Repository

Harrou, Fouzi; Sun, Ying

2017-01-01

Fault detection is important for safe operation of various modern engineering systems. Partial least square (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail
[Prediction of SPAD value in oilseed rape leaves using hyperspectral imaging technique].

Science.gov (United States)

Ding, Xi-bin; Liu, Fei; Zhang, Chu; He, Yong

2015-02-01

In the present work, prediction models of SPAD value (Soil and Plant Analyzer Development, often used as a parameter to indicate chlorophyll content) in oilseed rape leaves were successfully built using a hyperspectral imaging technique. The hyperspectral images of 160 oilseed rape leaf samples in the spectral range of 380-1030 nm were acquired. The average spectrum was extracted from the region of interest (ROI) of each sample. We chose spectral data in the range of 500-900 nm for analysis. Using the Monte Carlo partial least squares (MC-PLS) algorithm, 13 samples were identified as outliers and eliminated. Based on the spectral information and measured SPAD values of the remaining 147 samples, several estimation models were built based on different parameters using different algorithms for comparison, including: (1) a SPAD value estimation model based on partial least squares (PLS) in the whole wavelength region of 500-900 nm; (2) a SPAD value estimation model based on the successive projections algorithm combined with PLS (SPA-PLS); (3) 4 kinds of simple empirical SPAD value estimation models in which red edge position was used as the argument; (4) 4 kinds of simple empirical SPAD value estimation models in which three vegetation indexes, R710/R760, (R750-R705)/(R750+R705) and R860/(R550 x R708), all of which have been shown to correlate well with chlorophyll content, were each used as the argument; (5) a SPAD value estimation model based on PLS using the 3 vegetation indexes mentioned above. The results indicate that the optimal prediction performance is achieved by the PLS model in the whole wavelength region of 500-900 nm, which has a correlation coefficient (r(p)) of 0.8339 and a root mean square error of prediction (RMSEP) of 1.52. The SPA-PLS model can provide a very close prediction result while the calibration computation is significantly reduced and the calibration speed sharply accelerated. For simple empirical models based on red edge
PLS and multicollinearity under conditions common in satisfaction studies

DEFF Research Database (Denmark)

Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob Kjær

A number of studies have investigated the performance of the PLS path modelling algorithm in the presence of common empirical problems, such as model misspecification, skewness of manifest variables, missing values, and multicollinearity, and they have shown PLS to be quite robust (see e.g. Cassel...... et al., 1999; Kristensen, Eskildsen, 2005). However, most of the studies, including our own, have focused on somewhat simple models with very simple correlation structures. This paper extends the existing knowledge by investigating the effect of varying degrees of multicollinearity on the PLS model...
Mixed Frequency Data Sampling Regression Models: The R Package midasr

Directory of Open Access Journals (Sweden)

Eric Ghysels

2016-08-01

When modeling economic relationships it is increasingly common to encounter data sampled at different frequencies. We introduce the R package midasr, which enables estimating regression models with variables sampled at different frequencies within a MIDAS regression framework put forward in work by Ghysels, Santa-Clara, and Valkanov (2002). In this article we define a general autoregressive MIDAS regression model with multiple variables of different frequencies and show how it can be specified using the familiar R formula interface and estimated using various optimization methods chosen by the researcher. We discuss how to check the validity of the estimated model both in terms of numerical convergence and statistical adequacy of a chosen regression specification, how to perform model selection based on an information criterion, how to assess forecasting accuracy of the MIDAS regression model and how to obtain a forecast aggregation of different MIDAS regression models. We illustrate the capabilities of the package with a simulated MIDAS regression model and give two empirical examples of application of MIDAS regression.
Determination of the Calorific Power of Gasoline Samples Using Near-Infrared Spectroscopy and Multivariate Regression

Directory of Open Access Journals (Sweden)

Janice Zulma Francesquett

2013-08-01

The aim of this study was to quantify the calorific power of 111 gasoline samples available at filling stations using near-infrared spectroscopy in conjunction with multivariate regression. The calorific power of the fuels was determined using an adiabatic bomb calorimeter (standard ASTM D 4.809). To build the multivariate regression models, 2/3 of the samples were used for calibration and the remainder for prediction, using the interval partial least squares (iPLS) and synergy interval partial least squares (siPLS) algorithms. The best iPLS model selected the spectral range from 5561 to 6650 cm⁻¹, giving an RMSEP of 102 cal g⁻¹, a correlation coefficient (r) of 0.8218, and errors of 0.71% for calibration and 0.47% for prediction. The best model overall was the siPLS model with the spectrum divided into 32 intervals and three intervals combined, which selected the regions below 6000 cm⁻¹ and above 6500 cm⁻¹, with an RMSECV of 89.8 cal g⁻¹, an RMSEP of 96.7 cal g⁻¹, and correlation coefficients for cross-validation and prediction of 0.7834 and 0.7293, respectively. The proposed methodology is efficient, with prediction errors lower than 1%, and is a clean, fast, safe and practical alternative.
Early cost estimating for road construction projects using multiple regression techniques

Directory of Open Access Journals (Sweden)

Ibrahim Mahamid

2011-12-01

The objective of this study is to develop early cost estimating models for road construction projects using multiple regression techniques, based on 131 sets of data collected in the West Bank in Palestine. As cost estimates are required at the early stages of a project, consideration was given to ensuring that the input data for the regression models could be easily extracted from sketches or the scope definition of the project. Eleven regression models are developed to estimate the total cost of a road construction project in US dollars; 5 of them include bid quantities as input variables and 6 include road length and road width. The coefficient of determination r² for the developed models ranges from 0.92 to 0.98, which indicates that the values predicted by the models fit the real-life data well. The mean absolute percentage error (MAPE) of the developed regression models ranges from 13% to 31%; the results compare favorably with past research, which has shown that estimate accuracy in the early stages of a project is between ±25% and ±50%.
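The two accuracy measures quoted in this record can be computed as in the short sketch below. The cost figures are invented for illustration and the function names are not from the study.

```python
def mape(actual, estimated):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - e) / a) for a, e in zip(actual, estimated)) / len(actual)

def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical project costs in thousand US dollars (not data from the study).
actual = [100.0, 200.0, 400.0]
estimated = [110.0, 180.0, 400.0]

print("MAPE (%):", round(mape(actual, estimated), 2))
print("r squared:", round(r_squared(actual, estimated), 4))
```

The example shows why both figures are reported: r² can be high while individual estimates are still off by a noticeable percentage.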
Preliminary antifungal and cytotoxic evaluation of synthetic cycloalkyl[b]thiophene derivatives with PLS-DA analysis.

Science.gov (United States)

Souza, Beatriz C C; De Oliveira, Tiago B; Aquino, Thiago M; de Lima, Maria C A; Pitta, Ivan R; Galdino, Suely L; Lima, Edeltrudes O; Gonçalves-Silva, Teresinha; Militão, Gardênia C G; Scotti, Luciana; Scotti, Marcus T; Mendonça, Francisco J B

2012-06-01

A series of 2-[(arylidene)amino]-cycloalkyl[b]thiophene-3-carbonitriles (2a-x) was synthesized by incorporation of substituted aromatic aldehydes in Gewald adducts (1a-c). The title compounds were screened for their antifungal activity against Candida krusei and Cryptococcus neoformans and for their antiproliferative activity against a panel of 3 human cancer cell lines (HT29, NCI H-292 and HEP). For antiproliferative activity, the partial least squares (PLS) methodology was applied. Some of the prepared compounds exhibited promising antifungal and antiproliferative properties. The most active compounds for antifungal activity were cyclohexyl[b]thiophene derivatives, and for antiproliferative activity cycloheptyl[b]thiophene derivatives, especially 2-[(1H-indol-2-yl-methylidene)amino]-5,6,7,8-tetrahydro-4H-cyclohepta[b]thiophene-3-carbonitrile (2r), which inhibited more than 97% of the growth of the three cell lines. The PLS discriminant analysis (PLS-DA) applied generated good exploratory and predictive results and showed that the descriptors having shape characteristics were strongly correlated with the biological data.
Introduction to regression graphics

CERN Document Server

Cook, R Dennis

2009-01-01

Covers the use of dynamic and interactive computer graphics in linear regression analysis, focusing on analytical graphics. Features new techniques like plot rotation. The authors have composed their own regression code, using Xlisp-Stat language called R-code, which is a nearly complete system for linear regression analysis and can be utilized as the main computer program in a linear regression course. The accompanying disks, for both Macintosh and Windows computers, contain the R-code and Xlisp-Stat. An Instructor's Manual presenting detailed solutions to all the problems in the book is ava
Discrimination of Transgenic Rice Based on Near Infrared Reflectance Spectroscopy and Partial Least Squares Regression Discriminant Analysis

Directory of Open Access Journals (Sweden)

ZHANG Long

2015-09-01

Full Text Available Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with a protein gene (OsTCTP) and a regulation gene (Osmi166) were also discriminated by the NIRS method. The performance of PLS-DA in the spectral ranges of 4 000–8 000 cm-1 and 4 000–10 000 cm-1 was compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was 100.0% in the validation test. The transgenic rice lines TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm-1, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Prediction of the distillation temperatures of crude oils using ¹H NMR and support vector regression with estimated confidence intervals.

Science.gov (United States)

Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J

2015-09-01

This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using ¹H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

Science.gov (United States)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Mixture quantification using PLS in plastic scintillation measurements

Energy Technology Data Exchange (ETDEWEB)

Bagan, H.; Tarancon, A.; Rauret, G. [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain); Garcia, J.F., E-mail: jfgarcia@ub.ed [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain)

2011-06-15

This article reports the capability of plastic scintillation (PS) combined with multivariate calibration (partial least squares; PLS) to detect and quantify alpha and beta emitters in mixtures. While several attempts have been made for this purpose using liquid scintillation (LS), no attempt has been made using PS, which has the great advantage of not producing mixed waste after the measurements are performed. Following this objective, ternary mixtures of alpha and beta emitters (²⁴¹Am, ¹³⁷Cs and ⁹⁰Sr/⁹⁰Y) have been quantified. Procedure optimisation evaluated the use of the net spectra or the sample spectra, the inclusion of different spectra obtained at different values of the Pulse Shape Analysis parameter and the application of the PLS1 or PLS2 algorithms. The conclusions show that the use of PS+PLS2 applied to the sample spectra, without the use of any pulse shape discrimination, allows quantification of the activities with relative errors of less than 10% in most cases. This procedure not only allows quantification of mixtures but also reduces measurement time (no blanks are required), and it does not require detectors that include the pulse shape analysis parameter.
Pink line syndrome (PLS) in the scleractinian coral Porites lutea

Digital Repository Service at National Institute of Oceanography (India)

Ravindran, J.; Raghukumar, C.

Reef sites. Pink line syndrome (PLS) in the scleractinian coral Porites lutea. Accepted: 10 May 2002 / Published online: 5 July 2002. © Springer-Verlag 2002. We describe here an unreported diseased state of Porites lutea (Milne-Edwards and Haime...) on the Kavaratti reef of the Lakshadweep group of islands (11° N; 71° E). Pink line syndrome (PLS) causes partial mortality of the coral P. lutea around Kavaratti Island (Fig. 1), and about 10% of colonies were found to be affected by PLS. The dead patches were colonized by a...
A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy

Directory of Open Access Journals (Sweden)

Jibo Yue

2018-01-01

Full Text Available Above-ground biomass (AGB) provides a vital link between solar energy consumption and yield, so its correct estimation is crucial to accurately monitor crop growth and predict yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index) and eight statistical regression techniques: artificial neural network (ANN), multivariable linear regression (MLR), decision-tree regression (DT), boosted binary regression tree (BBRT), partial least squares regression (PLSR), random forest regression (RF), support vector machine regression (SVM), and principal component regression (PCR), which are used to analyze hyperspectral data acquired by using a field spectrophotometer. The vegetation indexes (VIs) determined from the spectra were first used to train regression techniques for modeling and validation to select the best VI input, and then summed with white Gaussian noise to study how remote sensing errors affect the regression techniques. Next, the VIs were divided into groups of different sizes by using various sampling methods for modeling and validation to test the stability of the techniques. Finally, the AGB was estimated by using a leave-one-out cross validation with these powerful techniques. The results of the study demonstrate that, of the eight techniques investigated, PLSR and MLR perform best in terms of stability and are most suitable when high-accuracy and stable estimates are required from relatively few samples. In addition, RF is extremely robust against noise and is best suited to deal with repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise). Finally, the leave-one-out cross-validation method indicates that PLSR provides the highest accuracy (R2 = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV(RMSE) = 0.18); thus, PLSR is best suited for works requiring high
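The two vegetation indexes named in this record, and the step of adding white Gaussian noise to them, can be sketched as follows. The formulas are the commonly used definitions of NDVI and SAVI; the soil adjustment factor of 0.5, the reflectance values, the noise level and the function names are assumptions for illustration, not settings reported by the study.

```python
import random


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor=0.5 is a commonly used default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def add_white_gaussian_noise(values, sigma, seed=0):
    """Return a copy of values with zero-mean Gaussian noise of std sigma added."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical canopy reflectances (fractions between 0 and 1).
nir, red = 0.5, 0.1
indexes = [ndvi(nir, red), savi(nir, red)]
noisy_indexes = add_white_gaussian_noise(indexes, sigma=0.02)
print(indexes)
print(noisy_indexes)
```

Feeding the noisy indexes to each regression technique and comparing the error against the noise-free case is one way to probe robustness of the kind the abstract describes.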
Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

Directory of Open Access Journals (Sweden)

Brian K. Via

2014-07-01

Full Text Available This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or to select meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, were utilized for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks, which makes PCR better for interpretation. Application of the 1st derivative appeared to assist in improving both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because to date most research works have committed to PLS because it yields better predictive performance. But this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques.
Near infrared spectroscopy calibration for wood chemistry: which chemometric technique is best for prediction and interpretation?

Science.gov (United States)

Via, Brian K; Zhou, Chengfeng; Acquah, Gifty; Jiang, Wei; Eckhardt, Lori

2014-07-25

This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or to select meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, were utilized for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks, which makes PCR better for interpretation. Application of the 1st derivative appeared to assist in improving both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because to date most research works have committed to PLS because it yields better predictive performance. But this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques.
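To make the notion of PLS weights and loadings concrete, here is a minimal sketch of a PLS1 model with a single latent variable, written in plain Python. It is a generic textbook construction, not the authors' software; the function names, the dictionary layout and the toy data are hypothetical, and a practical model would extract several components and validate their number.

```python
import math


def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model on mean-centred data.

    Assumes y is not constant and is correlated with at least one column of X,
    otherwise the weight vector has zero norm.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: direction in X space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    # Scores: projection of each sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    tt = sum(v * v for v in t)

    # X loadings (used for interpretation) and the y loading.
    p = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
    q = sum(yc[i] * t[i] for i in range(n)) / tt

    # With a single component the regression vector reduces to w * q.
    coef = [wj * q for wj in w]
    return {"x_mean": x_mean, "y_mean": y_mean, "weights": w,
            "loadings": p, "coef": coef}


def predict_pls1(model, x):
    """Predict y for one new sample x from a fitted one-component model."""
    return model["y_mean"] + sum(
        b * (xj - mj) for b, xj, mj in zip(model["coef"], x, model["x_mean"])
    )


# Toy data with two perfectly collinear predictors (y = 2 * x1, x2 = x1).
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
y = [2.0, 4.0, 6.0, 8.0]
model = fit_pls1_one_component(X, y)
print(model["loadings"])
print(predict_pls1(model, [5.0, 5.0]))
```

The toy data are deliberately collinear: ordinary least squares has no unique solution in that case, whereas the latent-variable construction still yields a usable model. PCR differs in that its components are chosen from the variance of X alone, without reference to y.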
Collective effects of the PLS 2 GeV storage ring

International Nuclear Information System (INIS)

Yoon, M.; Choi, J.; Lee, T.

1993-01-01

Collective effects of the PLS storage ring are discussed. An evaluation of the PLS storage ring coupling impedances is presented, with emphasis on the RF cavity impedances. The single-bunch threshold current is studied and longitudinal coupled-bunch instabilities caused by RF narrow-band resonances are analyzed.
Analyses of direct and indirect impacts of a positive list system on pharmaceutical R&D investments.

Science.gov (United States)

Han, Euna; Kim, Tae Hyun; Jeung, Myung Jin; Lee, Eui-Kyung

2013-07-01

The South Korean government recently enacted a Positive List System (PLS) as a major change of the national formulary listing system and reimbursed prices for pharmaceutical products. Regardless of the primary goal of the PLS, its implementation might have spillover effects by influencing the pharmaceutical industry's research and development (R&D), potentially leading to a variety of responses by firms in relation to their R&D activities. We investigated the spillover effect of the PLS on R&D investments of the pharmaceutical industry in Korea through both direct and indirect channels, examining the influence of the PLS on sales profit and cash flow. Data from 9 years (5 before and 4 after PLS implementation) were drawn from the financial statements of firms whose stocks were exchanged in 2 official stock markets in Korea (526 firms) and additional pharmaceutical firms whose financial performance was officially audited by external reviewers (263 firms). Longitudinal analyses were conducted, using the panel nature of the data to control for permanent unobserved firm heterogeneity. Our results showed that the PLS was directly associated with R&D investments. In contrast, its indirect impacts stemming from the influence on sales profit and cash flow were minimal and statistically nonsignificant. The gross impact of the PLS on R&D investments increased moving further from the enactment year; R&D investments were reduced by 18.3% to 25.8% in 2009-2010 (compared with before PLS implementation) in the firm fixed-effects model. We also found that such negative direct and gross impacts of the PLS on R&D investments were significant only in firms without newly developed chemical entities. Considering the gross negative impact of the PLS on R&D investments of pharmaceutical firms and the heterogeneous response of these firms by the R&D activities, governmental efforts of cost-containment may need to consider the spillover impact of the PLS on pharmaceutical innovation

Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

Directory of Open Access Journals (Sweden)

Chau-Kuang Chen

2015-02-01

Full Text Available Data from the Centers for Disease Control and Prevention (CDC) have shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS) regression and support vector machine (SVM) models were used to determine the relationship between U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model: biological/behavioral factor, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified, including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that adult obesity rate was influenced by biological/behavioral factor, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. Analyzing multiple risk factors of obesity in communities may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve this population-based problem.
Fault detection in processes represented by PLS models using an EWMA control scheme

KAUST Repository

Harrou, Fouzi

2016-10-20

Fault detection is important for effective and safe process operation. Partial least squares (PLS) has been used successfully in fault detection for multivariate processes with highly correlated variables. However, the conventional PLS-based detection metrics, such as Hotelling's T² and the Q statistics, are not well suited to detecting small faults because they only use information about the process in the most recent observation. The exponentially weighted moving average (EWMA), however, has been shown to be more sensitive to small shifts in the mean of process variables. In this paper, a PLS-based EWMA fault detection method is proposed for monitoring processes represented by PLS models. The performance of the proposed method is compared with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS method.
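The reason an EWMA statistic reacts to small shifts can be seen in a short sketch of the textbook EWMA control chart. How the paper couples the chart to the PLS model is not described in the abstract, so the monitored series here is simply assumed to be some residual with known in-control mean and standard deviation; the function name, the smoothing constant of 0.2 and the limit width of 3 are illustrative choices.

```python
import math


def ewma_alarms(values, target, sigma, lam=0.2, width=3.0):
    """Return one boolean per observation: True when the EWMA leaves its limits.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, started at the in-control target.
    Limits: target +/- width * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)^(2t))).
    """
    z = target
    alarms = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        variance_factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        half_width = width * sigma * math.sqrt(variance_factor)
        alarms.append(abs(z - target) > half_width)
    return alarms


# Hypothetical noise-free residuals: in control for 5 samples, then a sustained
# shift of two standard deviations (illustrative only).
residuals = [0.0] * 5 + [2.0] * 10
alarms = ewma_alarms(residuals, target=0.0, sigma=1.0)
first_alarm = alarms.index(True) + 1  # 1-based sample number of the first alarm
print(first_alarm)
```

In this example no single observation exceeds a 3-sigma limit on individual values, yet the EWMA accumulates the sustained shift and raises an alarm a few samples after it begins. A smaller `lam` gives more memory and more sensitivity to small shifts, at the cost of slower response to large ones.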
Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

Directory of Open Access Journals (Sweden)

Fatimah Khaleel Ibrahim

2017-08-01

Full Text Available Soft computing techniques such as the Artificial Neural Network (ANN) have improved predictive capability and have found application in geotechnical engineering. The aim of this research is to use soft computing techniques and Multiple Regression Models (MLR) to forecast the California bearing ratio (CBR) of soil from its index properties. The CBR of a soil can be predicted from various soil characterizing parameters with the help of the MLR and ANN methods. The database was collected in the laboratory by conducting tests on 86 soil samples gathered from different projects in Basrah districts. Data from the experimental results were used in the regression models and in the soft computing techniques based on artificial neural networks. The liquid limit, plastic index, modified compaction test and CBR test were determined. In this work, different ANN and MLR models were formulated with different sets of inputs in order to recognize their significance in the prediction of CBR. The strengths of the developed models were examined in terms of regression coefficient (R2), relative error (RE%) and mean square error (MSE) values. The results show that all the proposed ANN models perform better than the MLR model. In particular, the ANN model with all input parameters gives better outcomes than the other ANN models.
Measurement of soluble solids content in watermelon by Vis/NIR diffuse transmittance technique.

Science.gov (United States)

Tian, Hai-qing; Ying, Yi-bin; Lu, Hui-shan; Fu, Xia-ping; Yu, Hai-yan

2007-02-01

Watermelon is a popular fruit in the world, with soluble solids content (SSC) being one of the major characteristics used for assessing its quality. This study was aimed at obtaining a method for nondestructive SSC detection of watermelons by means of the visible/near infrared (Vis/NIR) diffuse transmittance technique. Vis/NIR transmittance spectra of intact watermelons were acquired using a low-cost commercially available spectrometer operating over the range 350-1000 nm. Spectral data were analyzed by two multivariate calibration techniques: partial least squares (PLS) and principal component regression (PCR). Two experiments were designed for two varieties of watermelons [Qilin (QL), Zaochunhongyu (ZC)], which have different skin thickness ranges and shape dimensions. The influences of different data preprocessing and spectra treatments were also investigated. Performance of different models was assessed in terms of root mean square errors of calibration (RMSEC), root mean square errors of prediction (RMSEP) and correlation coefficient (r) between the predicted and measured parameter values. Results showed that spectral data preprocessing influenced the performance of the calibration models. The first derivative spectra showed the best results by the PLS method, with high correlation coefficients [r=0.918 (QL); r=0.954 (ZC)], low RMSEP [0.65 degrees Brix (QL); 0.58 degrees Brix (ZC)], low RMSEC [0.48 degrees Brix (QL); 0.34 degrees Brix (ZC)] and a small difference between the RMSEP and the RMSEC. The nondestructive Vis/NIR measurements provided good estimates of the SSC index of watermelon, and the predicted values were highly correlated with destructively measured values for SSC. The models based on smoothed spectra (Savitzky-Golay filter smoothing method) did not noticeably enhance the performance of the calibration models. The results indicated the feasibility of Vis/NIR diffuse transmittance spectral analysis for predicting watermelon SSC in a
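The two spectral pretreatments compared in this record, Savitzky-Golay smoothing and the first derivative, can both be written as short convolutions. The sketch below uses the classic 5-point quadratic Savitzky-Golay coefficients; the window length, the polynomial order, the function name and the test signal are assumptions, since the abstract does not state the settings used.

```python
# Classic 5-point Savitzky-Golay coefficients for a quadratic fit.
SMOOTH_5 = (-3.0, 12.0, 17.0, 12.0, -3.0)  # normalised by 35
DERIVATIVE_5 = (-2.0, -1.0, 0.0, 1.0, 2.0)  # normalised by 10 * spacing


def savgol5(values, coefficients, norm):
    """Apply a 5-point filter; the two points at each edge are dropped."""
    result = []
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        result.append(sum(c * v for c, v in zip(coefficients, window)) / norm)
    return result


# Hypothetical "spectrum": a parabola sampled at unit spacing (x = 0..6).
spectrum = [float(x * x) for x in range(7)]
smoothed = savgol5(spectrum, SMOOTH_5, 35.0)
first_derivative = savgol5(spectrum, DERIVATIVE_5, 10.0)
print(smoothed)
print(first_derivative)
```

Because the filter fits a quadratic locally, it reproduces a parabola exactly, which makes the toy signal a convenient check. On real spectra the derivative removes constant baseline offsets, which is one plausible reason derivative spectra can calibrate better than raw or merely smoothed ones.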
An Investigation of the Fit of Linear Regression Models to Data from an SAT[R] Validity Study. Research Report 2011-3

Science.gov (United States)

Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael

2011-01-01

This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Simultaneous determination of estrogens (ethinylestradiol and norgestimate) concentrations in human and bovine serum albumin by use of fluorescence spectroscopy and multivariate regression analysis.

Science.gov (United States)

Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O

2016-05-15

The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of steady-state fluorescence spectroscopy and multivariate partial-least-squares (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA, EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in an increase in the emission characteristics of HSA and BSA and a significant blue spectral shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescence data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10⁻⁸ M for EE and 2.4×10⁻⁷ M for NOR and good linearity (R²>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological conditions. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation
The Plasmin-Sensitive Protein Pls in Methicillin-Resistant Staphylococcus aureus (MRSA) Is a Glycoprotein.

Directory of Open Access Journals (Sweden)

Isabelle Bleiziffer

2017-01-01

Full Text Available Most bacterial glycoproteins identified to date are virulence factors of pathogenic bacteria, i.e. adhesins and invasins. However, the impact of protein glycosylation on the major human pathogen Staphylococcus aureus remains incompletely understood. To study protein glycosylation in staphylococci, we analyzed lysostaphin lysates of methicillin-resistant Staphylococcus aureus (MRSA) strains by SDS-PAGE and subsequent periodic acid-Schiff's staining. We detected four (>300, ∼250, ∼165, and ∼120 kDa) and two (>300 and ∼175 kDa) glycosylated surface proteins with strain COL and strain 1061, respectively. The ∼250, ∼165, and ∼175 kDa proteins were identified as plasmin-sensitive protein (Pls) by mass spectrometry. Previously, Pls has been demonstrated to be a virulence factor in a mouse septic arthritis model. The pls gene is encoded by the staphylococcal cassette chromosome (SCCmec) type I in MRSA that also encodes the methicillin resistance-conferring mecA and further genes. In a search for glycosyltransferases, we identified two open reading frames encoded downstream of pls on the SCCmec element, which we termed gtfC and gtfD. Expression and deletion analysis revealed that both gtfC and gtfD mediate glycosylation of Pls. Additionally, the recently reported glycosyltransferases SdgA and SdgB are involved in Pls glycosylation. Glycosylation occurs at serine residues in the Pls SD-repeat region and modifying carbohydrates are N-acetylhexosaminyl residues. Functional characterization revealed that Pls can confer increased biofilm formation, which seems to involve two distinct mechanisms. The first mechanism depends on glycosylation of the SD-repeat region by GtfC/GtfD and probably also involves eDNA, while the second seems to be independent of glycosylation as well as eDNA and may involve the centrally located G5 domains. Other previously known Pls properties are not related to the sugar modifications. In conclusion, Pls is a glycoprotein and
Dependence between fusion temperatures and chemical components of a certain type of coal using classical, non-parametric and bootstrap techniques

Energy Technology Data Exchange (ETDEWEB)

Gonzalez-Manteiga, W.; Prada-Sanchez, J.M.; Fiestras-Janeiro, M.G.; Garcia-Jurado, I. (Universidad de Santiago de Compostela, Santiago de Compostela (Spain). Dept. de Estadistica e Investigacion Operativa)

1990-11-01

A statistical study of the dependence between various critical fusion temperatures of a certain kind of coal and its chemical components is carried out. As well as using classical dependence techniques (multiple, stepwise and PLS regression, principal components, canonical correlation, etc.) together with the corresponding inference on the parameters of interest, non-parametric regression and bootstrap inference are also performed. 11 refs., 3 figs., 8 tabs.
Offset Free Tracking Predictive Control Based on Dynamic PLS Framework

Directory of Open Access Journals (Sweden)

Jin Xin

2017-10-01

Full Text Available This paper develops an offset free tracking model predictive control scheme based on a dynamic partial least squares (PLS) framework. First, a state space model is used as the inner model of PLS to describe the dynamic system, and a subspace identification method is used to identify the inner model. Based on the obtained model, multiple independent model predictive control (MPC) controllers are designed. Due to the decoupling character of PLS, these controllers run separately, which is suitable for a distributed control framework. In addition, the increment of the inner model output is considered in the cost function of MPC, which introduces integral action in the controller. Hence, the offset free tracking performance is guaranteed. The results of a simulation with an industrial background demonstrate the effectiveness of the proposed method.
[MEG]PLS: A pipeline for MEG data analysis and partial least squares statistics.

Science.gov (United States)

Cheung, Michael J; Kovačević, Natasa; Fatima, Zainab; Mišić, Bratislav; McIntosh, Anthony R

2016-01-01

The emphasis of modern neurobiological theories has recently shifted from the independent function of brain areas to their interactions in the context of whole-brain networks. As a result, neuroimaging methods and analyses have also increasingly focused on network discovery. Magnetoencephalography (MEG) is a neuroimaging modality that captures neural activity with a high degree of temporal specificity, providing detailed, time varying maps of neural activity. Partial least squares (PLS) analysis is a multivariate framework that can be used to isolate distributed spatiotemporal patterns of neural activity that differentiate groups or cognitive tasks, to relate neural activity to behavior, and to capture large-scale network interactions. Here we introduce [MEG]PLS, a MATLAB-based platform that streamlines MEG data preprocessing, source reconstruction and PLS analysis in a single unified framework. [MEG]PLS facilitates MRI preprocessing, including segmentation and coregistration, MEG preprocessing, including filtering, epoching, and artifact correction, MEG sensor analysis, in both time and frequency domains, MEG source analysis, including multiple head models and beamforming algorithms, and combines these with a suite of PLS analyses. The pipeline is open-source and modular, utilizing functions from FieldTrip (Donders, NL), AFNI (NIMH, USA), SPM8 (UCL, UK) and PLScmd (Baycrest, CAN), which are extensively supported and continually developed by their respective communities. [MEG]PLS is flexible, providing both a graphical user interface and command-line options, depending on the needs of the user. A visualization suite allows multiple types of data and analyses to be displayed and includes 4-D montage functionality. [MEG]PLS is freely available under the GNU public license (http://meg-pls.weebly.com). Copyright © 2015 Elsevier Inc. All rights reserved.
Timing system for PLS

International Nuclear Information System (INIS)

Chang, S.S.; Kim, M.S.; Won, S.C.; Choi, S.J.

1991-01-01

The PLS timing system consists of a master oscillator, a repetition rate pulse generator, a storage ring rf synchronizing system, and an rf driver and kicker trigger system composed of a fixed delay module and variable delay modules. All the timing modules are installed in VME crates, are controlled by 32-bit microprocessors, and communicate with the host computer via Ethernet. This paper describes the architectural design of this system as well as its performance requirements.
PLS-NIR determination of five parameters in different types of Chinese rice wine

Science.gov (United States)

Yu, Haiyan; Ying, Yibin; Fu, Xiaping; Lu, Huishan

2005-11-01

To evaluate the applicability of near infrared spectroscopy for determination of five enological parameters (alcoholic degree, pH value, total acid, amino acid nitrogen, and °Brix) of Chinese rice wine, transmission spectra were collected in the spectral range from 12500 to 3800 cm-1 in a 1 mm path length rectangular quartz cuvette with air as reference at room temperature. Five calibration equations, one for each parameter, were established separately between the reference data and the spectra by partial least squares (PLS) regression. The best calibration results were achieved for the determination of alcoholic degree and °Brix. The RPD (ratio of the standard deviation of the samples to the SECV) values of the calibration for both alcoholic degree and °Brix were higher than 3 (4.30 and 7.94, respectively), which demonstrated the robustness and power of the calibration models. The determination coefficients (R2) for alcoholic degree and °Brix were 0.987 and 0.991, respectively. The performance for pH, total acid and amino acid nitrogen was not as good as that for alcoholic degree and °Brix. The RPD values for the three parameters were 1.48, 1.85 and 1.82, respectively, and R2 values were 0.964, 0.970 and 0.971, respectively. In the validation step, the R2 values of the five parameters were all higher than 0.7, with the highest values for alcoholic degree and °Brix (0.968 and 0.956, respectively). The results demonstrated that NIR spectroscopy could be used to predict the concentration of the five enological parameters in Chinese rice wine.
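The RPD figure quoted in this abstract can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes SECV is taken as the root mean square of the cross-validated residuals (definitions vary, and some apply a bias correction), and the function name and numbers are made up for illustration.

```python
import statistics

def rpd(reference, cv_predicted):
    """Ratio of the sample standard deviation to the SECV.

    Assumption: SECV is computed here as the root mean square of the
    cross-validated residuals, without bias correction.
    """
    residuals = [r - p for r, p in zip(reference, cv_predicted)]
    secv = (sum(e * e for e in residuals) / len(residuals)) ** 0.5
    return statistics.stdev(reference) / secv

# Hypothetical reference values and cross-validated predictions.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
cv_predicted = [10.5, 11.5, 14.5, 15.5, 18.5]
print(round(rpd(reference, cv_predicted), 2))
```

A larger RPD means the prediction error is small relative to the natural spread of the samples, which is why the abstract treats values above 3 as a sign of a strong calibration.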
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

International Nuclear Information System (INIS)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-01-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144
An improved partial least-squares regression method for Raman spectroscopy

Science.gov (United States)

Momenpour Tehran Monfared, Ali; Anis, Hanan

2017-10-01

It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable of the sorted list is evaluated using the root mean square error of prediction (RMSEP) criterion in each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.
PLS-Prediction and Confirmation of Hydrojuglone Glucoside as the Antitrypanosomal Constituent of Juglans Spp.

Directory of Open Access Journals (Sweden)

Therese Ellendorff

2015-05-01

Naphthoquinones (NQs) occur naturally in a large variety of plants. Several NQs are highly active against protozoans, amongst them the causative pathogens of neglected tropical diseases such as human African trypanosomiasis (sleeping sickness), Chagas disease and leishmaniasis. Prominent NQ-producing plants can be found among Juglans spp. (Juglandaceae) with juglone derivatives as known constituents. In this study, 36 highly variable extracts were prepared from different plant parts of J. regia, J. cinerea and J. nigra. For all extracts, antiprotozoal activity was determined against the protozoans Trypanosoma cruzi, T. brucei rhodesiense and Leishmania donovani. In addition, an LC-MS fingerprint was recorded for each extract. With each extract’s fingerprint and the data on in vitro growth inhibitory activity against T. brucei rhodesiense, a Partial Least Squares (PLS) regression model was calculated in order to obtain an indication of the compounds responsible for the differences in bioactivity between the 36 extracts. By means of PLS, hydrojuglone glucoside was predicted as an active compound against T. brucei and consequently isolated and tested in vitro. In fact, the pure compound showed activity against T. brucei at a significantly lower cytotoxicity towards mammalian cells than established antiprotozoal NQs such as lapachol.
Application of stepwise multiple regression techniques to inversion of Nimbus 'IRIS' observations.

Science.gov (United States)

Ohring, G.

1972-01-01

Exploratory studies with Nimbus-3 infrared interferometer-spectrometer (IRIS) data indicate that, in addition to temperature, such meteorological parameters as geopotential heights of pressure surfaces, tropopause pressure, and tropopause temperature can be inferred from the observed spectra with the use of simple regression equations. The technique of screening the IRIS spectral data by means of stepwise regression to obtain the best radiation predictors of meteorological parameters is validated. The simplicity of application of the technique and the simplicity of the derived linear regression equations - which contain only a few terms - suggest usefulness for this approach. Based upon the results obtained, suggestions are made for further development and exploitation of the stepwise regression analysis technique.
Chemiluminescence-based multivariate sensing of local equivalence ratios in premixed atmospheric methane-air flames

Energy Technology Data Exchange (ETDEWEB)

Tripathi, Markandey M.; Krishnan, Sundar R.; Srinivasan, Kalyan K.; Yueh, Fang-Yu; Singh, Jagdish P.

2011-09-07

Chemiluminescence emissions from OH*, CH*, C2, and CO2 formed within the reaction zone of premixed flames depend upon the fuel-air equivalence ratio in the burning mixture. In the present paper, a new partial least square regression (PLS-R) based multivariate sensing methodology is investigated and compared with an OH*/CH* intensity ratio-based calibration model for sensing equivalence ratio in atmospheric methane-air premixed flames. Five replications of spectral data at nine different equivalence ratios ranging from 0.73 to 1.48 were used in the calibration of both models. During model development, the PLS-R model was initially validated with the calibration data set using the leave-one-out cross validation technique. Since the PLS-R model used the entire raw spectral intensities, it did not need the nonlinear background subtraction of CO2 emission that is required for typical OH*/CH* intensity ratio calibrations. An unbiased spectral data set (not used in the PLS-R model development), for 28 different equivalence ratio conditions ranging from 0.71 to 1.67, was used to predict equivalence ratios using the PLS-R and the intensity ratio calibration models. It was found that the equivalence ratios predicted with the PLS-R based multivariate calibration model matched the experimentally measured equivalence ratios within 7%, whereas the OH*/CH* intensity ratio calibration grossly underpredicted equivalence ratios in comparison to measured equivalence ratios, especially under rich conditions (equivalence ratio > 1.2). The practical implications of the chemiluminescence-based multivariate equivalence ratio sensing methodology are also discussed.
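Leave-one-out cross validation, as mentioned in this abstract, can be sketched in a few lines. The example below is only an illustration: it substitutes a straight-line calibration for the PLS-R model (an assumption made to keep the code short), and the function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out(xs, ys):
    """Return (PRESS, RMSECV) from leave-one-out cross validation."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit the calibration with sample i held out, then predict it.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    press = sum(squared_errors)
    return press, (press / len(xs)) ** 0.5

# Hypothetical calibration data: signal ratio vs. equivalence ratio.
signal = [0.8, 1.0, 1.2, 1.4, 1.6]
phi = [0.75, 0.92, 1.08, 1.27, 1.41]
press, rmsecv = leave_one_out(signal, phi)
print(press, rmsecv)
```

Each sample is predicted by a model that never saw it, so the resulting error is a less optimistic estimate than the fit error on the calibration set itself.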
Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods

International Nuclear Information System (INIS)

Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh

2007-01-01

A quantitative structure-retention relationship (QSRR) study has been carried out on the gas chromatograph/electron capture detector (GC/ECD) system retention times (tR) of 38 diverse chlorinated pesticides, herbicides, and organohalides by using molecular structural descriptors. Modeling of the retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and partial least squares (PLS) regression. Stepwise regression in SPSS was used to select the variables that resulted in the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors (electronic, steric and thermodynamic) were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analyses were carried out to derive the best QSRR models. After variable selection, the MLR and PLS methods were used with leave-one-out cross validation for building the regression models. The predictive quality of the QSRR models was tested on an external prediction set of 12 compounds randomly chosen from the 38 compounds. The PLS regression method was used to model the structure-retention relationships more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to the squared regression coefficients R2, which were 0.951 and 0.948 for MLR and PLS, respectively.
Projects Delay Factors of Saudi Arabia Construction Industry Using PLS-SEM Path Modelling Approach

Directory of Open Access Journals (Sweden)

Abdul Rahman Ismail

2016-01-01

This paper presents the development of a PLS-SEM path model of delay factors in the Saudi Arabia construction industry, focussing on Mecca City. The model was developed and assessed using SmartPLS v3.0 software, and it consists of 37 factors/manifests in 7 groups/independent variables and one dependent variable, which is delay of the construction projects. The model was rigorously assessed at the measurement and structural levels, and the outcomes show that the model has achieved the required threshold values. At the structural level of the model, among the seven groups, the client and consultant group has the highest impact on construction delay, with a path coefficient β-value of 0.452, and the project management and contract administration group has the least impact on construction delay, with a β-value of 0.016. The overall model has moderate explanatory power, with an R2 value of 0.197, as a representation of the Saudi Arabia construction industry. This model will be able to assist practitioners in Mecca city in paying more attention to risk analysis for potential construction delay.
Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression

OpenAIRE

Plata, Maria R.; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

2013-01-01

A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference anal...

Construction of Network Management Information System of Agricultural Products Supply Chain Based on 3PLs

Institute of Scientific and Technical Information of China (English)

2010-01-01

The necessity to construct the network management information system of 3PLs agricultural supply chain is analyzed, showing that 3PLs can improve the overall competitive advantage of agricultural supply chain. 3PLs changes the homogeneity management into specialized management of logistics service and achieves the alliance of the subjects at different nodes of agricultural products supply chain. Network management information system structure of agricultural products supply chain based on 3PLs is constructed, including the four layers (the network communication layer, the hardware and software environment layer, the database layer, and the application layer) and 7 function modules (centralized control, transportation process management, material and vehicle scheduling, customer relationship, storage management, customer inquiry, and financial management). Framework for the network management information system of agricultural products supply chain based on 3PLs is put forward. The management of 3PLs mainly includes purchasing management, supplier relationship management, planning management, customer relationship management, storage management and distribution management. Thus, a management system of internal and external integrated agricultural enterprises is obtained. The network management information system of agricultural products supply chain based on 3PLs has realized the effective sharing of enterprise information of agricultural products supply chain at different nodes, establishing a long-term partnership revolving around the 3PLs core enterprise, as well as a supply chain with stable relationship based on the supply chain network system, so as to improve the circulation efficiency of agricultural products, and to explore the sales market for agricultural products.
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

Energy Technology Data Exchange (ETDEWEB)

Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor' s Drive, Amherst, MA 01003, United States. (United States); Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States); Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high
Tutorial on Using Regression Models with Count Outcomes Using R

Directory of Open Access Journals (Sweden)

A. Alexander Beaujean

2016-02-01

Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares), either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available and the R syntax used to run the example analyses is included in the Appendix.
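As a rough illustration of the simplest model named in this abstract, the sketch below fits a Poisson regression with a log link and a single predictor by Newton's method. It is not the tutorial's R syntax; the function name, starting values, and data are assumptions chosen for illustration.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = a + b * x by Newton's method on the Poisson log-likelihood."""
    a = math.log(sum(ys) / len(ys))  # start from the intercept-only model
    b = 0.0
    for _ in range(iterations):
        mus = [math.exp(a + b * x) for x in xs]
        # Gradient of the log-likelihood with respect to (a, b).
        g_a = sum(y - mu for y, mu in zip(ys, mus))
        g_b = sum(x * (y - mu) for x, y, mu in zip(xs, ys, mus))
        # Entries of the 2x2 information matrix.
        h_aa = sum(mus)
        h_ab = sum(x * mu for x, mu in zip(xs, mus))
        h_bb = sum(x * x * mu for x, mu in zip(xs, mus))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system in closed form.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Hypothetical data: a predictor and a count outcome for eight students.
predictor = [0, 1, 1, 2, 2, 3, 3, 4]
count = [0, 1, 2, 2, 3, 4, 6, 9]
intercept, slope = fit_poisson(predictor, count)
print(intercept, slope, math.exp(slope))
```

Because of the log link, `exp(slope)` is read as the multiplicative change in the expected count for a one-unit increase in the predictor.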
Fault detection in processes represented by PLS models using an EWMA control scheme

KAUST Repository

Harrou, Fouzi; Nounou, Mohamed N.; Nounou, Hazem N.

2016-01-01

with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS
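The EWMA control scheme named in the title of this record can be sketched generically. The code below is not the authors' method: it applies the textbook EWMA statistic and its time-varying control limits to a sequence of model residuals, and the smoothing constant, limit width, and residual values are all assumed for illustration.

```python
def ewma_monitor(residuals, lam=0.2, width=3.0, sigma=1.0, target=0.0):
    """Return a list of (ewma_value, alarm) pairs for a residual sequence.

    lam    : smoothing constant in (0, 1]
    width  : control-limit width in multiples of the EWMA standard deviation
    sigma  : assumed standard deviation of fault-free residuals
    target : assumed mean of fault-free residuals
    """
    z = target
    results = []
    for t, r in enumerate(residuals, start=1):
        z = lam * r + (1.0 - lam) * z
        # Time-varying half-width of the control limits.
        variance_factor = (lam / (2.0 - lam)) * (1.0 - (1.0 - lam) ** (2 * t))
        half_width = width * sigma * variance_factor ** 0.5
        results.append((z, abs(z - target) > half_width))
    return results

# Hypothetical residuals: fault-free at first, then a sustained shift of 2 sigma.
residuals = [0.0] * 10 + [2.0] * 10
alarms = [alarm for _, alarm in ewma_monitor(residuals)]
print(alarms.index(True))
```

Because the statistic accumulates information over successive samples, a small sustained shift eventually crosses the limit even when no single residual looks unusual on its own.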
Genetic algorithm-based wavelength selection in multicomponent spectrophotometric determination by PLS: Application on sulfamethoxazole and trimethoprim mixture in bovine milk

Directory of Open Access Journals (Sweden)

Givianrad Hadi Mohammad

2013-01-01

The simultaneous determination of sulfamethoxazole (SMX) and trimethoprim (TMP) mixtures in bovine milk by the spectrophotometric method is a difficult problem in analytical chemistry, due to spectral interferences. By means of multivariate calibration methods, such as partial least square (PLS) regression, it is possible to obtain a model adjusted to the concentration values of the mixtures used in the calibration range. The genetic algorithm (GA) is a suitable method for selecting wavelengths for PLS calibration of mixtures with almost identical spectra without loss of prediction capacity using the spectrophotometric method. In this study, the calibration model was based on absorption spectra in the 200-400 nm range for 25 different mixtures of SMX and TMP. Calibration matrices were formed from samples containing 0.25-20 and 0.3-21 μg mL-1 of SMX and TMP, respectively, at pH=10. The root mean squared error of deviation (RMSED) for SMX and TMP with PLS and genetic algorithm partial least square (GAPLS) were 0.242, 0.066 μg mL-1 and 0.074, 0.027 μg mL-1, respectively. This procedure allowed the simultaneous determination of SMX and TMP in synthetic and real samples, and good reliability of the determination was proved.
Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

Science.gov (United States)

Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

2013-10-01

Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.
Collision cross section prediction of deprotonated phenolics in a travelling-wave ion mobility spectrometer using molecular descriptors and chemometrics

Energy Technology Data Exchange (ETDEWEB)

Gonzales, Gerard Bryan, E-mail: gerard.gonzales@ugent.be [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium); Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University (Belgium); Department of Applied Biological Science, Faculty of Bioscience Engineering, Ghent University (Belgium); Smagghe, Guy [Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University (Belgium); Coelus, Sofie; Adriaenssens, Dieter [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium); De Winter, Karel; Desmet, Tom [Center for Industrial Biotechnology and Biocatalysis, Faculty of Bioscience Engineering, Ghent University (Belgium); Raes, Katleen [Department of Applied Biological Science, Faculty of Bioscience Engineering, Ghent University (Belgium); Van Camp, John, E-mail: john.vancamp@ugent.be [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium)

2016-06-14

The combination of ion mobility and mass spectrometry (MS) affords significant improvements over conventional MS/MS, especially in the characterization of isomeric metabolites due to the differences in their collision cross sections (CCS). Experimentally obtained CCS values are typically matched with theoretical CCS values from Trajectory Method (TM) and/or Projection Approximation (PA) calculations. In this paper, predictive models for CCS of deprotonated phenolics were developed using molecular descriptors and chemometric tools, stepwise multiple linear regression (SMLR), principal components regression (PCR), and partial least squares regression (PLS). A total of 102 molecular descriptors were generated and reduced to 28 after employing a feature selection tool, composed of mass, topological descriptors, Jurs descriptors and shadow indices. Therefore, the generated models considered the effects of mass, 3D conformation and partial charge distribution on CCS, which are the main parameters for either TM or PA (only 3D conformation) calculations. All three techniques yielded highly predictive models for both the training (R²(SMLR) = 0.9911; R²(PCR) = 0.9917; R²(PLS) = 0.9918) and validation datasets (R²(SMLR) = 0.9489; R²(PCR) = 0.9761; R²(PLS) = 0.9760). Also, the high cross validated R² values indicate that the generated models are robust and highly predictive (Q²(SMLR) = 0.9859; Q²(PCR) = 0.9748; Q²(PLS) = 0.9760). The predictions were also very comparable to the results from TM calculations using modified mobcal (N2). Most importantly, this method offered a rapid (<10 min) alternative to TM calculations without compromising predictive ability. These methods could therefore be used in routine analysis and could be easily integrated to metabolite identification platforms. - Highlights: • CCS for deprotonated phenolics were measured using TWIMS.
Using multiple linear regression techniques to quantify carbon ...

African Journals Online (AJOL)

Fallow ecosystems provide a significant carbon stock that can be quantified for inclusion in the accounts of global carbon budgets. Process and statistical models of productivity, though useful, are often technically rigid as the conditions for their application are not easy to satisfy. Multiple regression techniques have been ...
Sparse kernel orthonormalized PLS for feature extraction in large datasets

DEFF Research Database (Denmark)

Arenas-García, Jerónimo; Petersen, Kaare Brandt; Hansen, Lars Kai

2006-01-01

In this paper we present a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constraints in the solution to improve scalability. The algorithm...... is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method...
P-R-R Study Technique, Group Counselling And Gender Influence ...

African Journals Online (AJOL)

Read-Recall (P-R-R) study technique and group counselling on the academic performance of senior secondary school students. The objectives of this study were to determine the effect of Group Counselling combined with P-R-R study ...
A heuristic approach using multiple criteria for environmentally benign 3PLs selection

Science.gov (United States)

Kongar, Elif

2005-11-01

Maintaining competitiveness in an environment where price and quality differences between competing products are disappearing depends on the company's ability to reduce costs and supply time. Timely responses to rapidly changing market conditions require efficient Supply Chain Management (SCM). Outsourcing logistics to third-party logistics service providers (3PLs) is one commonly used way of increasing the efficiency of logistics operations, while creating a more "core competency focused" business environment. However, this alone may not be sufficient. Due to recent environmental regulations and growing public awareness regarding environmental issues, 3PLs need to be not only efficient but also environmentally benign to maintain companies' competitiveness. Even though an efficient and environmentally benign combination of 3PLs can theoretically be obtained using exhaustive search algorithms, heuristic approaches to the selection process may be superior in terms of computational complexity. In this paper, a hybrid approach that combines a multiple criteria Genetic Algorithm (GA) with Linear Physical Weighting Algorithm (LPPW), to be used in selecting efficient and environmentally benign 3PLs, is proposed. A numerical example is also provided to illustrate the method and the analyses.
Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel

Directory of Open Access Journals (Sweden)

Roland Pfister

2013-10-01

Three different methods for extracting coefficients of linear regression analyses are presented. The focus is on automatic and easy-to-use approaches for common statistical packages: SPSS, R, and MS Excel / LibreOffice Calc. Hands-on examples are included for each analysis, followed by a brief description of how a subsequent regression coefficient analysis is performed.
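The idea of extracting one regression slope per individual can be sketched outside the packages named in this abstract. The example below is an illustration in plain Python, not a reproduction of the paper's SPSS, R, or Excel procedures; the row layout, function names, and data are hypothetical.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def slopes_by_subject(rows):
    """rows: iterable of (subject, x, y). Returns {subject: slope}."""
    grouped = defaultdict(lambda: ([], []))
    for subject, x, y in rows:
        grouped[subject][0].append(x)
        grouped[subject][1].append(y)
    return {subject: ols_slope(xs, ys) for subject, (xs, ys) in grouped.items()}

# Hypothetical long-format data: (subject, predictor, response).
rows = [
    ("s1", 1, 2.0), ("s1", 2, 4.0), ("s1", 3, 6.0),
    ("s2", 1, 5.0), ("s2", 2, 4.0), ("s2", 3, 3.0),
]
print(slopes_by_subject(rows))
```

The resulting per-subject slopes can then be fed into a subsequent analysis, for example a test of whether the slopes differ from zero across subjects.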
Estimating biomass in herbaceous vegetation from hyperspectral data, PLS regression and the continuum removal transformation

Directory of Open Access Journals (Sweden)

M. Marabel-García

2014-12-01

The aim of the study was to compare the results of two methods for estimating above-ground biomass from field spectroradiometry data: (i) partial least squares regression (PLSR) and (ii) linear regression using the Maximum Band Depth (MBD) and Area Over the Minimum (AOM) indices as descriptors. In both cases the spectra were first transformed by Continuum Removal (CR). Since the results using PLS (R2=0.920, RMSE=3.622 g/m2) were very similar to those obtained with the indices (for AOM: R2=0.915, RMSE=3.615 g/m2), we recommend the CR-derived indices because they are easier to interpret than PLSR.
Beam property studies in the PLS diagnostic beamline

CERN Document Server

Ko, I S; Seon, D K; Kim, C B; Lee, T Y

1999-01-01

A diagnostic beamline has been operated in the Pohang Light Source (PLS) storage ring for the diagnostics of electron and photon beam properties. It consists of two 1:1 imaging systems: a visible-light imaging system and a soft X-ray imaging system. We have measured the transverse and the longitudinal structures of beams by using a streak camera to obtain a visible image. Accurate transverse beam sizes have been measured to be 186 μm horizontally and 43.1 μm vertically by using soft X-ray images with minimum diffraction errors. The corresponding emittances are 11.7 nm-rad horizontally and 0.59 nm-rad vertically. By comparing the measured data with the design values, we confirmed that the PLS storage ring has reached its designed performance within an error of 3.3 % in the transverse direction.
Prediction of Caffeine Content in Java Preanger Coffee Beans by NIR Spectroscopy Using PLS and MLR Method

Science.gov (United States)

Budiastra, I. W.; Sutrisno; Widyotomo, S.; Ayu, P. C.

2018-05-01

Caffeine is one of the important components of coffee and contributes to the flavor of coffee beverages. Caffeine concentration in coffee beans is usually determined by chemical methods, which are time-consuming and destructive. A nondestructive method using NIR spectroscopy was successfully applied to determine the caffeine concentration of Arabica Gayo coffee beans. In this study, NIR spectroscopy was assessed for determining the caffeine concentration of Java Preanger coffee beans. One hundred samples, each consisting of 96 g of coffee beans, were prepared for reflectance and chemical measurement. Sample reflectance was measured with an FT-NIR spectrometer over the wavelength range 1000-2500 nm (10000-4000 cm⁻¹), followed by determination of caffeine content by LC-MS. Calibration between the NIR spectra and the caffeine content was carried out using PLS and MLR methods. Several spectral pretreatments were applied to increase prediction accuracy. The results showed that caffeine content could be determined by a PLS model using 7 factors and a pretreatment combining the first derivative and MSC of the absorbance spectra (r = 0.946; CV = 1.54 %; RPD = 2.28). A lower accuracy was obtained with an MLR model consisting of three caffeine absorption wavelengths and four other absorption wavelengths (r = 0.683; CV = 3.31 %; RPD = 1.18).
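The pretreatments and the RPD statistic mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: MSC is taken as regressing each spectrum on the mean spectrum and removing the fitted offset and slope, the first derivative is a simple first difference, and RPD is the sample standard deviation of the reference values divided by the root mean squared error of prediction. Conventions for these quantities vary between software packages, and the function names are hypothetical.

```python
from statistics import mean, stdev


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_points = len(spectra[0])
    reference = [mean(s[j] for s in spectra) for j in range(n_points)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * reference.
        slope = sum((r - ref_mean) * (v - s_mean)
                    for r, v in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def first_derivative(spectrum):
    """First difference between neighbouring spectral points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rpd(reference_values, predicted_values):
    """Ratio of the reference standard deviation to the RMSEP (assumed definition)."""
    n = len(reference_values)
    rmsep = (sum((p - r) ** 2
                 for r, p in zip(reference_values, predicted_values)) / n) ** 0.5
    return stdev(reference_values) / rmsep
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the reference values, which is consistent with the PLS model above (RPD = 2.28) being rated more accurate than the MLR model (RPD = 1.18).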
An extension of PPLS-DA for classification and comparison to ordinary PLS-DA.

Directory of Open Access Journals (Sweden)

Anna Telaar

Classification studies are widely applied, e.g. in biomedical research to classify objects/patients into predefined groups. The goal is to find a classification function/rule which assigns each object/patient to a unique group with the greatest possible accuracy (classification error). Especially in gene expression experiments, a large number of variables (genes) is often measured for only a few objects/patients. A suitable approach is the well-known method PLS-DA, which searches for a transformation to a lower dimensional space. Resulting new components are linear combinations of the original variables. An advancement of PLS-DA leads to PPLS-DA, introducing a so-called 'power parameter', which is maximized towards the correlation between the components and the group-membership. We introduce an extension of PPLS-DA for optimizing this power parameter towards the final aim, namely towards a minimal classification error. We compare this new extension with the original PPLS-DA and also with the ordinary PLS-DA using simulated and experimental datasets. For the investigated data sets with weak linear dependency between features/variables, no improvement is shown for PPLS-DA and for the extensions compared to PLS-DA. A very weak linear dependency, a low proportion of differentially expressed genes for simulated data, does not lead to an improvement of PPLS-DA over PLS-DA, but our extension shows a lower prediction error. On the contrary, for the data set with strong between-feature collinearity and a low proportion of differentially expressed genes and a large total number of genes, the prediction error of PPLS-DA and the extensions is clearly lower than for PLS-DA. Moreover we compare these prediction results with results of support vector machines with linear kernel and linear discriminant analysis.
Combining pharmacophore fingerprints and PLS-discriminant analysis for virtual screening and SAR elucidation

DEFF Research Database (Denmark)

Askjær, Sune; Langgård, Morten

2008-01-01

The criterion of success for the initial stages of a ligand-based drug-discovery project is dual. First, a set of suitable lead compounds has to be identified. Second, a level of a preliminary structure-activity relationship (SAR) of the identified ligands has to be established in order to guide ...... by the protein-binding site known from X-ray complexes. The result of this analysis assists in explaining the efficiency of 2D pharmacophore fingerprints as descriptors in virtual screening....... the lead optimization toward a final drug candidate. This paper presents a combined approach to solving these two problems of ligand-based virtual screening and elucidation of SAR based on interplay between pharmacophore fingerprints and interpretation of PLS-discriminant analysis (PLS-DA) models....... The virtual screening capability of the PLS-DA method is compared to group fusion maximum similarity searching in a test using four graph-based pharmacophore fingerprints over a range of 10 diverse targets. The PLS-DA method was generally found to do better than the Smax method. The GpiDAPH3 and PCH...
Application of multivariate chemometric techniques for simultaneous determination of five parameters of cottonseed oil by single bounce attenuated total reflectance Fourier transform infrared spectroscopy.

Science.gov (United States)

Talpur, M Younis; Kara, Huseyin; Sherazi, S T H; Ayyildiz, H Filiz; Topkafa, Mustafa; Arslan, Fatma Nur; Naz, Saba; Durmaz, Fatih; Sirajuddin

2014-11-01

Single bounce attenuated total reflectance (SB-ATR) Fourier transform infrared (FTIR) spectroscopy in conjunction with chemometrics was used for accurate determination of free fatty acid (FFA), peroxide value (PV), iodine value (IV), conjugated diene (CD) and conjugated triene (CT) of cottonseed oil (CSO) during potato chips frying. Partial least square (PLS), stepwise multiple linear regression (SMLR), principal component regression (PCR) and simple Beer's law (SBL) were applied to develop the calibrations for simultaneous evaluation of five stated parameters of cottonseed oil (CSO) during frying of French frozen potato chips at 170°C. Good regression coefficients (R²) were achieved for FFA, PV, IV, CD and CT with value of >0.992 by PLS, SMLR, PCR, and SBL. Root mean square error of prediction (RMSEP) was found to be less than 1.95% for all determinations. Result of the study indicated that SB-ATR FTIR in combination with multivariate chemometrics could be used for accurate and simultaneous determination of different parameters during the frying process without using any toxic organic solvent. Copyright © 2014 Elsevier B.V. All rights reserved.
Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

Science.gov (United States)

Balabin, Roman M; Lomakina, Ekaterina I

2011-04-21

In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Simultaneous measurement of two enzyme activities using infrared spectroscopy: A comparative evaluation of PARAFAC, TUCKER and N-PLS modeling.

Science.gov (United States)

Baum, Andreas; Hansen, Per Waaben; Meyer, Anne S; Mikkelsen, Jørn Dalgaard

2013-08-06

Enzymes are used in many processes to release fermentable sugars for green production of biofuel, or the refinery of biomass for extraction of functional food ingredients such as pectin or prebiotic oligosaccharides. The complex biomasses may, however, require a multitude of specific enzymes which are active on specific substrates generating a multitude of products. In this paper we use the plant polymer, pectin, to present a method to quantify enzyme activity of two pectolytic enzymes by monitoring their superimposed spectral evolutions simultaneously. The data is analyzed by three chemometric multiway methods, namely PARAFAC, TUCKER3 and N-PLS, to establish simultaneous enzyme activity assays for pectin lyase and pectin methyl esterase. Correlation coefficients R²pred for prediction test sets are 0.48, 0.96 and 0.96 for pectin lyase and 0.70, 0.89 and 0.89 for pectin methyl esterase, respectively. The retrieved models are compared and prediction test sets show that especially TUCKER3 performs well, even in comparison to the supervised regression method N-PLS. Copyright © 2013 Elsevier B.V. All rights reserved.

Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

Science.gov (United States)

Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

2014-08-01

Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis (MRA) of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of the traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analyte BPs in the D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4% and 10.8%, respectively. In contrast, the overall RMS%RE values of 32.9% and 40.4% obtained for BP determination in D3710 and MA VHP, respectively, using the traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can potentially be used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.
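The figure of merit quoted in this abstract, RMS%RE, can be written out as a short sketch. It assumes the usual definition: the percent relative error of each prediction is squared, averaged over all samples, and square-rooted. The function name and the example numbers in the usage note are hypothetical and are not taken from the study.

```python
def rms_percent_relative_error(actual, predicted):
    """Root mean square of the percent relative errors (assumed definition)."""
    pct_errors = [100.0 * (p - a) / a for a, p in zip(actual, predicted)]
    return (sum(e * e for e in pct_errors) / len(pct_errors)) ** 0.5
```

For example, predictions of 110 and 180 for true boiling points of 100 and 200 each have a percent relative error of magnitude 10, so the RMS%RE of the pair is 10%.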
Assessing the impacts of human activities and climate variations on grassland productivity by partial least squares structural equation modeling (PLS-SEM)

Institute of Scientific and Technical Information of China (English)

SHA Zongyao; XIE Yichun; TAN Xicheng; BAI Yongfei; LI Jonathan; LIU Xuefeng

2017-01-01

The cause-effect associations between geographical phenomena are an important focus in ecological research. Recent studies in structural equation modeling (SEM) demonstrated the potential for analyzing such associations. We applied the variance-based partial least squares SEM (PLS-SEM) and geographically-weighted regression (GWR) modeling to assess the human-climate impact on grassland productivity represented by above-ground biomass (AGB). The human and climate factors and their interaction were taken to explain the AGB variance by a PLS-SEM developed for the grassland ecosystem in Inner Mongolia, China. Results indicated that 65.5% of the AGB variance could be explained by the human and climate factors and their interaction. The case study showed that the human and climate factors imposed a significant and negative impact on the AGB and that their interaction alleviated to some extent the threat from the intensified human-climate pressure. The alleviation may be attributable to vegetation adaptation to high human-climate stresses, to human adaptation to climate conditions or/and to recent vegetation restoration programs in the highly degraded areas. Furthermore, the AGB response to the human and climate factors modeled by GWR exhibited significant spatial variations. This study demonstrated that the combination of PLS-SEM and GWR model is feasible to investigate the cause-effect relation in socio-ecological systems.
Klystron-modulator system availability of PLS 2 GeV electron linac

International Nuclear Information System (INIS)

Cho, M.H.; Park, S.S.; Oh, J.S.; Namkung, W.

1996-01-01

The PLS linac has been injecting 2 GeV electron beams into the Pohang Light Source (PLS) storage ring since September 1994. The PLS 2 GeV linac employs 11 sets of high-power klystron-modulator (K and M) systems as the main RF source for beam acceleration. The klystron has a rated peak output power of 80 MW at a 4 microsecond pulse width and 60 pps. The matching modulator has 200 MW peak output power. The total accumulated high-voltage run time of the oldest unit has exceeded 23,000 hours, and the sum of all high-voltage run times is approximately 230,000 hours as of May 1996. In this paper, we review the overall performance of the high-power K and M system. Special attention is paid to the analysis of all failures and troubles of the K and M system that affected the linac high-power RF operations as well as beam injection operations from 1994 to May 1996. (author)
Hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) and its application to predicting key process variables.

Science.gov (United States)

He, Yan-Lin; Xu, Yuan; Geng, Zhi-Qiang; Zhu, Qun-Xiong

2016-03-01

In this paper, a hybrid robust model based on an improved functional link neural network integrated with partial least squares (IFLNN-PLS) is proposed. Firstly, an improved functional link neural network with small norm of expanded weights and high input-output correlation (SNEWHIOC-FLNN) was proposed for enhancing the generalization performance of FLNN. Unlike the traditional FLNN, the expanded variables of the original inputs are not directly used as the inputs in the proposed SNEWHIOC-FLNN model. The original inputs are attached to some small norm of expanded weights. As a result, the correlation coefficient between some of the expanded variables and the outputs is enhanced. The larger the correlation coefficient is, the more relevant the expanded variables tend to be. In the end, the expanded variables with larger correlation coefficients are selected as the inputs to improve the performance of the traditional FLNN. In order to test the proposed SNEWHIOC-FLNN model, three UCI (University of California, Irvine) regression datasets named Housing, Concrete Compressive Strength (CCS), and Yacht Hydro Dynamics (YHD) are selected. Then a hybrid model based on the improved FLNN integrated with partial least squares (IFLNN-PLS) was built. In the IFLNN-PLS model, the connection weights are calculated using the partial least squares method rather than the error back propagation algorithm. Lastly, IFLNN-PLS was developed as an intelligent measurement model for accurately predicting the key variables in the Purified Terephthalic Acid (PTA) process and the High Density Polyethylene (HDPE) process. Simulation results illustrated that the IFLNN-PLS could significantly improve the prediction performance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Two biased estimation techniques in linear regression: Application to aircraft

Science.gov (United States)

Klein, Vladislav

1988-01-01

Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
Investigation of adulteration of sunflower oil with thermally deteriorated oil using Fourier transform mid-infrared spectroscopy and chemometrics

Directory of Open Access Journals (Sweden)

Joana Vilela

2015-12-01

Fourier transform infrared spectroscopy based on the attenuated total reflectance sampling technique, combined with multivariate analysis methods, was used to monitor the adulteration of pure sunflower oil (SO) with thermally deteriorated oil (TDO). Contrary to published research, in this work, SO was thermally deteriorated in the absence of foodstuff. SO samples were exposed to temperatures between 125 and 225°C from 6 to 24 h. Quantification of adulteration of SO with TDO, based on principal components regression (PCR), partial least squares regression (PLS-R), and linear discriminant analysis (LDA) applied to mid-infrared spectra and to their first and second derivatives is reported for the first time. Infrared frequencies associated with the biochemical differences between TDO samples deteriorated in different conditions were investigated by principal component analysis (PCA). LDA was effective in the twofold classification presence/absence of TDO in adulterated SO (with 5% V/V or less of TDO). It provided 93.7% correct classification for the calibration set and 91.3% correct classification when cross-validated. A detection limit of 1% V/V of TDO in SO was determined. Investigation of an external set of samples allowed the evaluation of the predictability of the models. The regression coefficient (R2) for prediction was 0.95 and 0.96 and the RMSE was 2.1 and 1.9% V/V when using the PCR or PLS-R models, respectively, and the first derivative of spectra. To the best of our knowledge, no investigation of adulteration of SO with TDO based on PCR, PLS-R, and LDA has been reported so far.
Multivariate nonparametric regression and visualization with R and applications to finance

CERN Document Server

Klemelä, Jussi

2014-01-01

A modern approach to statistical learning and its applications through visualization methods. With a unique and innovative presentation, Multivariate Nonparametric Regression and Visualization provides readers with the core statistical concepts to obtain complete and accurate predictions when given a set of data. Focusing on nonparametric methods to adapt to the multiple types of data generating mechanisms, the book begins with an overview of classification and regression. The book then introduces and examines various tested and proven visualization techniques for learning samples and functio
Analyses of polycyclic aromatic hydrocarbon (PAH) and chiral-PAH analogues-methyl-β-cyclodextrin guest-host inclusion complexes by fluorescence spectrophotometry and multivariate regression analysis.

Science.gov (United States)

Greene, LaVana; Elzey, Brianda; Franklin, Mariah; Fakayode, Sayo O

2017-03-05

The negative health impact of polycyclic aromatic hydrocarbons (PAHs) and differences in pharmacological activity of enantiomers of chiral molecules in humans highlight the need for analysis of PAHs and their chiral analogue molecules in humans. Herein, the first use of cyclodextrin guest-host inclusion complexation, fluorescence spectrophotometry, and a chemometric approach to PAH (anthracene) and chiral-PAH analogue derivative (1-(9-anthryl)-2,2,2-trifluoroethanol (TFE)) analyses is reported. The binding constants (Kb), stoichiometry (n), and thermodynamic properties (Gibbs free energy (ΔG), enthalpy (ΔH), and entropy (ΔS)) of anthracene and enantiomers of TFE-methyl-β-cyclodextrin (Me-β-CD) guest-host complexes were also determined. Chemometric partial-least-square (PLS) regression analysis of emission spectra data of Me-β-CD-guest-host inclusion complexes was used for the determination of anthracene and TFE enantiomer concentrations in Me-β-CD-guest-host inclusion complex samples. The values of calculated Kb and negative ΔG suggest the thermodynamic favorability of the anthracene-Me-β-CD and enantiomeric TFE-Me-β-CD inclusion complexation reactions. However, anthracene-Me-β-CD and enantiomer TFE-Me-β-CD inclusion complexations showed notable differences in binding affinity behaviors and thermodynamic properties. The PLS regression analysis resulted in square-correlation-coefficients of 0.997530 or better and a low LOD of 3.81×10⁻⁷ M for anthracene and 3.48×10⁻⁸ M for TFE enantiomers at physiological conditions. Most importantly, PLS regression accurately determined the anthracene and TFE enantiomer concentrations with an average low error of 2.31% for anthracene, 4.44% for R-TFE and 3.60% for S-TFE. The results of the study are highly significant because of its high sensitivity and accuracy for analysis of PAH and chiral PAH analogue derivatives without the need of an expensive chiral column, enantiomeric resolution, or use of a polarized
Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

Science.gov (United States)

Pralle, R S; Weigel, K W; White, H M

2018-05-01

Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbott Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed
Quantitative analysis of red wine tannins using Fourier-transform mid-infrared spectrometry.

Science.gov (United States)

Fernandez, Katherina; Agosin, Eduardo

2007-09-05

Tannin content and composition are critical quality components of red wines. No spectroscopic method assessing these phenols in wine has been described so far. We report here a new method using Fourier transform mid-infrared (FT-MIR) spectroscopy and chemometric techniques for the quantitative analysis of red wine tannins. Calibration models were developed using protein precipitation and phloroglucinolysis as analytical reference methods. After spectra preprocessing, six different predictive partial least-squares (PLS) models were evaluated, including the use of interval selection procedures such as iPLS and CSMWPLS. PLS regression with full-range (650-4000 cm⁻¹), second derivative of the spectra and phloroglucinolysis as the reference method gave the most accurate determination for tannin concentration (RMSEC = 2.6%, RMSEP = 9.4%, r = 0.995). The prediction of the mean degree of polymerization (mDP) of the tannins also gave a reasonable prediction (RMSEC = 6.7%, RMSEP = 10.3%, r = 0.958). These results represent the first step in the development of a spectroscopic methodology for the quantification of several phenolic compounds that are critical for wine quality.
Detection of Cyanuric Acid and Melamine in Infant Formula Powders by Mid-FTIR Spectroscopy and Multivariate Analysis

Directory of Open Access Journals (Sweden)

Edwin García-Miguel

2018-01-01

Chemometric methods using mid-FTIR spectroscopy were developed in order to reduce the time of study of melamine and cyanuric acid in infant formulas. Chemometric models were constructed using the algorithms Partial Least Squares (PLS1, PLS2) and Principal Component Regression (PCR) in order to correlate the IR signal with the levels of melamine or cyanuric acid in the infant formula samples. Results showed that the best correlations were obtained using PLS1 (R2: 0.9998, SEC: 0.0793, and SEP: 0.5545 for melamine and R2: 0.9997, SEC: 0.1074, and SEP: 0.5021 for cyanuric acid). Also, the SIMCA model was studied to distinguish between adulterated formulas and nonadulterated samples, giving optimum discrimination and good interclass distances between samples. Results showed that chemometric models demonstrated a good predictive ability of melamine and cyanuric acid concentrations in infant formulas, showing that this is a rapid and accurate technique to be used in the identification and quantification of these adulterants in infant formulas.
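PLS1, the single-response variant named in this abstract, can be sketched in plain Python. The sketch assumes the textbook NIPALS-style algorithm: each component takes a weight vector proportional to the covariance between the current X residuals and the y residuals, then deflates both. The function names and the tuple-based model layout are hypothetical, and the study itself used chemometric software, not this code.

```python
def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [v - x_mean[j] for j, v in enumerate(x)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

PLS2 differs in that several responses are modeled jointly with shared components. In practice the number of components is chosen by cross-validation, since using as many components as predictors simply reproduces ordinary least squares.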
Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes

Directory of Open Access Journals (Sweden)

Shuibo Hu

2018-03-01

The size of phytoplankton not only influences its physiology, metabolic rates and marine food web, but also serves as an indicator of phytoplankton functional roles in ecological and biogeochemical processes. Therefore, some algorithms have been developed to infer the synoptic distribution of phytoplankton cell size, denoted as phytoplankton size classes (PSCs), in surface ocean waters, by means of remotely sensed variables. This study, using the NASA bio-Optical Marine Algorithm Data set (NOMAD) high performance liquid chromatography (HPLC) database and satellite match-ups, aimed to compare the effectiveness of modeling techniques, including partial least squares (PLS), artificial neural networks (ANN), support vector machine (SVM) and random forests (RF), and feature selection techniques, including genetic algorithm (GA), successive projection algorithm (SPA) and recursive feature elimination based on support vector machine (SVM-RFE), for inferring PSCs from remote sensing data. Results showed that: (1) SVM-RFE worked better in selecting sensitive features; (2) RF performed better than PLS, ANN and SVM in calibrating PSCs retrieval models; (3) machine learning techniques produced better performance than the chlorophyll-a based three-component method; (4) sea surface temperature, wind stress, and spectral curvature derived from the remote sensing reflectance at 490, 510, and 555 nm were among the most sensitive features to PSCs; and (5) the combination of SVM-RFE feature selection techniques and random forests regression was recommended for inferring PSCs. This study demonstrated the effectiveness of machine learning techniques in selecting sensitive features and calibrating models for PSCs estimations with remote sensing.
Relative Importance for Linear Regression in R: The Package relaimpo

Directory of Open Access Journals (Sweden)

Ulrike Gromping

2006-09-01

Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing relative importance of regressors in the linear model, two of which are recommended: averaging over orderings of regressors and a newly proposed metric (Feldman 2005) called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory) bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo's functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.
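The first recommended metric, averaging over orderings of regressors, can be illustrated with a small Python sketch. It is not the relaimpo implementation and does not reproduce its interface. It assumes the usual reading of the metric: for every ordering of the regressors, each regressor is credited with the increase in R² when it enters the model, and the credits are averaged over all orderings. The function names are hypothetical, and the brute-force enumeration is only practical for a handful of regressors.

```python
from itertools import permutations


def r_squared(X, y, cols):
    """R^2 of the least-squares fit of y on the chosen columns (with intercept)."""
    if not cols:
        return 0.0
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    Xc = []
    for j in cols:
        col_mean = sum(row[j] for row in X) / n
        Xc.append([row[j] - col_mean for row in X])
    k = len(cols)
    # Augmented normal equations [Xc'Xc | Xc'yc], solved by Gauss-Jordan elimination.
    A = [[sum(a * b for a, b in zip(Xc[r], Xc[c])) for c in range(k)]
         + [sum(a * b for a, b in zip(Xc[r], yc))] for r in range(k)]
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for r in range(k):
            if r != i:
                factor = A[r][i] / A[i][i]
                A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(beta[c] * Xc[c][i] for c in range(k)) for i in range(n)]
    ss_res = sum((yv - fv) ** 2 for yv, fv in zip(yc, fitted))
    ss_tot = sum(v * v for v in yc)
    return 1.0 - ss_res / ss_tot


def importance_averaged_over_orderings(X, y):
    """Average sequential R^2 gain of each regressor over all orderings."""
    p = len(X[0])
    shares = [0.0] * p
    orderings = list(permutations(range(p)))
    for order in orderings:
        included = []
        previous = 0.0
        for j in order:
            included.append(j)
            current = r_squared(X, y, included)
            shares[j] += current - previous
            previous = current
    return [s / len(orderings) for s in shares]
```

By construction the shares sum to the R² of the full model, which is what makes the metric a decomposition of explained variance. With uncorrelated regressors every ordering gives the same credit, so the averaging only matters when regressors are correlated.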
Sulfur Speciation of Crude Oils by Partial Least Squares Regression Modeling of Their Infrared Spectra

NARCIS (Netherlands)

de Peinder, P.; Visser, T.; Wagemans, R.W.P.; Blomberg, J.; Chaabani, H.; Soulimani, F.; Weckhuysen, B.M.

2013-01-01

Research has been carried out to determine the feasibility of partial least-squares regression (PLS) modeling of infrared (IR) spectra of crude oils as a tool for fast sulfur speciation. The study is a continuation of a previously developed method to predict long and short residue properties of
Improving the robustness of a partial least squares (PLS) model based on pure component selectivity analysis and range optimization: Case study for the analysis of an etching solution containing hydrogen peroxide

Energy Technology Data Exchange (ETDEWEB)

Lee, Youngbok [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of); Chung, Hoeil [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of)]. E-mail: hoeil@hanyang.ac.kr; Arnold, Mark A. [Optical Science and Technology Center and Department of Chemistry, University of Iowa, Iowa City, IA 52242 (United States)

2006-07-14

Pure component selectivity analysis (PCSA) was successfully utilized to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component to other components. The samples used in this study were composed of NH₄OH, H₂O₂ and H₂O, a popular etchant solution in the electronic industry. Corresponding near-infrared (NIR) spectra (9000-7500 cm⁻¹) were used to build PLS models. The selective determination of H₂O₂ without influences from NH₄OH and H₂O was a key issue since its molecular structure is similar to that of H₂O and NH₄OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH₄OH and H₂O₂ were found with the use of moving window PLS (MW-PLS) and corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH₄OH was free from interferences from the other components due to the presence of its unique NH absorption bands. Since the spectral variation from H₂O₂ was broadly overlapping and much less distinct than that from NH₄OH, the selectivity and prediction performance for the H₂O₂ calibration were sensitively varied depending on the spectral ranges and number of factors used. PCSA, based on the comparison between regression vectors from PLS and the net analyte signal (NAS), was an effective method to prevent over-fitting of the H₂O₂ calibration. A robust H₂O₂ calibration model with minimal interferences from other components was developed. PCSA should be included as a standard method in PLS calibrations where prediction error only is the usual measure of performance.
Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

Science.gov (United States)

Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

2001-04-01

Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Square (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. Thus, from the 36 hypothetical combinations of the theoretical skin type classification, we arrived at a consolidated proposal of six classes. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.
Technological Similarity, Post-acquisition R&D Reorganization, and Innovation Performance in Horizontal Acquisition

DEFF Research Database (Denmark)

Colombo, Massimo G.; Rabbiosi, Larissa

2014-01-01

This paper aims to disentangle the mechanisms through which technological similarity between acquiring and acquired firms influences innovation in horizontal acquisitions. We develop a theoretical model that links technological similarity to: (i) two key aspects of post-acquisition reorganization...... of acquired R&D operations – the rationalization of the R&D operations and the replacement of the R&D top manager, and (ii) two intermediate effects that are closely associated with the post-acquisition innovation performance of the combined firm – improvements in R&D productivity and disruptions in R......&D personnel. We rely on PLS techniques to test our theoretical model using detailed information on 31 horizontal acquisitions in high- and medium-tech industries. Our results indicate that in horizontal acquisitions, technological similarity negatively affects post-acquisition innovation performance...
Influence of the nature of soil organic matter on the sorption behaviour of pentadecane as determined by PLS analysis of mid-infrared DRIFT and solid-state ¹³C NMR spectra

Energy Technology Data Exchange (ETDEWEB)

Clark Ehlers, G.A. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Forrester, Sean T. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia); Scherr, Kerstin E. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Loibner, Andreas P., E-mail: andreas.loibner@boku.ac.a [Institute of Environmental Biotechnology, Department IFA-Tulln, The University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Janik, Les J. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia)

2010-01-15

The nature of soil organic matter (SOM) functional groups associated with sorption processes was determined by correlating partitioning coefficients with solid-state ¹³C nuclear magnetic resonance (NMR) and diffuse reflectance mid-infrared (DRIFT) spectral features using partial least squares (PLS) regression analysis. Partitioning sorption coefficients for n-pentadecane (n-C₁₅) were determined for three alternative models: the Langmuir model, the dual distributed reactive domain model (DRDM) and the Freundlich model, where the latter was found to be the most appropriate. NMR-derived constitutional descriptors did not correlate with Freundlich model parameters. By contrast, PLS analysis revealed the most likely nature of the functional groups in SOM associated with n-C₁₅ sorption coefficients (K_F) to be aromatic, possibly porous soil char, rather than aliphatic organic components for the presently investigated soils. High PLS cross-validation correlation suggested that the model was robust for the purpose of characterising the functional group chemistry important for n-C₁₅ sorption. - NMR/IR spectroscopy and chemometrics reveal the aromatic fraction of soil organic matter being responsible for alkane sorption.
Different frontal involvement in ALS and PLS revealed by Stroop event-related potentials and reaction times

Directory of Open Access Journals (Sweden)

Ninfa Amato

2013-12-01

BACKGROUND: A growing body of evidence suggests a link between cognitive and pathological changes in amyotrophic lateral sclerosis (ALS) and in frontotemporal lobar dementia (FTLD). Cognitive deficits have been investigated much less extensively in primary lateral sclerosis (PLS) than in ALS. OBJECTIVE: To investigate bioelectrical activity to Stroop test, assessing frontal function, in ALS, PLS and control groups. METHODS: 32 non-demented ALS patients, 10 non-demented PLS patients and 27 healthy subjects were included. Twenty-nine electroencephalography (EEG) channels with binaural reference were recorded during covert Stroop task performance, involving mental discrimination of the stimuli and not vocal or motor response. Group effects on event related potentials (ERPs) latency were analyzed using statistical multivariate analysis. Topographic analysis was performed using low resolution brain electromagnetic tomography (LORETA). RESULTS: ALS patients committed more errors in the execution of the task but they were not slower, whereas PLS patients did not show reduced accuracy, despite a slowing of reaction times (RTs). The main ERP components were delayed in ALS, but not in PLS, compared with controls. Moreover, RT speed but not ERP latency correlated with clinical scores. ALS had decreased frontotemporal activity in the P2, P3 and N4 time windows compared to controls. CONCLUSION: These findings suggest a different pattern of psychophysiological involvement in ALS compared with PLS. The former is increasingly recognized to be a multisystems disorder, with a spectrum of executive and behavioural impairments reflecting frontotemporal dysfunction. The latter seems to mainly involve the motor system, with largely spared cognitive functions. Moreover, our results suggest that the covert version of the Stroop task used in the present study may be useful to assess cognitive state in the very advanced stage of the disease, when other cognitive tasks are not
Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

International Nuclear Information System (INIS)

Balabin, Roman M.; Smirnov, Sergey V.

2011-01-01

During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

Integrated Multiscale Latent Variable Regression and Application to Distillation Columns

Directory of Open Access Journals (Sweden)

Muddu Madakyaru

2013-01-01

Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.
A thioesterase bypasses the requirement for exogenous fatty acids in the plsX deletion of Streptococcus pneumoniae

NARCIS (Netherlands)

Parsons, J.B.; Frank, M.W.; Eleveld, M.J.; Schalkwijk, J.; Broussard, T.C.; Jonge, M.I. de; Rock, C.O.

2015-01-01

PlsX is an acyl-acyl carrier protein (ACP):phosphate transacylase that interconverts the two acyl donors in Gram-positive bacterial phospholipid synthesis. The deletion of plsX in Staphylococcus aureus results in a requirement for both exogenous fatty acids and de novo type II fatty acid
An R package to compute commonality coefficients in the multiple regression case: an introduction to the package and a practical example.

Science.gov (United States)

Nimon, Kim; Lewis, Mitzi; Kane, Richard; Haynes, R Michael

2008-05-01

Multiple regression is a widely used technique for data analysis in social and behavioral research. The complexity of interpreting such results increases when correlated predictor variables are involved. Commonality analysis provides a method of determining the variance accounted for by respective predictor variables and is especially useful in the presence of correlated predictors. However, computing commonality coefficients is laborious. To make commonality analysis accessible to more researchers, a program was developed to automate the calculation of unique and common elements in commonality analysis, using the statistical package R. The program is described, and a heuristic example using data from the Holzinger and Swineford (1939) study, readily available in the MBESS R package, is presented.
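The partition that commonality analysis performs can be illustrated for the simplest case of two predictors. The following is a minimal Python sketch, not the R program described in the record above; the function name and the example R² values are hypothetical, and the R² values are assumed to come from three already-fitted regressions (y on x1, y on x2, and y on both).

```python
def commonality_two_predictors(r2_x1, r2_x2, r2_both):
    """Split the R^2 of y ~ x1 + x2 into unique and common components.

    r2_x1, r2_x2: R^2 of the single-predictor regressions.
    r2_both:      R^2 of the regression on both predictors.
    """
    unique_x1 = r2_both - r2_x2          # explained by x1 beyond x2
    unique_x2 = r2_both - r2_x1          # explained by x2 beyond x1
    common = r2_x1 + r2_x2 - r2_both     # shared by x1 and x2
    return {"unique_x1": unique_x1, "unique_x2": unique_x2, "common": common}


# Hypothetical R^2 values, chosen only for illustration.
parts = commonality_two_predictors(r2_x1=0.30, r2_x2=0.25, r2_both=0.40)
total = sum(parts.values())  # the three components add up to r2_both
```

With more than two predictors the number of components grows as 2^k - 1, which is why an automated routine is useful.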
Application of Genetic Algorithm (GA) Assisted Partial Least Square (PLS) Analysis on Trilinear and Non-trilinear Fluorescence Data Sets to Quantify the Fluorophores in Multifluorophoric Mixtures: Improving Quantification Accuracy of Fluorimetric Estimations of Dilute Aqueous Mixtures.

Science.gov (United States)

Kumar, Keshav

2018-03-29

Excitation-emission matrix fluorescence (EEMF) and total synchronous fluorescence spectroscopy (TSFS) are the 2 fluorescence techniques that are commonly used for the analysis of multifluorophoric mixtures. These 2 fluorescence techniques are conceptually different and provide certain advantages over each other. The manual analysis of such large volumes of highly correlated EEMF and TSFS data towards developing a calibration model is difficult. Partial least square (PLS) analysis can analyze the large volume of EEMF and TSFS data sets by finding important factors that maximize the correlation between the spectral and concentration information for each fluorophore. However, the application of PLS analysis to entire data sets often does not provide a robust calibration model and requires a suitable pre-processing step. The present work evaluates the application of genetic algorithm (GA) analysis prior to PLS analysis on EEMF and TSFS data sets towards improving the precision and accuracy of the calibration model. The GA essentially combines the advantages provided by stochastic methods with those provided by deterministic approaches and can find the set of EEMF and TSFS variables that correlate well with the concentration of each of the fluorophores present in the multifluorophoric mixtures. The utility of the GA assisted PLS analysis is successfully validated using (i) EEMF data sets acquired for dilute aqueous mixtures of four biomolecules and (ii) TSFS data sets acquired for dilute aqueous mixtures of four carcinogenic polycyclic aromatic hydrocarbons (PAHs). In the present work, it is shown that by using the GA it is possible to significantly improve the accuracy and precision of the PLS calibration model developed for both EEMF and TSFS data sets. Hence, GA must be considered as a useful pre-processing technique while developing an EEMF and TSFS calibration model.
Development of Multivariate Regression Models for the Quantification of Benzoyl Metronidazole in the Presence of Its Degradation Products by Near-Infrared Spectroscopy

Directory of Open Access Journals (Sweden)

Willian Ricardo da Rosa de Almeida

2015-12-01

Benzoyl metronidazole (BMZ) is a drug with antiparasitic and antibacterial activity available in the form of pediatric suspensions. The main degradation products of BMZ are metronidazole and benzoic acid, and there are no reports in the literature on the determination of BMZ in the presence of its degradation products using near infrared spectroscopy. Therefore, in this study a method was developed for determining the content of the BMZ pharmaceutical ingredient in the presence of its main degradation products by near infrared spectroscopy associated with multivariate calibration. Regression methods with variable selection, such as interval partial least squares (iPLS) and synergy interval partial least squares (siPLS), were applied in order to select spectral regions that produce models with smaller errors. The best model using the iPLS algorithm was obtained when the spectrum was divided into 12 sub-intervals and interval 11 was selected (RSEP% = 1.37). For the siPLS algorithm, the best model was obtained when the spectrum was divided into 16 intervals and subintervals 9, 13 and 18 were combined (RSEP% = 1.30). The proposed method can be considered selective; it allows BMZ to be determined in the presence of its degradation products. DOI: http://dx.doi.org/10.17807/orbital.v7i4.741
Calibration sets selection strategy for the construction of robust PLS models for prediction of biodiesel/diesel blends physico-chemical properties using NIR spectroscopy

Science.gov (United States)

Palou, Anna; Miró, Aira; Blanco, Marcelo; Larraz, Rafael; Gómez, José Francisco; Martínez, Teresa; González, Josep Maria; Alcalà, Manel

2017-06-01

Even though the feasibility of using near infrared (NIR) spectroscopy combined with partial least squares (PLS) regression for prediction of physico-chemical properties of biodiesel/diesel blends has been widely demonstrated, including in the calibration sets the whole variability of diesel samples from diverse production origins remains an important challenge when constructing the models. This work presents a useful strategy for the systematic selection of calibration sets of samples of biodiesel/diesel blends from diverse origins, based on a binary code, principal components analysis (PCA) and the Kennard-Stone algorithm. Results show that using this methodology the models can keep their robustness over time. PLS calculations have been done using specialized chemometric software as well as the software of the NIR instrument installed in plant, and both produced RMSEP values below the reproducibility values of the reference methods. The models have been proven for on-line simultaneous determination of seven properties: density, cetane index, fatty acid methyl esters (FAME) content, cloud point, boiling point at 95% of recovery, flash point and sulphur.
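The Kennard-Stone step named in the record above can be sketched in a few lines of Python. This is a generic textbook version, assuming Euclidean distance on a small list of score vectors (for example PCA scores); the function name and the toy data are hypothetical, and the binary-code stage of the published strategy is not reproduced here.

```python
from math import dist


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule.

    Start with the two points farthest apart, then repeatedly add the
    point whose distance to its nearest selected point is largest.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Seed: the most distant pair.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [i0, j0]

    # Distance from every point to its nearest selected point.
    min_d = [min(dist(p, points[i0]), dist(p, points[j0])) for p in points]

    while len(selected) < n_select:
        candidates = [k for k in range(n) if k not in selected]
        nxt = max(candidates, key=lambda k: min_d[k])
        selected.append(nxt)
        min_d = [min(min_d[k], dist(points[k], points[nxt])) for k in range(n)]
    return selected


# Toy two-dimensional scores, for illustration only.
scores = [(0, 0), (1, 0), (0.5, 0.5), (10, 10), (5, 5)]
calibration_idx = kennard_stone(scores, 3)
```

The selected samples span the extremes of the score space first, which is the property that makes the rule attractive for building calibration sets that cover the observed variability.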
Multiple regression technique for Pth degree polynominals with and without linear cross products

Science.gov (United States)

Davis, J. W.

1973-01-01

A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
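A polynomial surface fit of this kind can be sketched with an ordinary least-squares solve. The Python sketch below is not the original 1973 program; it assumes that "linear cross products" means the pairwise products x_i·x_j, and the function names and synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def design_matrix(X, degree, cross_products=False):
    """Columns: 1, x_j**p for p = 1..degree, optionally x_i * x_j (i < j)."""
    X = np.asarray(X, dtype=float)
    cols = [np.ones(len(X))]
    for p in range(1, degree + 1):
        cols.extend(X[:, j] ** p for j in range(X.shape[1]))
    if cross_products:
        cols.extend(
            X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)
        )
    return np.column_stack(cols)


def fit_polynomial(X, y, degree, cross_products=False):
    """Least-squares coefficients and the maximum absolute residual."""
    A = design_matrix(X, degree, cross_products)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, float(np.max(np.abs(y - A @ coef)))


# Synthetic data containing a genuine x1*x2 term (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]

coef_with, err_with = fit_polynomial(X, y, degree=2, cross_products=True)
coef_without, err_without = fit_polynomial(X, y, degree=2, cross_products=False)
```

Comparing `err_with` and `err_without` shows how much of the response the cross-product term accounts for, which is the kind of comparison the two cases in the record are meant to support.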
PLS models for determination of SARA analysis of Colombian vacuum residues and molecular distillation fractions using MIR-ATR

Directory of Open Access Journals (Sweden)

Jorge A. Orrego-Ruiz

2014-06-01

In this work, prediction models of Saturates, Aromatics, Resins and Asphaltenes (SARA) fractions were obtained from thirty-seven vacuum residues of representative Colombian crudes and eighteen fractions of a molecular distillation process. Mid-Infrared (MIR) Attenuated Total Reflection (ATR) spectroscopy in combination with partial least squares (PLS) regression analysis was used to accurately estimate SARA analysis in this kind of sample. The calibration coefficients of the prediction models for the saturates, aromatics, resins and asphaltenes fractions were 0.99, 0.96, 0.97 and 0.99, respectively. This methodology makes it possible to control the molecular distillation process, since small differences in chemical composition can be detected. The total time needed for the SARA analysis of each sample is 10 minutes.
Parafac and PLS Applied to Determination of Captopril in Pharmaceutical Preparation and Biological Fluids by Ultraviolet Spectrophotometry

International Nuclear Information System (INIS)

Niazi, A.; Ghasemi, N.

2007-01-01

A new ultraviolet spectrophotometric method has been developed for the direct qualitative determination of captopril in pharmaceutical preparation and biological fluids such as human plasma and urine samples. The method was accomplished based on parallel factor analysis (PARAFAC) and partial least squares (PLS). The study was carried out in the pH range from 2.0 to 12.8 and with a concentration from 0.70 to 61.50 μg ml⁻¹ of captopril. Multivariate calibration models PLS at various pH and PARAFAC were elaborated from ultraviolet spectra deconvolution and captopril determination. The best models for this system were obtained with PARAFAC and PLS at pH = 2.04 (PLS-PH2). The applications of the method for the determination of real samples were evaluated by analysis of captopril in pharmaceutical preparations and biological (human plasma and urine) fluids with satisfactory results. The accuracy of the method, evaluated through the root mean square error of prediction (RMSEP), was 0.58 for captopril with PARAFAC and 0.67 for captopril with PLS-PH2 model. Acidity constant of captopril at 25 °C and ionic strength of 0.1 M have also been determined spectrophotometrically. The obtained pKa values of captopril are 3.90 ± 0.05 and 10.03 ± 0.08 for pKa1 and pKa2, respectively.
AO–MW–PLS method applied to rapid quantification of teicoplanin with near-infrared spectroscopy

Directory of Open Access Journals (Sweden)

Jiemei Chen

2017-01-01

Teicoplanin (TCP) is an important lipoglycopeptide antibiotic produced by fermenting Actinoplanes teichomyceticus. The change in TCP concentration is important to measure in the fermentation process. In this study, a reagent-free and rapid quantification method for TCP in the TCP–Tris–HCl mixture samples was developed using near-infrared (NIR) spectroscopy, with a focus on the fermentation process for TCP. The absorbance optimization (AO) partial least squares (PLS) method was proposed and integrated with the moving window (MW) PLS, giving the AO–MW–PLS method, to select appropriate wavebands. A model set that includes various wavebands that were equivalent to the optimal AO–MW–PLS waveband was proposed based on statistical considerations. The public region of all equivalent wavebands was just one of the equivalent wavebands. The obtained public regions were 1540–1868 nm for TCP and 1114–1310 nm for Tris. The root-mean-square error and correlation coefficient for leave-one-out cross validation were 0.046 mg mL⁻¹ and 0.9998 for TCP, and 0.235 mg mL⁻¹ and 0.9986 for Tris, respectively. All the models achieved highly accurate prediction effects, and the selected wavebands provided valuable references for designing specialized spectrometers. This study provided a valuable reference for further application of the proposed methods to TCP fermentation broth and to other spectroscopic analysis fields.
Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

KAUST Repository

Madakyaru, Muddu

2017-02-16

Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploits the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, an MSPLS algorithm combining PLS and wavelet analysis is used as the modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from the MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.
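The GLR step applied to model residuals can be illustrated with the textbook case of a mean shift in Gaussian residuals of known standard deviation. This Python sketch is an assumption-laden illustration rather than the MSPLS-GLR algorithm of the record: the residual values, the value of sigma, and the function name are hypothetical, and the threshold is the 95% quantile of a chi-square distribution with one degree of freedom.

```python
CHI2_1DOF_95 = 3.841  # 95% quantile of chi-square with 1 degree of freedom


def glr_mean_shift(residuals, sigma):
    """GLR statistic for H0: mean = 0 against H1: mean = unknown constant.

    Maximising the Gaussian likelihood over the unknown mean gives
    2 * log(likelihood ratio) = n * mean(residuals)**2 / sigma**2.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    return n * mean**2 / sigma**2


# Hypothetical residual windows from a fitted model (illustration only).
normal_window = [0.10, -0.20, 0.05, 0.00, -0.10]
faulty_window = [0.60, 0.30, 0.55, 0.50, 0.40]  # same noise plus a 0.5 shift

stat_normal = glr_mean_shift(normal_window, sigma=0.2)
stat_faulty = glr_mean_shift(faulty_window, sigma=0.2)
alarm_normal = stat_normal > CHI2_1DOF_95
alarm_faulty = stat_faulty > CHI2_1DOF_95
```

Because the statistic pools evidence over the whole window, a persistent small shift can trigger an alarm even when no single residual looks unusual on its own.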
Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

KAUST Repository

Madakyaru, Muddu; Harrou, Fouzi; Sun, Ying

2017-01-01

Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploits the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, an MSPLS algorithm combining PLS and wavelet analysis is used as the modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from the MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.
Classification of cassava starch films by physicochemical properties and water vapor permeability quantification by FTIR and PLS.

Science.gov (United States)

Henrique, C M; Teófilo, R F; Sabino, L; Ferreira, M M C; Cereda, M P

2007-05-01

Cassava starches are widely used in the production of biodegradable films, but their resistance to humidity migration is very low. In this work, commercial cassava starch films were studied and classified according to their physicochemical properties. A nondestructive method for water vapor permeability determination, which combines infrared spectroscopy with multivariate calibration, is also presented. The following commercial cassava starches were studied: pregelatinized (amidomax 3550), carboxymethylated starch (CMA) of low and high viscosities, and esterified starches. To make the films, 2 different starch concentrations were evaluated, consisting of water suspensions with 3% and 5% starch. The filmogenic solutions were dried and characterized for their thickness, grammage, water vapor permeability, water activity, tensile strength (deformation force), water solubility, and puncture strength (deformation). The minimum thicknesses were 0.5 to 0.6 mm in pregelatinized starch films. The results were treated by means of the following chemometric methods: principal component analysis (PCA) and partial least squares (PLS) regression. PCA analysis on the physicochemical properties of the films showed that the differences in concentration of the dried material (3% and 5% starch) and also in the type of starch modification were mainly related to the following properties: permeability, solubility, and thickness. IR spectra collected in the region of 4000 to 600 cm⁻¹ were used to build a PLS model with good predictive power for water vapor permeability determination, with mean relative errors of 10.0% for cross-validation and 7.8% for the prediction set.
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression

Science.gov (United States)

Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.

2013-01-01

Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
The comparison of partial least squares and principal component regression in simultaneous spectrophotometric determination of ascorbic acid, dopamine and uric acid in real samples

Directory of Open Access Journals (Sweden)

Habiboallah Khajehsharifi

2017-05-01

Partial least squares (PLS1) and principal component regression (PCR) are two multivariate calibration methods that allow simultaneous determination of several analytes in spite of their overlapping spectra. In this research, a spectrophotometric method using PLS1 is proposed for the simultaneous determination of ascorbic acid (AA), dopamine (DA) and uric acid (UA). The linear concentration ranges for AA, DA and UA were 1.76–47.55, 0.57–22.76 and 1.68–28.58 (in μg mL⁻¹), respectively. Both PLS1 and PCR were applied to a calibration set based on absorption spectra in the 250–320 nm range for 36 different mixtures of AA, DA and UA; in all cases, the PLS1 calibration method showed better quantitative prediction ability than the PCR method. The cross-validation method was used to select the optimum number of principal components (NPC). The NPC for AA, DA and UA was found to be 4 by PLS1 and 5, 12, 8 by PCR. Prediction error sum of squares (PRESS) values for AA, DA and UA were 1.2461, 1.1144, 2.3104 for PLS1 and 11.0563, 1.3819, 4.0956 for PCR, respectively. Satisfactory results were achieved for the simultaneous determination of AA, DA and UA in some real samples such as human urine, serum and pharmaceutical formulations.
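Selecting the number of components by cross-validated PRESS, as described in the record above, can be sketched as follows. This is a generic NIPALS-style PLS1 with leave-one-out cross-validation written in Python with NumPy, under the assumption of a single response and mean-centred data; the function names and the synthetic data are hypothetical and do not reproduce the published spectra or results.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS-style deflation; return (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)        # weight vector
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        q_a = yr @ t / tt             # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - q_a * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def press_loo(X, y, n_components):
    """Prediction error sum of squares from leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, x_mean, y_mean = pls1_fit(X[keep], y[keep], n_components)
        pred = (X[i] - x_mean) @ beta + y_mean
        press += float((y[i] - pred) ** 2)
    return press


# Synthetic, noise-free calibration data (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0

press_values = {a: press_loo(X, y, a) for a in (1, 2, 3)}
best_n_components = min(press_values, key=press_values.get)
```

The same `press_loo` loop could wrap a PCR fit instead of `pls1_fit`, which is how the two methods can be compared on an equal footing.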
Data mining techniques for thermophysical properties of refrigerants

International Nuclear Information System (INIS)

Kuecueksille, Ecir Ugur; Selbas, Resat; Sencan, Arzu

2009-01-01

This study presents ten modeling techniques within the data mining process for the prediction of thermophysical properties of refrigerants (R134a, R404a, R407c and R410a). These are linear regression (LR), multilayer perceptron (MLP), pace regression (PR), simple linear regression (SLR), sequential minimal optimization (SMO), KStar, additive regression (AR), M5 model tree, decision table (DT), and M5'Rules models. Relations depending on temperature and pressure were derived for the determination of thermophysical properties such as the specific heat capacity, viscosity, heat conduction coefficient, and density of the refrigerants. The model results obtained for each refrigerant were compared and the best model was identified. Results indicate that the use of formulations derived from these techniques will facilitate the design and optimization of heat exchangers, which are components of vapor compression refrigeration systems in particular.
Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

OpenAIRE

Via, Brian K.; Zhou, Chengfeng; Acquah, Gifty; Jiang, Wei; Eckhardt, Lori

2014-01-01

This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or selection of meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and af...
Mobility of the native Bacillus subtilis conjugative plasmid pLS20 is regulated by intercellular signaling.

Science.gov (United States)

Singh, Praveen K; Ramachandran, Gayetri; Ramos-Ruiz, Ricardo; Peiró-Pastor, Ramón; Abia, David; Wu, Ling J; Meijer, Wilfried J J

2013-10-01

Horizontal gene transfer mediated by plasmid conjugation plays a significant role in the evolution of bacterial species, as well as in the dissemination of antibiotic resistance and pathogenicity determinants. Characterization of their regulation is important for gaining insights into these features. Relatively little is known about how conjugation of Gram-positive plasmids is regulated. We have characterized conjugation of the native Bacillus subtilis plasmid pLS20. Contrary to the enterococcal plasmids, conjugation of pLS20 is not activated by recipient-produced pheromones but by pLS20-encoded proteins that regulate expression of the conjugation genes. We show that conjugation is kept in the default "OFF" state and identified the master repressor responsible for this. Activation of the conjugation genes requires relief of repression, which is mediated by an anti-repressor that belongs to the Rap family of proteins. Using both RNA sequencing methodology and genetic approaches, we have determined the regulatory effects of the repressor and anti-repressor on expression of the pLS20 genes. We also show that the activity of the anti-repressor is in turn regulated by an intercellular signaling peptide. Ultimately, this peptide dictates the timing of conjugation. The implications of this regulatory mechanism and comparison with other mobile systems are discussed.
Wind Power Ramp Events Prediction with Hybrid Machine Learning Regression Techniques and Reanalysis Data

Directory of Open Access Journals (Sweden)

Laura Cornejo-Bueno

2017-11-01

Wind Power Ramp Events (WPREs) are large fluctuations of wind power in a short time interval, which lead to strong, undesirable variations in the electric power produced by a wind farm. Their accurate prediction is important in the effort to integrate wind energy efficiently into the electric system without considerably affecting its stability, robustness and resilience. In this paper, we tackle the problem of predicting WPREs by applying Machine Learning (ML) regression techniques. Our approach consists of using variables from atmospheric reanalysis data as predictive inputs for the learning machine, which opens the possibility of hybridizing numerical-physical weather models with ML techniques for WPRE prediction in real systems. Specifically, we have explored the feasibility of a number of state-of-the-art ML regression techniques, such as support vector regression, artificial neural networks (multi-layer perceptrons and extreme learning machines) and Gaussian processes, to solve the problem. Furthermore, the ERA-Interim reanalysis from the European Center for Medium-Range Weather Forecasts is used in this paper because of its accuracy and high resolution (in both spatial and temporal domains). To validate the feasibility of our prediction approach, we have carried out extensive experimental work using real data from three wind farms in Spain, discussing the performance of the different ML regression techniques tested on this wind power ramp event prediction problem.
Postharvest monitoring of organic potato (cv. Anuschka) during hot-air drying using visible-NIR hyperspectral imaging.

Science.gov (United States)

Moscetti, Roberto; Sturm, Barbara; Crichton, Stuart Oj; Amjad, Waseem; Massantini, Riccardo

2018-05-01

The potential of hyperspectral imaging (500-1010 nm) was evaluated for monitoring the quality of potato slices (var. Anuschka) of 5, 7 and 9 mm thickness subjected to air drying at 50 °C. The study investigated three different feature selection methods for the prediction of dry basis moisture content and colour of potato slices using partial least squares regression (PLS). The feature selection strategies tested include interval PLS regression (iPLS), and differences and ratios between raw reflectance values for each possible pair of wavelengths (R[λ1]-R[λ2] and R[λ1]:R[λ2], respectively). Moreover, the combination of spectral and spatial domains was tested. Excellent results were obtained using the iPLS algorithm. However, features from both datasets of raw reflectance differences and ratios represent suitable alternatives for development of low-complexity prediction models. Finally, the dry basis moisture content was predicted with high accuracy by combining spectral data (i.e. R[511 nm]-R[994 nm]) and spatial domain (i.e. relative area shrinkage of slice). Modelling the data acquired during drying through hyperspectral imaging can provide useful information concerning the chemical and physicochemical changes of the product. With all this information, the proposed approach lays the foundations for a more efficient smart dryer that can be designed and its process optimized for drying of potato slices. © 2017 Society of Chemical Industry.
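
The pairwise difference and ratio features described in this abstract can be sketched as follows. This is only an illustration, not the authors' code: the function name, the use of unordered wavelength pairs, and the reflectance values are assumptions made for the example.

```python
from itertools import combinations

def pairwise_band_features(reflectance):
    """Build difference and ratio features for every unordered wavelength pair.

    `reflectance` maps wavelength (nm) -> raw reflectance of one sample.
    """
    features = {}
    for w1, w2 in combinations(sorted(reflectance), 2):
        r1, r2 = reflectance[w1], reflectance[w2]
        features[f"R[{w1}]-R[{w2}]"] = r1 - r2
        # Ratios with a zero denominator are skipped rather than guessed.
        if r2 != 0:
            features[f"R[{w1}]:R[{w2}]"] = r1 / r2
    return features

# Hypothetical reflectance values at three wavelengths (not data from the study).
spectrum = {511: 0.25, 750: 0.5, 994: 0.75}
feats = pairwise_band_features(spectrum)
print(feats["R[511]-R[994]"])
```

With n wavelengths this produces n(n-1)/2 differences and up to as many ratios, which is why a subsequent selection step is needed before building a low-complexity model.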

The importance of the chosen technique to estimate diffuse solar radiation by means of regression

Energy Technology Data Exchange (ETDEWEB)

Arslan, Talha; Altyn Yavuz, Arzu [Department of Statistics. Science and Literature Faculty. Eskisehir Osmangazi University (Turkey)], email: mtarslan@ogu.edu.tr, email: aaltin@ogu.edu.tr; Acikkalp, Emin [Department of Mechanical and Manufacturing Engineering. Engineering Faculty. Bilecik University (Turkey)], email: acikkalp@gmail.com

2011-07-01

The Ordinary Least Squares (OLS) method is one of the most frequently used for estimation of diffuse solar radiation. The data set must satisfy certain assumptions for the OLS method to work. The most important is that the error terms of the regression equation estimated by OLS must follow a normal distribution. Utilizing an alternative robust estimator to obtain parameter estimates is highly effective in solving problems where normality fails due to the presence of outliers or some other factor. The purpose of this study is to investigate the importance of the chosen technique for the estimation of diffuse radiation. This study described alternative robust methods frequently used in applications and compared them with the OLS method. Comparing the analysis of the data set by OLS with that by the M regression (Huber, Andrews and Tukey) techniques, it was found that robust regression techniques are preferable to OLS because of the smoother explanation values.
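
To illustrate why an M estimator can be preferable to OLS when outliers are present, here is a minimal sketch of Huber M regression for a single predictor, fitted by iteratively reweighted least squares. It is not the procedure used in the study: the function names, the tuning constant k = 1.345, the MAD-based scale estimate, and the data are assumptions chosen for the example.

```python
def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def huber_line_fit(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of a line by iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from the OLS fit
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual divided by 0.6745.
        abs_sorted = sorted(abs(r) for r in resid)
        mid = len(abs_sorted) // 2
        if len(abs_sorted) % 2:
            med = abs_sorted[mid]
        else:
            med = 0.5 * (abs_sorted[mid - 1] + abs_sorted[mid])
        scale = max(med / 0.6745, 1e-12)
        # Huber weights: 1 inside the threshold, k*scale/|r| outside it.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b

# Synthetic data on the line y = 2 + 0.5x with one gross outlier (illustrative only).
x = list(range(10))
y = [2.0 + 0.5 * xi for xi in x]
y[9] = 30.0
ols_a, ols_b = weighted_line_fit(x, y, [1.0] * len(x))
hub_a, hub_b = huber_line_fit(x, y)
print(ols_b, hub_b)
```

The OLS slope is pulled strongly toward the outlier, while the Huber fit progressively downweights it and stays close to the slope of the remaining points.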
Determination of Ethanol in Blood Samples Using Partial Least Square Regression Applied to Surface Enhanced Raman Spectroscopy.

Science.gov (United States)

Açikgöz, Güneş; Hamamci, Berna; Yildiz, Abdulkadir

2018-04-01

Alcohol consumption triggers toxic effects on organs and tissues in the human body. The risks are essentially thought to be related to the ethanol content of alcoholic beverages. The identification of ethanol in blood samples requires rapid, non-destructive analysis with minimal sample handling, such as Raman Spectroscopy. This study aims to apply Raman Spectroscopy for identification of ethanol in blood samples. Silver nanoparticles were synthesized to obtain Surface Enhanced Raman Spectroscopy (SERS) spectra of blood samples. The SERS spectra were used with Partial Least Squares (PLS) to determine ethanol quantitatively. To apply the PLS method, the 920-820 cm⁻¹ band interval was chosen and the spectral changes at the observed concentrations were statistically associated with each other. The blood samples were examined according to this model and the quantity of ethanol was determined as follows. First, a calibration method was established. A strong relationship was observed between known concentration values and the values obtained by the PLS method (R² = 1). Second, quantities of ethanol in 40 blood samples were predicted according to the calibration method. Quantitative analysis of the ethanol in the blood was done by analyzing the data obtained by Raman spectroscopy and the PLS method.
Building a new predictor for multiple linear regression technique-based corrective maintenance turnaround time.

Science.gov (United States)

Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa

2008-01-01

This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and to use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Multiple linear regression and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to this model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Development and performance test of a new high power RF window in S-band PLS-II LINAC

Science.gov (United States)

Hwang, Woon-Ha; Joo, Young-Do; Kim, Seung-Hwan; Choi, Jae-Young; Noh, Sung-Ju; Ryu, Ji-Wan; Cho, Young-Ki

2017-12-01

A prototype of RF window was developed in collaboration with the Pohang Accelerator Laboratory (PAL) and domestic companies. High power performance tests of the single RF window were conducted at PAL to verify the operational characteristics for its application in the Pohang Light Source-II (PLS-II) linear accelerator (Linac). The tests were performed in the in-situ facility consisting of a modulator, klystron, waveguide network, vacuum system, cooling system, and RF analyzing equipment. The test results with Stanford linear accelerator energy doubler (SLED) have shown no breakdown up to 75 MW peak power with 4.5 μs RF pulse width at a repetition rate of 10 Hz. The test results with the current operation level of PLS-II Linac confirm that the RF window well satisfies the criteria for PLS-II Linac operation.
Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar.

Science.gov (United States)

Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald

2006-11-01

We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
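
The decision sequence described in this abstract (test the interaction term first, then compare the ability coefficient with and without the group term) can be sketched as a small rule. The abstract does not state its cutoffs, so the significance level `alpha` and the 10% `change_threshold` below are purely illustrative assumptions, as are the function and argument names; the inputs would come from ordinal logistic regression fits performed elsewhere.

```python
def classify_dif(p_interaction, beta_without_group, beta_with_group,
                 alpha=0.05, change_threshold=0.10):
    """Classify one item given results from nested ordinal logistic models.

    p_interaction      : p-value of the ability-by-group interaction term
    beta_without_group : ability coefficient in the model without the group term
    beta_with_group    : ability coefficient in the model with the group term
    alpha, change_threshold : hypothetical cutoffs, not taken from the paper
    """
    # Step 1: a significant interaction is consistent with nonuniform DIF.
    if p_interaction < alpha:
        return "nonuniform DIF"
    # Step 2: a marked relative change in the ability coefficient
    # when the group term is added suggests uniform DIF.
    relative_change = abs((beta_with_group - beta_without_group) / beta_without_group)
    if relative_change > change_threshold:
        return "uniform DIF"
    return "no DIF flagged"

print(classify_dif(0.5, 1.0, 1.2))
```

As the abstract itself stresses, the conclusions depend heavily on the specific criteria, so the cutoffs should be treated as parameters to justify rather than defaults.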
Interaction of PLS and PIN and hormonal crosstalk in Arabidopsis root development

Directory of Open Access Journals (Sweden)

Junli Liu

2013-04-01

Understanding how hormones and genes interact to coordinate plant growth is a major challenge in developmental biology. The activities of auxin, ethylene and cytokinin depend on cellular context and exhibit either synergistic or antagonistic interactions. Here we use experimentation and network construction to elucidate the role of the interaction of the POLARIS peptide (PLS) and the auxin efflux carrier PIN proteins in the crosstalk of three hormones (auxin, ethylene and cytokinin) in Arabidopsis root development. In ethylene hypersignalling mutants such as polaris (pls), we show experimentally that expression of both PIN1 and PIN2 significantly increases. This relationship is analysed in the context of the crosstalk between auxin, ethylene and cytokinin: in pls, endogenous auxin, ethylene and cytokinin concentrations decrease, remain approximately unchanged and increase, respectively. Experimental data are integrated into a hormonal crosstalk network through combination with information in the literature. Network construction reveals that the regulation of both PIN1 and PIN2 is predominantly via ethylene signalling. In addition, it is deduced that the relationship between cytokinin and PIN1 and PIN2 levels implies a regulatory role of cytokinin in addition to its regulation of auxin, ethylene and PLS levels. We discuss how the network of hormones and genes coordinates plant growth by simultaneously regulating the activities of auxin, ethylene and cytokinin signalling pathways.
Simultaneous chemometric determination of pyridoxine hydrochloride and isoniazid in tablets by multivariate regression methods.

Science.gov (United States)

Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru

2010-08-01

The sole use of pyridoxine hydrochloride during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. The results obtained in this manuscript strongly encourage us to use these methods for the quality control and the routine analysis of marketed tablets containing PYR and ISO drugs. Copyright © 2010 John Wiley & Sons, Ltd.
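
A multivariate calibration of this kind, relating an absorbance matrix to a concentration vector, can be sketched with a compact single-response PLS (PLS1, NIPALS-style) implementation. This is a generic textbook formulation written with NumPy, not the software or settings used in the study; the function names and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns (coefficients B, intercept b0)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                         # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = yr @ t / tt                       # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - c * t                       # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)       # regression vector in original X space
    return B, y_mean - x_mean @ B

def pls1_predict(X, B, b0):
    return X @ B + b0

# Synthetic, noise-free "spectra" with 3 channels (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 3.0
B, b0 = pls1_fit(X, y, n_components=3)
pred = pls1_predict(X, B, b0)
```

For two analytes such as PYR and ISO, one such model could be fitted per analyte; in practice the number of components would be chosen by cross-validation rather than set to the number of channels as in this toy case.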
New approach to breast cancer CAD using partial least squares and kernel-partial least squares

Science.gov (United States)

Land, Walker H., Jr.; Heine, John; Embrechts, Mark; Smith, Tom; Choma, Robert; Wong, Lut

2005-04-01

Breast cancer is second only to lung cancer as a tumor-related cause of death in women. Currently, the method of choice for the early detection of breast cancer is mammography. While sensitive to the detection of breast cancer, its positive predictive value (PPV) is low, resulting in biopsies that are only 15-34% likely to reveal malignancy. This paper explores the use of two novel approaches called Partial Least Squares (PLS) and Kernel-PLS (K-PLS) for the diagnosis of breast cancer. The approach is based on optimization for the partial least squares (PLS) algorithm for linear regression and the K-PLS algorithm for non-linear regression. Preliminary results show that both the PLS and K-PLS paradigms achieved comparable results with three separate support vector learning machines (SVLMs), where these SVLMs were known to have been trained to a global minimum. That is, the average performance of the three separate SVLMs was Az = 0.9167927, with an average partial Az (Az90) = 0.5684283. These results compare favorably with the K-PLS paradigm, which obtained an Az = 0.907 and partial Az = 0.6123. The PLS paradigm provided comparable results. Secondly, both the K-PLS and PLS paradigms outperformed the ANN in that the Az index improved by about 14% (Az ~ 0.907 compared to the ANN Az of ~ 0.8). The "Press R squared" values for the PLS and K-PLS machine learning algorithms were 0.89 and 0.9, respectively, which is in good agreement with the other MOP values.
DEMAND FOR AND SUPPLY OF MARK-UP AND PLS FUNDS IN ISLAMIC BANKING: SOME ALTERNATIVE EXPLANATIONS

OpenAIRE

KHAN, TARIQULLAH

1995-01-01

Profit and loss-sharing (PLS) and bai’ al murabahah lil amir bil shira (mark-up) are the two parent principles of Islamic financing. The use of PLS is limited and that of mark-up overwhelming in the operations of the Islamic banks. Several studies provide different explanations for this phenomenon. The dominant among these is the moral hazard hypothesis. Some alternative explanations are given in the present paper. The discussion is based on both demand (user of funds) and supply (bank) side ...
truncSP: An R Package for Estimation of Semi-Parametric Truncated Linear Regression Models

Directory of Open Access Journals (Sweden)

Maria Karlsson

2014-05-01

Problems with truncated data occur in many areas, complicating estimation and inference. Regarding linear regression models, the ordinary least squares estimator is inconsistent and biased for these types of data and is therefore unsuitable for use. Alternative estimators, designed for the estimation of truncated regression models, have been developed. This paper presents the R package truncSP. The package contains functions for the estimation of semi-parametric truncated linear regression models using three different estimators: the symmetrically trimmed least squares, quadratic mode, and left truncated estimators, all of which have been shown to have good asymptotic and finite sample properties. The package also provides functions for the analysis of the estimated models. Data from the environmental sciences are used to illustrate the functions in the package.
Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression.

Directory of Open Access Journals (Sweden)

Kosuke Yoshida

In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area.
Multivariate analysis of nystatin and metronidazole in a semi-solid matrix by means of diffuse reflectance NIR spectroscopy and PLS regression.

Science.gov (United States)

Baratieri, Sabrina C; Barbosa, Juliana M; Freitas, Matheus P; Martins, José A

2006-01-23

A multivariate method of analysis of nystatin and metronidazole in a semi-solid matrix, based on diffuse reflectance NIR measurements and partial least squares regression, is reported. The product, a vaginal cream used in antifungal and antibacterial treatment, is usually analyzed quantitatively through microbiological tests (nystatin) and an HPLC technique (metronidazole), according to pharmacopeial procedures. However, near infrared spectroscopy has proven to be a valuable tool for content determination, given the rapidity and scope of the method. In the present study, it was successfully applied in the prediction of nystatin (even in low concentrations, ca. 0.3-0.4%, w/w, which is around 100,000 IU/5g) and metronidazole contents, as demonstrated by some figures of merit, namely linearity, precision (mean and repeatability) and accuracy.
Multivariate reference technique for quantitative analysis of fiber-optic tissue Raman spectroscopy.

Science.gov (United States)

Bergholt, Mads Sylvest; Duraipandian, Shiyamala; Zheng, Wei; Huang, Zhiwei

2013-12-03

We report a novel method making use of multivariate reference signals of fused silica and sapphire Raman signals generated from a ball-lens fiber-optic Raman probe for quantitative analysis of in vivo tissue Raman measurements in real time. Partial least-squares (PLS) regression modeling is applied to extract the characteristic internal reference Raman signals (e.g., shoulder of the prominent fused silica boson peak (~130 cm⁻¹); distinct sapphire ball-lens peaks (380, 417, 646, and 751 cm⁻¹)) from the ball-lens fiber-optic Raman probe for quantitative analysis of fiber-optic Raman spectroscopy. To evaluate the analytical value of this novel multivariate reference technique, a rapid Raman spectroscopy system coupled with a ball-lens fiber-optic Raman probe is used for in vivo oral tissue Raman measurements (n = 25 subjects) under 785 nm laser excitation powers ranging from 5 to 65 mW. An accurate linear relationship (R² = 0.981) with a root-mean-square error of cross validation (RMSECV) of 2.5 mW can be obtained for predicting the laser excitation power changes based on a leave-one-subject-out cross-validation, which is superior to the normal univariate reference method (RMSE = 6.2 mW). A root-mean-square error of prediction (RMSEP) of 2.4 mW (R² = 0.985) can also be achieved for laser power prediction in real time when we applied the multivariate method independently on the five new subjects (n = 166 spectra). We further apply the multivariate reference technique for quantitative analysis of gelatin tissue phantoms that gives rise to an RMSEP of ~2.0% (R² = 0.998) independent of laser excitation power variations. This work demonstrates that multivariate reference technique can be advantageously used to monitor and correct the variations of laser excitation power and fiber coupling efficiency in situ for standardizing the tissue Raman intensity to realize quantitative analysis of tissue Raman measurements in vivo, which is particularly appealing in
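
The leave-one-subject-out cross-validation mentioned in this record keeps all spectra from one subject out of the training set at a time, so that spectra from the same person never appear on both sides of a split. A minimal sketch of the splitting step is given below; the function name and subject labels are hypothetical, and the model fitting inside each fold is left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test

# Hypothetical subject label for each recorded spectrum.
subjects = ["A", "A", "B", "C", "C", "C"]
splits = list(leave_one_subject_out(subjects))
for held_out, train, test in splits:
    print(held_out, train, test)
```

Pooling the squared prediction errors over all held-out folds and taking the square root of their mean gives an RMSECV of the kind reported in the abstract.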
Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra

Science.gov (United States)

Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong

2017-08-01

Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression model. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove the systematic errors. The performance of the model was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that compared to PCR and SMLR, PLS had a lower RMSEC, RMSECV, and RMSEP and higher r for all four analytes. PLS coupled with proper pretreatments showed good performance in both the fitting and predicting results. Furthermore, the areas of origin of Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
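
The figures of merit named in this abstract share one formula applied to different sample sets: RMSEC uses the calibration samples, RMSECV the cross-validated predictions, and RMSEP an independent prediction set. A minimal sketch follows, assuming the plain form with n in the denominator (some chemometrics packages correct RMSEC for degrees of freedom); the function names and numbers are illustrative.

```python
import math

def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

# Hypothetical reference (e.g. HPLC) values and model predictions.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(reference, predicted), pearson_r(reference, predicted))
```

A lower RMSE together with a higher r on the prediction set is the pattern the abstract reports for PLS relative to PCR and SMLR.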
Advanced statistics: linear regression, part I: simple linear regression.

Science.gov (United States)

Marill, Keith A

2004-01-01

Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
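
The method of least squares for a single predictor has a closed-form solution: the slope is the ratio of the summed cross-products to the summed squared deviations of x, and the intercept follows from the means. A minimal sketch is given below; the function name and the small dataset are made up for illustration and are not taken from the article.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Small illustrative dataset lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(x, y)
print(intercept, slope)
```

Because the slope depends on deviations from the mean of x, observations far from that mean have more leverage on the fitted line, which is one of the concepts the article reviews.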
VG2 URA PLS DERIVED SUMMARY ION FIT 48SEC V1.0

Data.gov (United States)

National Aeronautics and Space Administration — This data set contains the total ion density obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by fitting the measured spectra with isotropic...
Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

Science.gov (United States)

Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

2018-04-01

Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept and partial least squares (PLS) methods to improve the performance of models built on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoids prediction using synchronous fluorescence spectra.
VG2 URA PLS DERIVED RDR ION FIT 48SEC V1.0

Data.gov (United States)

National Aeronautics and Space Administration — This data set contains the ion densities and temperatures along with formal 1 Sigma errors obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by...
Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

Science.gov (United States)

Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

2016-07-01

Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study on the efficiency of continuous wavelet transform (CWT) as a signal processing tool in univariate regression and a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS) was conducted. These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). The univariate CWT, in contrast, failed to simultaneously determine the quaternary mixture components and was able to determine only PAR and PAP, the ternary mixtures of DRO, CAF, and PAR and CAF, PAR, and PAP. During the calculations of CWT, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed by both cross validation and external validation sets. Both methods were successfully applied for determination of the studied drugs in pharmaceutical formulations.
Rapid evaluation technique to differentiate mushroom disease-related moulds by detecting microbial volatile organic compounds using HS-SPME-GC-MS.

Science.gov (United States)

Radványi, Dalma; Gere, Attila; Jókai, Zsuzsa; Fodor, Péter

2015-01-01

Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography-mass spectrometry (GC-MS) was used to analyse microbial volatile organic compounds (MVOCs) of mushroom disease-related microorganisms. Mycogone perniciosa, Lecanicillium fungicola var. fungicola, and Trichoderma aggressivum f. europaeum species, which are typically harmful in mushroom cultivation, were examined, and Agaricus bisporus (bisporic button mushroom) was also examined as a control. As an internal standard, a mixture of alkanes was used; these were introduced as the memory effect of primed septa in the vial seal. Several different marker compounds were found in each sample, which enabled us to distinguish the different moulds and the mushroom mycelium from each other. Monitoring of marker compounds enabled us to investigate the behaviour of moulds. The records of the temporal pattern changes were used to produce partial least squares regression (PLS-R) models that enabled determination of the exact time of contamination (the infection time of the media). Using these evaluation techniques, the presence of mushroom disease-related fungi can be easily detected and monitored via their emitted MVOCs.

Real time flaw detection and characterization in tube through partial least squares and SVR: Application to eddy current testing

Science.gov (United States)

Ahmed, Shamim; Miorelli, Roberto; Calmon, Pierre; Anselmi, Nicola; Salucci, Marco

2018-04-01

This paper describes a Learning-By-Examples (LBE) technique for performing quasi real time flaw localization and characterization within a conductive tube based on Eddy Current Testing (ECT) signals. Within the framework of LBE, the combination of full-factorial (i.e., GRID) sampling and Partial Least Squares (PLS) feature extraction (i.e., GRID-PLS) techniques is applied for generating a suitable training set in the offline phase. Support Vector Regression (SVR) is utilized for model development and inversion during the offline and online phases, respectively. The performance and robustness of the proposed GRID-PLS/SVR strategy on a noisy test set are evaluated and compared with the standard GRID/SVR approach.
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

Science.gov (United States)

Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

2017-04-01

) statistics was used to quantitatively assess the predictors most relevant for response variable estimation and then for variable selection (Andersen and Bro, 2010). PCA and SDA returned TOC and RFC as influential variables both on the set of chemical and physical data analyzed separately as well as on the whole dataset (Stellacci et al., 2016). Highly weighted variables in PCA were also TEC, followed by K, and AC, followed by Pmac and BD, in the first PC (41.2% of total variance); Olsen P and HA-FA in the second PC (12.6%), Ca in the third (10.6%) component. Variables enabling maximum discrimination among treatments for SDA were WEOC, on the whole dataset, humic substances, followed by Olsen P, EC and clay, in the separate data analyses. The highest PLS-VIP statistics were recorded for Olsen P and Pmac, followed by TOC, TEC, pH and Mg for chemical variables and clay, RFC and AC for the physical variables. Results show that different methods may provide different ranking of the selected variables and the presence of a response variable, in regressive techniques, may affect variable selection. Further investigation with different response variables and with multi-year datasets would allow to better define advantages and limits of single or combined approaches. Acknowledgment The work was supported by the projects "BIOTILLAGE, approcci innovative per il miglioramento delle performances ambientali e produttive dei sistemi cerealicoli no-tillage", financed by PSR-Basilicata 2007-2013, and "DESERT, Low-cost water desalination and sensor technology compact module" financed by ERANET-WATERWORKS 2014. References Andersen C.M. and Bro R., 2010. Variable selection in regression - a tutorial. Journal of Chemometrics, 24 728-737. Armenise et al., 2013. Developing a soil quality index to compare soil fitness for agricultural use under different managements in the mediterranean environment. Soil and Tillage Research, 130:91-98. de Paul Obade et al., 2016. 
A standardized soil quality index
Online Monitoring of Copper Damascene Electroplating Bath by Voltammetry: Selection of Variables for Multiblock and Hierarchical Chemometric Analysis of Voltammetric Data

Directory of Open Access Journals (Sweden)

Aleksander Jaworski

2017-01-01

The Real Time Analyzer (RTA), utilizing DC- and AC-voltammetric techniques, is an in situ, online monitoring system that provides a complete chemical analysis of different electrochemical deposition solutions. The RTA employs multivariate calibration when predicting concentration parameters from a multivariate data set. Although the hierarchical and multiblock Principal Component Regression (PCR)- and Partial Least Squares (PLS)-based methods can handle data sets even when the number of variables significantly exceeds the number of samples, it can be advantageous to reduce the number of variables to improve the model predictions and their interpretation. This presentation focuses on the introduction of a multistep, rigorous method of data-selection-based Least Squares Regression, Simple Modeling of Class Analogy modeling power, and, as a novel application in electroanalysis, Uninformative Variable Elimination by PLS and by PCR, Variable Importance in the Projection coupled with PLS, Interval PLS, Interval PCR, and Moving Window PLS. Selection criteria of the optimum decomposition technique for the specific data are also demonstrated. The chief goal of this paper is to introduce to the community of electroanalytical chemists numerous variable selection methods which are well established in spectroscopy and can be successfully applied to voltammetric data analysis.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

Science.gov (United States)

Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

2018-04-01

In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between paddy yield and 20 variates of the top soil that were analyzed prior to planting at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed, with no multicollinearity among the independent variables. The fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.
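The clustering step mentioned in this abstract can be illustrated with a minimal one-dimensional fuzzy c-means sketch. It is not the authors' implementation: the function name, the fuzzifier m = 2, the starting centers and the toy values are all assumptions chosen for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Textbook fuzzy c-means on scalar data (illustrative sketch only)."""
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dist = [abs(v - c) for c in centers]
            if 0.0 in dist:
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dist]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_k is proportional to d_k^(-2/(m-1)), normalized to sum to 1.
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)
        # New center k = mean of the values weighted by membership^m.
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centers))
        ]
    # memberships are those computed in the last iteration.
    return centers, memberships


# Hypothetical yield-like values forming two obvious groups.
toy_values = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
centers, memberships = fuzzy_c_means_1d(toy_values, centers=[0.0, 10.0])
```

In a hybrid scheme of the kind described, a separate regression model would then be fitted within each cluster; how the study assigned observations to clusters is not stated in the abstract.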
Short-term stream flow forecasting at Australian river sites using data-driven regression techniques

CSIR Research Space (South Africa)

Steyn, Melise

2017-09-01

This study proposes a computationally efficient solution to stream flow forecasting for river basins where historical time series data are available. Two data-driven modeling techniques are investigated, namely support vector regression...
Techniques for trans-catheter retrieval of embolized Nit-Occlud® PDA-R and ASD-R devices.

Science.gov (United States)

Sinha, Sanjay; Levi, Daniel; Peirone, Alejandro; Pedra, Carlos

2018-02-15

Nit-Occlud® ASD-R (atrial septal defect) and PDA-R (patent ductus arteriosus) devices are used outside the United States for percutaneous closure of the patent ductus arteriosus and atrial septal defects. When embolization occurs, these devices have been difficult to retrieve. Bench simulations of retrieval of PDA-R and ASD-R devices were performed in a vascular model. Retrieval of each device was attempted using snare techniques or with bioptome forceps with a range of devices. The same devices were then intentionally embolized in an animal model. Retrieval methods were systematically tested in a range of sheath sizes, and graded in terms of difficulty and retrieval time. Devices that were grasped by the bioptome in the center of the proximal part of the devices were easily retrieved in both models. Bench studies determined the minimum sheath sizes needed for retrieval of each device with this method. In general, sheaths two French sizes larger than the delivery sheath were successful with this technique. Three out of the four PDA-R devices were successfully retrieved in vivo. Two were retrieved by grasping the middle of the PA end of the PDA-R device with a Maslanka bioptome and one small PDA-R device was retrieved using a 10 mm snare. Four of the five ASD-R devices were retrieved successfully by grasping the right atrial ASD-R disc or by passing a wire through the device and snaring this loop. For ASD-R 28 and 30 mm devices, a double bioptome technique was needed to retrieve the device. ASD-R and PDA-R devices can be successfully retrieved in the catheterization lab. It is critical to grab the center portion of the right atrial disc of the ASD-R device or pulmonary portion of the PDA-R device and to use adequately sized sheaths. © 2018 Wiley Periodicals, Inc.
A comparative QSAR study on the estrogenic activities of persistent organic pollutants by PLS and SVM

Directory of Open Access Journals (Sweden)

Fei Li

2015-11-01

Quantitative structure-activity relationships (QSARs) were determined using partial least squares (PLS) and support vector machine (SVM) methods. The values predicted by the final QSAR models were in good agreement with the corresponding experimental values. Chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, van der Waals volumes and polarizabilities). Comparing the results obtained from the two models, the SVM method exhibited better overall performance. In addition, three PLS models were constructed for some specific families based on their chemical structures. These predictive models should be useful to rapidly identify potential estrogenic endocrine disrupting chemicals.
Statistical learning techniques applied to epidemiology: a simulated case-control comparison study with logistic regression

Directory of Open Access Journals (Sweden)

Land Walker H

2011-01-01

Background: When investigating covariate interactions and group associations with standard regression analyses, the relationship between the response variable and exposure may be difficult to characterize. When the relationship is nonlinear, linear modeling techniques do not capture the nonlinear information content. Statistical learning (SL) techniques with kernels are capable of addressing nonlinear problems without making parametric assumptions. However, these techniques do not produce findings relevant for epidemiologic interpretations. A simulated case-control study was used to contrast the information embedding characteristics and separation boundaries produced by a specific SL technique with logistic regression (LR) modeling representing a parametric approach. The SL technique comprised a kernel mapping in combination with a perceptron neural network. Because the LR model has an important epidemiologic interpretation, the SL method was modified to produce the analogous interpretation and generate odds ratios for comparison. Results: The SL approach is capable of generating odds ratios for main effects and risk factor interactions that better capture nonlinear relationships between exposure variables and outcome in comparison with LR. Conclusions: The integration of SL methods in epidemiology may improve both the understanding and interpretation of complex exposure/disease relationships.
Employing 3R Techniques in Managing Cement Industry Waste

Directory of Open Access Journals (Sweden)

Lamyaa Mohammed Dawood

2018-01-01

Waste management protects human health, property, and the environment, and conserves valuable natural resources. The lean-green waste of an organization's operations can be decreased by implementing 3R (Reduce, Reuse, and Recycling) techniques to reduce manufacturing system wastes. This research aims to integrate the lean-green waste of the manufacturing system by employing 3R techniques and the weighted properties method in order to manage waste. The Al-Kufa cement plant is used as a case study. Results are generated using Edraw Max Version 7 and Excel. Overall results show that the reduce technique makes the major contribution to lean-green waste management (55 %) and the recycling technique the minor contribution (18 %). Defects waste has the major integration of lean-green waste, while air emissions waste has the minor integration of lean-green waste.
The influence of R and S configurations of a series of amphetamine derivatives on quantitative structure–activity relationship models

International Nuclear Information System (INIS)

Fresqui, Maíra A.C.; Ferreira, Márcia M.C.; Trsic, Milan

2013-01-01

Highlights: ► The QSAR model is not dependent on ligand conformation. ► Amphetamines were analyzed by quantum chemical, steric and hydrophobic descriptors. ► CHELPG atomic charges on the benzene ring are among the most important descriptors. ► The PLS models built were extensively validated. ► Manual docking supports the QSAR results by pi–pi stacking interactions. - Abstract: Chiral molecules need special attention in drug design. In this sense, the R and S configurations of a series of thirty-four amphetamines were evaluated by quantitative structure–activity relationship (QSAR). This class of compounds has antidepressant, anti-Parkinson and anti-Alzheimer effects against the enzyme monoamine oxidase A (MAO A). A set of thirty-eight descriptors, including electronic, steric and hydrophobic ones, were calculated. Variable selection was performed through the correlation coefficients followed by the ordered predictor selection (OPS) algorithm. Six descriptors (CHELPG atomic charges C3, C4 and C5, electrophilicity, molecular surface area and log P) were selected for both configurations and a satisfactory model was obtained by PLS regression with three latent variables with R² = 0.73 and Q² = 0.60, with external predictability Q² = 0.68, and R² = 0.76 and Q² = 0.67 with external predictability Q² = 0.50, for R and S configurations, respectively. To confirm the robustness of each model, leave-N-out cross validation (LNO) was carried out and the y-randomization test was used to check if these models present chance correlation. Moreover, both automated and manual molecular docking indicate that the reaction of ligands with the enzyme occurs via pi–pi stacking interaction with Tyr407, inclined face-to-face interaction with Tyr444, while aromatic hydrogen–hydrogen interactions with Tyr197 are preferable for R instead of S configurations.
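The abstract above reports cross-validated Q² values for PLS models with three latent variables. As a rough illustration of how a leave-one-out Q² is formed, the sketch below uses a one-predictor least-squares line as a stand-in for the PLS model (an assumption made only to keep the example short). The function names and toy data are hypothetical, and the denominator uses the overall mean of y, which is one common convention among several.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def q2_leave_one_out(x, y):
    """Q2 = 1 - PRESS / SS, with each sample predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        press += (y[i] - (intercept + slope * x[i])) ** 2
    my = sum(y) / len(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Toy data: an exactly linear set and a noisy set (both invented).
q2_exact = q2_leave_one_out([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
q2_noisy = q2_leave_one_out([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
```

Because every sample is predicted by a model that never saw it, Q² can fall well below the fitted R², which is why the two are reported side by side.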
Spatial Estimation of Losses Attributable to Meteorological Disasters in a Specific Area (105.0°E–115.0°E, 25°N–35°N) Using Bayesian Maximum Entropy and Partial Least Squares Regression

Directory of Open Access Journals (Sweden)

F. S. Zhang

2016-01-01

The spatial mapping of losses attributable to such disasters is now well established as a means of describing the spatial patterns of disaster risk, and it has been shown to be suitable for many types of major meteorological disasters. However, few studies have developed a regression model to estimate the effects of the spatial distribution of meteorological factors on losses associated with meteorological disasters. In this study, the proposed approach is capable of the following: (a) estimating the spatial distributions of seven meteorological factors using Bayesian maximum entropy, (b) identifying which of the four mapping methods used in this research performs best based on cross validation, and (c) establishing a fitted model between the PLS components and disaster loss information using partial least squares regression within a specific research area. The results showed the following: (a) the best mapping results were produced by multivariate Bayesian maximum entropy with probabilistic soft data; (b) the regression model using three PLS components, extracted from seven meteorological factors by the PLS method, was the most predictive according to the PRESS/SS test; (c) northern Hunan Province sustained the most damage, and southeastern Gansu Province and western Guizhou Province sustained the least.
Estimating monotonic rates from biological data using local linear regression.

Science.gov (United States)

Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

2017-03-01

Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Accounting for estimated IQ in neuropsychological test performance with regression-based techniques.

Science.gov (United States)

Testa, S Marc; Winicki, Jessica M; Pearlson, Godfrey D; Gordon, Barry; Schretlen, David J

2009-11-01

Regression-based normative techniques account for variability in test performance associated with multiple predictor variables and generate expected scores based on algebraic equations. Using this approach, we show that estimated IQ, based on oral word reading, accounts for 1-9% of the variability beyond that explained by individual differences in age, sex, race, and years of education for most cognitive measures. These results confirm that adding estimated "premorbid" IQ to demographic predictors in multiple regression models can incrementally improve the accuracy with which regression-based norms (RBNs) benchmark expected neuropsychological test performance in healthy adults. It remains to be seen whether the incremental variance in test performance explained by estimated "premorbid" IQ translates to improved diagnostic accuracy in patient samples. We describe these methods, and illustrate the step-by-step application of RBNs with two cases. We also discuss the rationale, assumptions, and caveats of this approach. More broadly, we note that adjusting test scores for age and other characteristics might actually decrease the accuracy with which test performance predicts absolute criteria, such as the ability to drive or live independently.
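The idea of generating an expected score from an algebraic equation can be sketched as follows. Every number here (intercept, coefficients, residual standard deviation, the example person) is hypothetical, and standardizing the residual by the regression's residual standard deviation is an assumed convention rather than something stated in the abstract.

```python
def expected_score(intercept, coefficients, predictors):
    """Expected test score from a fitted linear norming equation."""
    return intercept + sum(coefficients[name] * value
                           for name, value in predictors.items())


def standardized_residual(observed, expected, residual_sd):
    """How far the observed score lies from expectation, in residual SD units."""
    return (observed - expected) / residual_sd


# Hypothetical norming equation and examinee; none of these values come from the study.
coefficients = {"age": -0.2, "education_years": 0.5, "estimated_iq": 0.1}
person = {"age": 60, "education_years": 12, "estimated_iq": 100}

expected = expected_score(40.0, coefficients, person)
z = standardized_residual(38.0, expected, residual_sd=5.0)
```

Adding or dropping the estimated IQ term changes the expected score, and hence the standardized residual, which is the incremental effect the abstract quantifies.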
Prediction of long-residue properties of potential blends from mathematically mixed infrared spectra of pure crude oils by partial least-squares regression models

NARCIS (Netherlands)

de Peinder, P.; Visser, T.; Petrauskas, D.D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B.M.

2009-01-01

Research has been carried out to determine the feasibility of partial least-squares (PLS) regression models to predict the long-residue (LR) properties of potential blends from infrared (IR) spectra that have been created by linearly co-adding the IR spectra of crude oils. The study is the follow-up
Biostatistics Series Module 6: Correlation and Linear Regression.

Science.gov (United States)

Hazra, Avijit; Gogtay, Nithya

2016-01-01

Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If the normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ), may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a significant P value. A confidence interval for the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r² denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of a linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
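The quantities named in this abstract (Pearson's r, the coefficient of determination, and the least-squares line y = a + bx) can be computed directly from their definitions. The sketch below is a minimal illustration with invented data; the function name is an assumption.

```python
import math


def pearson_and_line(x, y):
    """Return (r, r_squared, a, b) for paired data, where y is modeled as a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    b = sxy / sxx                    # least-squares slope
    a = my - b * mx                  # least-squares intercept
    return r, r * r, a, b


# Invented paired measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r, r_squared, a, b = pearson_and_line(x, y)
```

The slope and r share the same numerator, so they always have the same sign; r additionally scales by the spread of both variables, which is why a steep line does not by itself imply a strong correlation.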
Advanced statistics: linear regression, part II: multiple linear regression.

Science.gov (United States)

Marill, Keith A

2004-01-01

The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
High-throughput prediction of tablet weight and trimethoprim content of compound sulfamethoxazole tablets for controlling the uniformity of dosage units by NIR.

Science.gov (United States)

Dong, Yanhong; Li, Juan; Zhong, Xiaoxiao; Cao, Liya; Luo, Yang; Fan, Qi

2016-04-15

This paper establishes a novel method to simultaneously predict the tablet weight (TW) and trimethoprim (TMP) content of compound sulfamethoxazole tablets (SMZCO) by near infrared (NIR) spectroscopy with partial least squares (PLS) regression for controlling the uniformity of dosage units (UODU). The NIR spectra for 257 samples were measured using the optimized parameter values and pretreated using the optimized chemometric techniques. After the outliers were ignored, two PLS models for predicting TW and TMP content were respectively established by using the selected spectral sub-ranges and the reference values. The TW model reaches the correlation coefficient of calibration (R(c)) 0.9543 and the TMP content model has the R(c) 0.9205. The experimental results indicate that this strategy expands the NIR application in controlling UODU, especially in the high-throughput and rapid analysis of TWs and contents of the compound pharmaceutical tablets, and may be an important complement to the common NIR on-line analytical method for pharmaceutical tablets. Copyright © 2016 Elsevier B.V. All rights reserved.
Using Partial Least-Squares Regression in Multivariate UV Spectroscopic Analysis of Mixtures of Imidazolium-Based Ionic Liquids and 1-Methylimidazole for Measurements of Liquid-Liquid Equilibria

Czech Academy of Sciences Publication Activity Database

Bendová, Magdalena; Sedláková, Zuzana; Andresová, Adéla; Wagner, Zdeněk

2012-01-01

Vol. 41, No. 12 (2012), pp. 2164-2172 ISSN 0095-9782 R&D Projects: GA ČR GP203/09/P141; GA AV ČR IAA400720710 Institutional support: RVO:67985858 Keywords: room-temperature ionic liquids * PLS2 * UV spectroscopy Subject RIV: CF - Physical; Theoretical Chemistry Impact factor: 1.128, year: 2012
Study on rapid valid acidity evaluation of apple by fiber optic diffuse reflectance technique

Science.gov (United States)

Liu, Yande; Ying, Yibin; Fu, Xiaping; Jiang, Xuesong

2004-03-01

Some issues related to nondestructive evaluation of valid acidity in intact apples by means of the Fourier transform near infrared (FTNIR) (800-2631 nm) method were addressed. A relationship was established between the diffuse reflectance spectra recorded with a bifurcated optic fiber and the valid acidity. The data were analyzed by multivariate calibration techniques such as partial least squares (PLS) analysis and principal component regression (PCR). A total of 120 Fuji apples were tested and 80 of them were used to form a calibration data set. The influence of data preprocessing and different spectra treatments was also investigated. Models based on smoothed spectra were slightly worse than models based on derivative spectra, and the best result was obtained when the segment length was 5 and the gap size was 10. Depending on data preprocessing and multivariate calibration technique, the best prediction model, obtained by PLS analysis, had a correlation coefficient of 0.871, a low RMSEP (0.0677), a low RMSEC (0.056) and a small difference between RMSEP and RMSEC. The results point to the feasibility of FTNIR spectral analysis for predicting fruit valid acidity non-destructively. The ratio of data standard deviation to the root mean square error of prediction (SDR) is better to be less than 3 in calibration models; however, the results cannot meet the demand of actual application. Therefore, further study is required for better calibration and prediction.
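The error measures quoted in this abstract follow from simple definitions: a root mean square error between reference and predicted values (RMSEC on the calibration set, RMSEP on the prediction set) and the SDR as the standard deviation of the reference data divided by that error. The sketch below uses invented numbers and hypothetical function names; using the sample standard deviation (n - 1 denominator) is an assumption.

```python
import math
import statistics


def rmse(reference, predicted):
    """Root mean square error between reference values and model predictions."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def sdr(reference, predicted):
    """Ratio of the reference data's standard deviation to the RMSE."""
    return statistics.stdev(reference) / rmse(reference, predicted)


# Invented prediction-set values, not data from the study.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
rmsep = rmse(reference, predicted)
ratio = sdr(reference, predicted)
```

Applying `rmse` to calibration samples gives RMSEC and to held-out samples gives RMSEP; a small gap between the two, as reported above, suggests the model is not strongly over-fitted.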
A regression technique for evaluation and quantification for water quality parameters from remote sensing data

International Nuclear Information System (INIS)

Whitlock, C.H.; Kuo, C.Y.

1979-01-01

The paper attempts to define optical physics and/or environmental conditions under which the linear multiple-regression should be applicable. It is reported that investigation of the signal response shows that the exact solution for a number of optical physics conditions is of the same form as a linearized multiple-regression equation, even if nonlinear contributions from surface reflections, atmospheric constituents, or other water pollutants are included. Limitations on achieving this type of solution are defined. Laboratory data are used to demonstrate that the technique is applicable to water mixtures which contain constituents with both linear and nonlinear radiance gradients. Finally, it is concluded that instrument noise, ground-truth placement, and time lapse between remote sensor overpass and water sample operations are serious barriers to successful use of the technique

Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression.

Science.gov (United States)

Plata, Maria R; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

2013-10-01

A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference analyses quantify the amount of trehalose, glucose, glycogen, and mannan in S. cerevisiae. The selection and optimization of pretreatment steps of samples such as the disruption of the yeast cells and the hydrolysis of mannan and glycogen to obtain monosaccharides were carried out. Trehalose, glucose, and mannose were determined using high-performance liquid chromatography coupled with a refractive index detector and total carbohydrates were measured using the phenol-sulfuric method. Linear concentration range, accuracy, precision, LOD and LOQ were examined to check the reliability of the chromatographic method for each analyte.
Sensory and instrumental texture assessment of roasted pistachio nut/kernel by partial least square (PLS) regression analysis: effect of roasting conditions.

Science.gov (United States)

Mohammadi Moghaddam, Toktam; Razavi, Seyed M A; Taghizadeh, Masoud; Sazgarnia, Ameneh

2016-01-01

Roasting is an important step in the processing of pistachio nuts. The effect of hot air roasting temperature (90, 120 and 150 °C), time (20, 35 and 50 min) and air velocity (0.5, 1.5 and 2.5 m/s) on textural and sensory characteristics of pistachio nuts and kernels were investigated. The results showed that increasing the roasting temperature decreased the fracture force (82-25.54 N), instrumental hardness (82.76-37.59 N), apparent modulus of elasticity (47-21.22 N/s), compressive energy (280.73-101.18 N.s) and increased amount of bitterness (1-2.5) and the hardness score (6-8.40) of pistachio kernels. Higher roasting time improved the flavor of samples. The results of the consumer test showed that the roasted pistachio kernels have good acceptability for flavor (score 5.83-8.40), color (score 7.20-8.40) and hardness (score 6-8.40) acceptance. Moreover, Partial Least Square (PLS) analysis of instrumental and sensory data provided important information for the correlation of objective and subjective properties. The univariate analysis showed that over 93.87 % of the variation in sensory hardness and almost 87 % of the variation in sensory acceptability could be explained by instrumental texture properties.
Reliable and relevant modelling of real world data: a personal account of the development of PLS Regression

DEFF Research Database (Denmark)

Martens, Harald

2001-01-01

Why and how Partial Least Squares Regression (PLSR) was developed is described here from the author's perspective. The paper outlines my frustrating experiences in the 1970s with two conflicting and equally over-ambitious and oversimplified modelling cultures - in traditional chemistry...
Modeling ionospheric foF2 response during geomagnetic storms using neural network and linear regression techniques

Science.gov (United States)

Tshisaphungo, Mpho; Habarulema, John Bosco; McKinnell, Lee-Anne

2018-06-01

In this paper, the modeling of the ionospheric foF2 changes during geomagnetic storms by means of neural network (NN) and linear regression (LR) techniques is presented. The results will lead to a valuable tool to model the complex ionospheric changes during disturbed days in an operational space weather monitoring and forecasting environment. The storm-time foF2 data during 1996-2014 from the Grahamstown (33.3°S, 26.5°E), South Africa ionosonde station were used in modeling. Six storms were reserved to validate the models and hence not used in the modeling process. We found that the performance of the NN and LR models is comparable during selected storms which fell within the data period (1996-2014) used in modeling. However, when validated on storm periods beyond 1996-2014, the NN model gives a better performance (R = 0.62) compared to the LR model (R = 0.56) for a storm that reached a minimum Dst index of -155 nT during 19-23 December 2015. We also found that both NN and LR models are capable of capturing the ionospheric foF2 responses during two great geomagnetic storms (28 October-1 November 2003 and 6-12 November 2004) which have been demonstrated to be difficult storms to model in previous studies.
Regresion PLS y PCA como Solucion al Problema de Multicolinealidad en Regresion Multiple [PLS and PCA Regression as a Solution to the Multicollinearity Problem in Multiple Regression]

Directory of Open Access Journals (Sweden)

José Carlos Vega Vilca

2011-03-01

We present and compare principal components regression and partial least squares regression, and their solution to the problem of multicollinearity. We illustrate the use of both techniques, and demonstrate the superiority of partial least squares.
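To see why a latent-variable method copes with multicollinearity, consider two predictors that are exact multiples of each other: the ordinary least-squares normal equations are singular, yet a single PLS component still yields usable coefficients. The sketch below is a minimal one-component PLS1 fit in plain Python under that assumed toy setup. It is not the paper's code, it does not reproduce the paper's comparison with principal components regression, and all names and data are hypothetical.

```python
import math


def pls1_one_component(X, y):
    """One-component PLS1: weights w ~ X'y, scores t = Xw, then regress y on t."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # centered X
    yc = [v - y_mean for v in y]                                  # centered y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]                                     # unit weight vector
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]                                   # coefficients on original X
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return coef, intercept


def predict(coef, intercept, row):
    return intercept + sum(c * v for c, v in zip(coef, row))


# Perfectly collinear toy predictors (second column = 2 * first); y = 3 * x1 + 1.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [4.0, 7.0, 10.0, 13.0]
coef, intercept = pls1_one_component(X, y)
y_new = predict(coef, intercept, [5.0, 10.0])
```

The fit spreads the effect across both collinear columns instead of failing, and the prediction for a new collinear row matches the underlying relation. Real data would need more components and cross-validation to choose their number.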
Prediction of wastewater quality using amperometric bioelectronic tongues

DEFF Research Database (Denmark)

Czolkos, Ilja; Dock, Eva; Tonning, Erik

2016-01-01

regression (PLS-R), we could link the sensor responses to the Microtox (R) toxicity parameter, as well as to global organic pollution parameters (COD, BOD, and TOC). From investigating the influences of individual sensors in the array, it was found that the best models were in most cases obtained when all...
A comparison of artificial neural networks with other statistical approaches for the prediction of true metabolizable energy of meat and bone meal.

Science.gov (United States)

Perai, A H; Nassiri Moghaddam, H; Asadpour, S; Bahrampour, J; Mansoori, Gh

2010-07-01

There has been a considerable and continuous interest to develop equations for rapid and accurate prediction of the ME of meat and bone meal. In this study, an artificial neural network (ANN), a partial least squares (PLS), and a multiple linear regression (MLR) statistical method were used to predict the TME(n) of meat and bone meal based on its CP, ether extract, and ash content. The accuracy of the models was calculated by R(2) value, MS error, mean absolute percentage error, mean absolute deviation, bias, and Theil's U. The predictive ability of an ANN was compared with a PLS and a MLR model using the same training data sets. The squared regression coefficients of prediction for the MLR, PLS, and ANN models were 0.38, 0.36, and 0.94, respectively. The results revealed that ANN produced more accurate predictions of TME(n) as compared with PLS and MLR methods. Based on the results of this study, ANN could be used as a promising approach for rapid prediction of nutritive value of meat and bone meal.
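Several of the accuracy measures listed in this abstract can be written down in a few lines. The sketch below assumes R² is computed as 1 - SSE/SST, that "mean absolute deviation" means the mean absolute prediction error, and that bias is the mean of predicted minus observed; the paper may define them differently. Theil's U is left out because several non-equivalent definitions are in use. The function name and data are invented.

```python
def accuracy_metrics(observed, predicted):
    """Common prediction-accuracy summaries (definitions assumed, see lead-in)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    sse = sum(e * e for e in errors)
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - sse / sst,                      # coefficient of determination
        "mse": sse / n,                             # mean squared error
        "mape": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "mad": sum(abs(e) for e in errors) / n,     # mean absolute deviation
        "bias": sum(errors) / n,                    # mean signed error
    }


# Invented observed and predicted values.
metrics = accuracy_metrics([10.0, 20.0], [11.0, 18.0])
```

Reporting a signed measure (bias) next to unsigned ones (MSE, MAD, MAPE) helps separate systematic over- or under-prediction from scatter.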
Instrumentation and control system for PLS-IM-T 60 MeV LINAC

International Nuclear Information System (INIS)

Liu, D.K.; Yei, K.R.; Cheng, H.J.

1992-01-01

The PLS-IM-T is a 60 MeV LINAC serving as a preinjector for the 2 GeV LINAC of the PLS project. The instrumentation and control (I and C) system has been designed under an institutional collaboration between IHEP (Beijing, China) and POSTECH (Pohang, Korea). The I and C system is currently being set up at POSTECH in Pohang. This paper describes its major characteristics and present status. (author)
Characterisation of PDO olive oil Chianti Classico by non-selective (UV–visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques

International Nuclear Information System (INIS)

Casale, M.; Oliveri, P.; Casolino, C.; Sinelli, N.; Zunin, P.; Armanino, C.; Forina, M.; Lanteri, S.

2012-01-01

Highlights: ► Characterisation of the Italian PDO extra virgin olive oil Chianti Classico. ► Comparison between non-selective (UV–vis, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques. ► Synergy among spectroscopic techniques, by the fusion of the respective spectra. ► Prediction of the content of oleic and linoleic acids in the olive oils. - Abstract: An authentication study of the Italian PDO (protected designation of origin) extra virgin olive oil Chianti Classico was performed; UV–visible (UV–vis), Near-Infrared (NIR) and Mid-Infrared (MIR) spectroscopies were applied to a set of samples representative of the whole Chianti Classico production area. The non-selective signals (fingerprints) provided by the three spectroscopic techniques were utilised both individually and jointly, after fusion of the respective profile vectors, in order to build a model for the Chianti Classico PDO olive oil. Moreover, these results were compared with those obtained by the gas chromatographic determination of the fatty acids composition. In order to characterise the olive oils produced in the Chianti Classico PDO area, UNEQ (unequal class models) and SIMCA (soft independent modelling of class analogy) were employed both on the MIR, NIR and UV–vis spectra, individually and jointly, and on the fatty acid composition. Finally, PLS (partial least square) regression was applied on the UV–vis, NIR and MIR spectra, in order to predict the content of oleic and linoleic acids in the extra virgin olive oils. UNEQ, SIMCA and PLS were performed after selection of the relevant predictors, in order to increase the efficiency of both classification and regression models. The non-selective information obtained from UV–vis, NIR and MIR spectroscopy allowed to build reliable models for checking the authenticity of the Italian PDO extra virgin olive oil Chianti Classico.
Characterisation of PDO olive oil Chianti Classico by non-selective (UV-visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques

Energy Technology Data Exchange (ETDEWEB)

Casale, M., E-mail: monica@dictfa.unige.it [Universita degli Studi di Genova, Department of Chemistry and Food and Pharmaceutical Technologies, Via Brigata Salerno 13, I-16147, Genoa (Italy); Oliveri, P.; Casolino, C. [Universita degli Studi di Genova, Department of Chemistry and Food and Pharmaceutical Technologies, Via Brigata Salerno 13, I-16147, Genoa (Italy); Sinelli, N. [Universita degli Studi di Milano, Department of Food Science and Technology, Via Celoria, 2 - I-20133 Milan (Italy); Zunin, P.; Armanino, C.; Forina, M.; Lanteri, S. [Universita degli Studi di Genova, Department of Chemistry and Food and Pharmaceutical Technologies, Via Brigata Salerno 13, I-16147, Genoa (Italy)

2012-01-27

Highlights: ► Characterisation of the Italian PDO extra virgin olive oil Chianti Classico. ► Comparison between non-selective (UV-vis, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques. ► Synergy among spectroscopic techniques, by the fusion of the respective spectra. ► Prediction of the content of oleic and linoleic acids in the olive oils. - Abstract: An authentication study of the Italian PDO (protected designation of origin) extra virgin olive oil Chianti Classico was performed; UV-visible (UV-vis), Near-Infrared (NIR) and Mid-Infrared (MIR) spectroscopies were applied to a set of samples representative of the whole Chianti Classico production area. The non-selective signals (fingerprints) provided by the three spectroscopic techniques were utilised both individually and jointly, after fusion of the respective profile vectors, in order to build a model for the Chianti Classico PDO olive oil. Moreover, these results were compared with those obtained by the gas chromatographic determination of the fatty acids composition. In order to characterise the olive oils produced in the Chianti Classico PDO area, UNEQ (unequal class models) and SIMCA (soft independent modelling of class analogy) were employed both on the MIR, NIR and UV-vis spectra, individually and jointly, and on the fatty acid composition. Finally, PLS (partial least square) regression was applied on the UV-vis, NIR and MIR spectra, in order to predict the content of oleic and linoleic acids in the extra virgin olive oils. UNEQ, SIMCA and PLS were performed after selection of the relevant predictors, in order to increase the efficiency of both classification and regression models. The non-selective information obtained from UV-vis, NIR and MIR spectroscopy allowed to build reliable models for checking the authenticity of the Italian PDO extra virgin olive oil
Sensitive Wavelengths Selection in Identification of Ophiopogon japonicus Based on Near-Infrared Hyperspectral Imaging Technology

Directory of Open Access Journals (Sweden)

Zhengyan Xia

2017-01-01

Hyperspectral imaging (HSI) technology has increasingly been applied as an analytical tool in the fields of agriculture, food, and Traditional Chinese Medicine over the past few years. The HSI spectrum of a sample is typically acquired by a spectroradiometer at hundreds of wavelengths. In recent years, considerable effort has been made towards identifying wavelengths (variables) that contribute useful information. Wavelength selection is a critical step in data analysis for Raman, NIRS, or HSI spectroscopy. In this study, the performances of 10 different wavelength selection methods for the discrimination of Ophiopogon japonicus of different origins were compared. The wavelength selection algorithms tested include successive projections algorithm (SPA), loading weights (LW), regression coefficients (RC), uninformative variable elimination (UVE), UVE-SPA, competitive adaptive reweighted sampling (CARS), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), and genetic algorithms (GA-PLS). One linear technique (partial least squares-discriminant analysis) was established for the evaluation of identification, and a nonlinear calibration model, support vector machine (SVM), was also provided for comparison. The results indicate that wavelength selection methods are tools to identify more concise and effective spectral data and play important roles in multivariate analysis, which can be used for subsequent modeling analysis.
"PowerPoint[R] Engagement" Techniques to Foster Deep Learning

Science.gov (United States)

Berk, Ronald A.

2011-01-01

The purpose of this article is to describe a bunch of strategies with which teachers may already be familiar and, perhaps, use regularly, but not always in the context of a formal PowerPoint[R] presentation. Here are the author's top 10 engagement techniques that fit neatly within any version of PowerPoint[R]. Some of these may also be used with…
Fast Measurement of Soluble Solid Content in Mango Based on Visible and Infrared Spectroscopy Technique

Science.gov (United States)

Yu, Jiajia; He, Yong

Mango is a popular tropical fruit, and its soluble solid content is an important quality attribute. In this study, the visible and short-wave near-infrared spectroscopy (VIS/SWNIR) technique was applied. To investigate the feasibility of using VIS/SWNIR spectroscopy to measure the soluble solid content in mango, and to validate the performance of selected sensitive bands, the calibration set was formed from 135 mango samples, while the remaining 45 mango samples formed the prediction set. The combination of partial least squares and backpropagation artificial neural networks (PLS-BP) was used to calculate the prediction model based on raw spectrum data. Based on PLS-BP, the determination coefficient for prediction (Rp) was 0.757 and root mean square ... and the process is simple and easy to operate. Compared with the partial least squares (PLS) result, the performance of PLS-BP is better.
The crucial role of the Pls1 tetraspanin during ascospore germination in Podospora anserina provides an example of the convergent evolution of morphogenetic processes in fungal plant pathogens and saprobes.

Science.gov (United States)

Lambou, Karine; Malagnac, Fabienne; Barbisan, Crystel; Tharreau, Didier; Lebrun, Marc-Henri; Silar, Philippe

2008-10-01

Pls1 tetraspanins were shown for some pathogenic fungi to be essential for appressorium-mediated penetration into their host plants. We show here that Podospora anserina, a saprobic fungus lacking appressorium, contains PaPls1, a gene orthologous to known PLS1 genes. Inactivation of PaPls1 demonstrates that this gene is specifically required for the germination of ascospores in P. anserina. These ascospores are heavily melanized cells that germinate under inducing conditions through a specific pore. On the contrary, MgPLS1, which fully complements a DeltaPaPls1 ascospore germination defect, has no role in the germination of Magnaporthe grisea nonmelanized ascospores but is required for the formation of the penetration peg at the pore of its melanized appressorium. P. anserina mutants with mutation of PaNox2, which encodes the NADPH oxidase of the NOX2 family, display the same ascospore-specific germination defect as the DeltaPaPls1 mutant. Both mutant phenotypes are suppressed by the inhibition of melanin biosynthesis, suggesting that they are involved in the same cellular process required for the germination of P. anserina melanized ascospores. The analysis of the distribution of PLS1 and NOX2 genes in fungal genomes shows that they are either both present or both absent. These results indicate that the germination of P. anserina ascospores and the formation of the M. grisea appressorium penetration peg use the same molecular machinery that includes Pls1 and Nox2. This machinery is specifically required for the emergence of polarized hyphae from reinforced structures such as appressoria and ascospores. Its recurrent recruitment during fungal evolution may account for some of the morphogenetic convergence observed in fungi.
Analysis of pork adulteration in beef meatball using Fourier transform infrared (FTIR) spectroscopy.

Science.gov (United States)

Rohman, A; Sismindari; Erwanto, Y; Che Man, Yaakob B

2011-05-01

Meatball is one of the favorite foods in Indonesia. The adulteration of beef meatball with pork occurs frequently. This study aimed to develop a fast and non-destructive technique for the detection and quantification of pork in beef meatball using Fourier transform infrared (FTIR) spectroscopy and partial least square (PLS) calibration. The spectral bands associated with pork fat (PF), beef fat (BF), and their mixtures in meatball formulation were scanned, interpreted, and identified by relating them to those spectroscopically representative of pure PF and BF. For quantitative analysis, PLS regression was used to develop a calibration model at the selected fingerprint region of 1200-1000 cm⁻¹. The equation obtained for the relationship between actual PF value and FTIR predicted values in the PLS calibration model was y = 0.999x + 0.004, with a coefficient of determination (R²) of 0.999 and a root mean square error of calibration of 0.442. The PLS calibration model was subsequently used for the prediction of independent samples using laboratory-made meatball samples containing mixtures of BF and PF. Using 4 principal components, the root mean square error of prediction was 0.742. The results showed that FTIR spectroscopy can be used for the detection and quantification of pork in beef meatball formulation for Halal verification purposes. Copyright © 2010 The American Meat Science Association. Published by Elsevier Ltd. All rights reserved.
Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA

Science.gov (United States)

Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

2018-04-01

External testing (ET) is preferred over auto-prediction (AP) or k-fold-cross-validation in estimating more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. respectively for model construction and for model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice though it is theoretically more likely to produce reliable estimation. The aim of this preliminary work is to compare performances of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). Error rate of each model was estimated repeatedly via: (a) AP on full data (APF, error); (b) AP on training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and ETS, error that are similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error and; (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson and Spearman's rank test), using series of PLS2-DA models computed from KS-set and IRS-set, respectively. Overall, models constructed from IRS-set exhibit more similarities between the internal and external error rates than the respective KS-set, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling representative training set.
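The two splitting strategies compared in this abstract can be illustrated with a short sketch. It assumes the textbook form of Kennard-Stone sampling (start from the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away) with Euclidean distance; the authors' exact implementation and preprocessing are not given in the abstract. The function names `kennard_stone` and `random_split` are hypothetical.

```python
import math
import random

def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by Kennard-Stone sampling."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Full pairwise Euclidean distance table (fine for small data sets).
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed the selection with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]
    # Distance from each remaining sample to its nearest selected sample.
    nearest = {i: min(dist[i][first], dist[i][second]) for i in remaining}
    while len(selected) < n_select:
        # Max-min criterion: take the sample farthest from the selected set.
        chosen = max(remaining, key=lambda i: nearest[i])
        selected.append(chosen)
        remaining.remove(chosen)
        del nearest[chosen]
        for i in remaining:
            nearest[i] = min(nearest[i], dist[i][chosen])
    return selected

def random_split(n_samples, train_fraction=0.7, seed=None):
    """Return (train_indices, test_indices) from one random shuffle."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

if __name__ == "__main__":
    # Toy one-dimensional "spectra", for demonstration only.
    toy = [[0.0], [1.0], [2.0], [3.0], [10.0]]
    print(kennard_stone(toy, 3))
    print(random_split(10, 0.7, seed=1))
```

Iterative random sampling, as described in the abstract, would amount to calling `random_split` repeatedly (the abstract reports n = 200) with different seeds and collecting the error rate of each resulting model. Kennard-Stone is deterministic and yields a single split.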
Fungible weights in logistic regression.

Science.gov (United States)

Jones, Jeff A; Waller, Niels G

2016-06-01

In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Measurement of internal quality of watermelon by Vis/NIR diffuse transmittance technique

Science.gov (United States)

Tian, Haiqing; Xu, Huirong; Ying, Yibin; Lu, Huishan; Yu, Haiyan

2006-10-01

Watermelon is a popular fruit in the world. Soluble solids content (SSC) is major characteristic used for assessing watermelon internal quality. This study was about a method for nondestructive internal quality detection of watermelons by means of visible/Near Infrared (Vis/NIR) diffuse transmittance technique. Vis/NIR transmittance spectra of intact watermelons were acquired using a low-cost commercially available spectrometer when the watermelon was in motion (1.4m/s) and in static state. Spectra data were analyzed by partial least squares (PLS) method. The influences of different data preprocessing and spectra treatments were also investigated. Performance of different models was assessed in terms of root mean square errors of calibration (RMSEC), root mean square errors of prediction (RMSEP) and correlation coefficient (r) between the predicted and measured parameter values. Results showed that spectra data preprocessing influenced the performance of the calibration models and the PLS method can provide good results. The nondestructive Vis/NIR measurements provided good estimates of SSC index of watermelon both in motion and in static state, and the predicted values were highly correlated with destructively measured values. The results indicated the feasibility of Vis/NIR diffuse transmittance spectral analysis for predicting watermelon internal quality in a nondestructive way.
The influence of R and S configurations of a series of amphetamine derivatives on quantitative structure-activity relationship models

Energy Technology Data Exchange (ETDEWEB)

Fresqui, Maira A.C., E-mail: maira@iqsc.usp.br [Institute of Chemistry of Sao Carlos, University of Sao Paulo, Av. Trabalhador Sao-carlense, 400, POB 780, 13560-970 Sao Carlos, SP (Brazil); Ferreira, Marcia M.C., E-mail: marcia@iqm.unicamp.br [Institute of Chemistry, University of Campinas - UNICAMP, POB 6154, 13083-970 Campinas, SP (Brazil); Trsic, Milan, E-mail: cra612@gmail.com [Institute of Chemistry of Sao Carlos, University of Sao Paulo, Av. Trabalhador Sao-carlense, 400, POB 780, 13560-970 Sao Carlos, SP (Brazil)

2013-01-08

Highlights: ► The QSAR model is not dependent of ligand conformation. ► Amphetamines were analyzed by quantum chemical, steric and hydrophobic descriptors. ► CHELPG atomic charges on the benzene ring are one of the most important descriptors. ► The PLS models built were extensively validated. ► Manual docking supports the QSAR results by pi-pi stacking interactions. - Abstract: Chiral molecules need special attention in drug design. In this sense, the R and S configurations of a series of thirty-four amphetamines were evaluated by quantitative structure-activity relationship (QSAR). This class of compounds has antidepressant, anti-Parkinson and anti-Alzheimer effects against the enzyme monoamine oxidase A (MAO A). A set of thirty-eight descriptors, including electronic, steric and hydrophobic ones, were calculated. Variable selection was performed through the correlation coefficients followed by the ordered predictor selection (OPS) algorithm. Six descriptors (CHELPG atomic charges C3, C4 and C5, electrophilicity, molecular surface area and log P) were selected for both configurations and a satisfactory model was obtained by PLS regression with three latent variables with R² = 0.73 and Q² = 0.60, with external predictability Q² = 0.68, and R² = 0.76 and Q² = 0.67 with external predictability Q² = 0.50, for R and S configurations, respectively. To confirm the robustness of each model, leave-N-out cross validation (LNO) was carried out and the y-randomization test was used to check if these models present chance correlation. Moreover, both automated or a manual molecular docking indicate that the reaction of ligands with the enzyme occurs via pi-pi stacking interaction with Tyr407, inclined face-to-face interaction with Tyr444, while aromatic hydrogen-hydrogen interactions with Tyr197 are preferable
MALDI-TOF-MS with PLS Modeling Enables Strain Typing of the Bacterial Plant Pathogen Xanthomonas axonopodis

Science.gov (United States)

Sindt, Nathan M.; Robison, Faith; Brick, Mark A.; Schwartz, Howard F.; Heuberger, Adam L.; Prenni, Jessica E.

2018-02-01

Matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF-MS) is a fast and effective tool for microbial species identification. However, current approaches are limited to species-level identification even when genetic differences are known. Here, we present a novel workflow that applies the statistical method of partial least squares discriminant analysis (PLS-DA) to MALDI-TOF-MS protein fingerprint data of Xanthomonas axonopodis, an important bacterial plant pathogen of fruit and vegetable crops. Mass spectra of 32 X. axonopodis strains were used to create a mass spectral library and PLS-DA was employed to model the closely related strains. A robust workflow was designed to optimize the PLS-DA model by assessing the model performance over a range of signal-to-noise ratios (s/n) and mass filter (MF) thresholds. The optimized parameters were observed to be s/n = 3 and MF = 0.7. The model correctly classified 83% of spectra withheld from the model as a test set. A new decision rule was developed, termed the rolled-up Maximum Decision Rule (ruMDR), and this method improved identification rates to 92%. These results demonstrate that MALDI-TOF-MS protein fingerprints of bacterial isolates can be utilized to enable identification at the strain level. Furthermore, the open-source framework of this workflow allows for broad implementation across various instrument platforms as well as integration with alternative modeling and classification algorithms.

Towards a user-friendly brain-computer interface: initial tests in ALS and PLS patients.

Science.gov (United States)

Bai, Ou; Lin, Peter; Huang, Dandan; Fei, Ding-Yu; Floeter, Mary Kay

2010-08-01

Patients usually require long-term training for effective EEG-based brain-computer interface (BCI) control due to fatigue caused by the demands for focused attention during prolonged BCI operation. We intended to develop a user-friendly BCI requiring minimal training and less mental load. Testing of BCI performance was investigated in three patients with amyotrophic lateral sclerosis (ALS) and three patients with primary lateral sclerosis (PLS), who had no previous BCI experience. All patients performed binary control of cursor movement. One ALS patient and one PLS patient performed four-directional cursor control in a two-dimensional domain under a BCI paradigm associated with human natural motor behavior using motor execution and motor imagery. Subjects practiced for 5-10min and then participated in a multi-session study of either binary control or four-directional control including online BCI game over 1.5-2h in a single visit. Event-related desynchronization and event-related synchronization in the beta band were observed in all patients during the production of voluntary movement either by motor execution or motor imagery. The online binary control of cursor movement was achieved with an average accuracy about 82.1+/-8.2% with motor execution and about 80% with motor imagery, whereas offline accuracy was achieved with 91.4+/-3.4% with motor execution and 83.3+/-8.9% with motor imagery after optimization. In addition, four-directional cursor control was achieved with an accuracy of 50-60% with motor execution and motor imagery. Patients with ALS or PLS may achieve BCI control without extended training, and fatigue might be reduced during operation of a BCI associated with human natural motor behavior. The development of a user-friendly BCI will promote practical BCI applications in paralyzed patients. Copyright 2010 International Federation of Clinical Neurophysiology. All rights reserved.
Data Mining of Chemogenomics Data Using Bi-Modal PLS Methods and Chemical Interpretation for Molecular Design.

Science.gov (United States)

Hasegawa, Kiyoshi; Funatsu, Kimito

2014-12-01

Chemogenomics is a new strategy in drug discovery for interrogating all molecules capable of interacting with all biological targets. Because of the almost infinite number of drug-like organic molecules, bench-based experimental chemogenomics methods are not generally feasible. Several in silico chemogenomics models have therefore been developed for high-throughput screening of large numbers of drug candidate compounds and target proteins. In previous studies, we described two novel bi-modal PLS approaches. These methods provide a significant advantage in that they enable direct connections to be made between biological activities and ligand and protein descriptors. In this special issue, we review these two PLS-based approaches using two different chemogenomics datasets for illustration. We then compare the predictive and interpretive performance of the two methods using the same congeneric data set. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Quantification of trace metals in infant formula premixes using laser-induced breakdown spectroscopy

Science.gov (United States)

Cama-Moncunill, Raquel; Casado-Gavalda, Maria P.; Cama-Moncunill, Xavier; Markiewicz-Keszycka, Maria; Dixit, Yash; Cullen, Patrick J.; Sullivan, Carl

2017-09-01

Infant formula is a human milk substitute generally based upon fortified cow milk components. In order to mimic the composition of breast milk, trace elements such as copper, iron and zinc are usually added in a single operation using a premix. The correct addition of premixes must be verified to ensure that the target levels in infant formulae are achieved. In this study, a laser-induced breakdown spectroscopy (LIBS) system was assessed as a fast validation tool for trace element premixes. LIBS is a promising emission spectroscopic technique for elemental analysis, which offers real-time analyses, little to no sample preparation and ease of use. LIBS was employed for copper and iron determinations of premix samples ranging approximately from 0 to 120 mg/kg Cu/1640 mg/kg Fe. LIBS spectra are affected by several parameters, hindering subsequent quantitative analyses. This work aimed at testing three matrix-matched calibration approaches (simple-linear regression, multi-linear regression and partial least squares regression (PLS)) as means for precision and accuracy enhancement of LIBS quantitative analysis. All calibration models were first developed using a training set and then validated with an independent test set. PLS yielded the best results. For instance, the PLS model for copper provided a coefficient of determination (R2) of 0.995 and a root mean square error of prediction (RMSEP) of 14 mg/kg. Furthermore, LIBS was employed to penetrate through the samples by repetitively measuring the same spot. Consequently, LIBS spectra can be obtained as a function of sample layers. This information was used to explore whether measuring deeper into the sample could reduce possible surface-contaminant effects and provide better quantifications.
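The simplest of the three calibration approaches mentioned in this abstract, a univariate linear calibration built on a training set and checked on an independent test set, can be sketched as follows. This is a generic least-squares illustration, not the authors' procedure: the choice of emission line, any normalisation, and the matrix matching are not described in the abstract, and the function names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(actual, predicted):
    """Root mean square error between paired values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def calibrate_and_validate(train_signal, train_conc, test_signal, test_conc):
    """Fit on the training set, then report RMSEP on the held-out test set."""
    slope, intercept = fit_line(train_signal, train_conc)
    predicted = [slope * s + intercept for s in test_signal]
    return slope, intercept, rmse(test_conc, predicted)

if __name__ == "__main__":
    # Made-up signal/concentration pairs, for demonstration only.
    print(calibrate_and_validate([0, 1, 2, 3], [1, 3, 5, 7], [4, 5], [9, 12]))
```

Because the RMSEP is computed only on samples that played no part in the fit, it reflects predictive error rather than goodness of fit. That is the role of the independent test set described in the abstract.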
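Multivariate regression models employing interval selection for the quantification of biodiesel in biodiesel/diesel blends [Modelos de regressão multivariada empregando seleção de intervalos para a quantificação do biodiesel em blendas biodiesel/diesel]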

Directory of Open Access Journals (Sweden)

Marco Flôres Ferrão

2010-01-01

In the present work, multivariate regression models based on interval partial least squares (iPLS) and backward interval partial least squares (biPLS) were analyzed and compared. These models select the most suitable regions of the spectrum, removing irrelevant information and optimizing the calibration model, in order to determine the concentration of biodiesel in biodiesel/diesel blends from data obtained by attenuated total reflectance infrared spectroscopy (HATR-FTIR). A total of 45 biodiesel/diesel blend samples with biodiesel concentrations from 8 to 30% were used. The spectra were acquired on two different spectrophotometers and randomly mixed to build the models. Calibration models were built using 2/3 of the sample spectra, yielding the RMSECV values, and the remaining spectra were used as the prediction set, yielding the RMSEP values. The spectral data were autoscaled (AUTO) or mean-centered (MEAN), with or without multiplicative signal correction (MSC). Applying the spectral range selection methods to the ATR spectra proved feasible for quantifying biodiesel in the blends. Infrared spectroscopy offers advantages such as the small amount of sample required and the short analysis time, in addition to being a non-destructive procedure that generates no waste, thereby optimizing the process in question. Abstract: In the present work multivariate regression models using interval partial least square (iPLS) and backward interval partial least square (biPLS) had been analyzed and compared. iPLS and biPLS models had been developed to determine the concentration of biodiesel in blends of biodiesel/diesel using infrared spectroscopy signals. 45 samples with concentrations in range 8-30% of biodiesel, and two distinct spectrophotometers were
An Assessment of Polynomial Regression Techniques for the Relative Radiometric Normalization (RRN) of High-Resolution Multi-Temporal Airborne Thermal Infrared (TIR) Imagery

Directory of Open Access Journals (Sweden)

Mir Mustafizur Rahman

2014-11-01

Thermal Infrared (TIR) remote sensing images of urban environments are increasingly available from airborne and satellite platforms. However, limited access to high-spatial resolution (H-res: ~1 m) TIR satellite images requires the use of TIR airborne sensors for mapping large complex urban surfaces, especially at micro-scales. A critical limitation of such H-res mapping is the need to acquire a large scene composed of multiple flight lines and mosaic them together. This results in the same scene components (e.g., roads, buildings, green space and water) exhibiting different temperatures in different flight lines. To mitigate these effects, linear relative radiometric normalization (RRN) techniques are often applied. However, the Earth’s surface is composed of features whose thermal behaviour is characterized by complexity and non-linearity. Therefore, we hypothesize that non-linear RRN techniques should demonstrate increased radiometric agreement over similar linear techniques. To test this hypothesis, this paper evaluates four (linear and non-linear) RRN techniques, including: (i) histogram matching (HM); (ii) pseudo-invariant feature-based polynomial regression (PIF_Poly); (iii) no-change stratified random sample-based linear regression (NCSRS_Lin); and (iv) no-change stratified random sample-based polynomial regression (NCSRS_Poly); two of which (ii and iv) are newly proposed non-linear techniques. When applied over two adjacent flight lines (~70 km²) of TABI-1800 airborne data, visual and statistical results show that both new non-linear techniques improved radiometric agreement over the previously evaluated linear techniques, with the new fully-automated method, NCSRS-based polynomial regression, providing the highest improvement in radiometric agreement between the master and the slave images, at ~56%. This is ~5% higher than the best previously evaluated linear technique (NCSRS-based linear regression).
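The core of a polynomial regression-based normalization, as described in this abstract, is fitting a low-order polynomial that maps slave-image values to master-image values at locations assumed unchanged, then applying it to the whole slave image. The sketch below shows only that fitting step, under stated assumptions. It does not cover how the pseudo-invariant features or no-change stratified random samples are selected, which the abstract does not detail. The polynomial degree and the function names are hypothetical choices.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, lowest order first.

    Solves the normal equations by Gaussian elimination with partial
    pivoting, which is adequate for the low degrees sketched here.
    """
    m = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def normalize(slave_values, coeffs):
    """Apply the fitted polynomial to slave-image pixel values."""
    return [sum(c * v ** k for k, c in enumerate(coeffs)) for v in slave_values]

if __name__ == "__main__":
    # Made-up paired samples from "no-change" locations, demonstration only.
    slave = [0.0, 1.0, 2.0, 3.0, 4.0]
    master = [1.0 + 2.0 * v + 0.5 * v * v for v in slave]
    coeffs = fit_polynomial(slave, master, degree=2)
    print(coeffs, normalize([5.0], coeffs))
```

Setting `degree=1` reduces this to the linear normalization that the abstract uses as its baseline, so the same routine can be used to compare linear and non-linear fits on identical samples.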
pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms.

Science.gov (United States)

Molnos, Sophie; Baumbach, Clemens; Wahl, Simone; Müller-Nurasyid, Martina; Strauch, Konstantin; Wang-Sattler, Rui; Waldenberger, Melanie; Meitinger, Thomas; Adamski, Jerzy; Kastenmüller, Gabi; Suhre, Karsten; Peters, Annette; Grallert, Harald; Theis, Fabian J; Gieger, Christian

2017-09-29

Genome-wide association studies allow us to understand the genetics of complex diseases. Human metabolism provides information about the disease-causing mechanisms, so it is usual to investigate the associations between genetic variants and metabolite levels. However, only considering genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools only consider single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigations of the interactions between pairs of arbitrary quantitative variables. We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons based on simulated data showed that pulver is much faster than the existing tools. This is achieved by using the correlation coefficient to test the null-hypothesis, which avoids the costly computation of inversions. Additional tricks are a rearrangement of the order, when iterating through the different "omics" layers, and implementing this algorithm in the fast programming language C++. Furthermore, we applied our algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels. The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .
The Crucial Role of the Pls1 Tetraspanin during Ascospore Germination in Podospora anserina Provides an Example of the Convergent Evolution of Morphogenetic Processes in Fungal Plant Pathogens and Saprobes

Science.gov (United States)

Lambou, Karine; Malagnac, Fabienne; Barbisan, Crystel; Tharreau, Didier; Lebrun, Marc-Henri; Silar, Philippe

2008-01-01

Pls1 tetraspanins were shown for some pathogenic fungi to be essential for appressorium-mediated penetration into their host plants. We show here that Podospora anserina, a saprobic fungus lacking appressorium, contains PaPls1, a gene orthologous to known PLS1 genes. Inactivation of PaPls1 demonstrates that this gene is specifically required for the germination of ascospores in P. anserina. These ascospores are heavily melanized cells that germinate under inducing conditions through a specific pore. On the contrary, MgPLS1, which fully complements a ΔPaPls1 ascospore germination defect, has no role in the germination of Magnaporthe grisea nonmelanized ascospores but is required for the formation of the penetration peg at the pore of its melanized appressorium. P. anserina mutants with mutation of PaNox2, which encodes the NADPH oxidase of the NOX2 family, display the same ascospore-specific germination defect as the ΔPaPls1 mutant. Both mutant phenotypes are suppressed by the inhibition of melanin biosynthesis, suggesting that they are involved in the same cellular process required for the germination of P. anserina melanized ascospores. The analysis of the distribution of PLS1 and NOX2 genes in fungal genomes shows that they are either both present or both absent. These results indicate that the germination of P. anserina ascospores and the formation of the M. grisea appressorium penetration peg use the same molecular machinery that includes Pls1 and Nox2. This machinery is specifically required for the emergence of polarized hyphae from reinforced structures such as appressoria and ascospores. Its recurrent recruitment during fungal evolution may account for some of the morphogenetic convergence observed in fungi. PMID:18757568
Assessing a moderating effect and the global fit of a PLS model on online trading

Directory of Open Access Journals (Sweden)

Juan J. García-Machado

2017-12-01

This paper proposes a PLS model for the study of online trading. Traditional investing has experienced a revolution due to the rise of e-trading services that enable investors to use the Internet to conduct secure trading. On the one hand, model results show that there is a positive, direct and statistically significant relationship between personal outcome expectations, perceived relative advantage, shared vision and economy-based trust and the quality of knowledge. On the other hand, trading frequency and portfolio performance also show this relationship. After including the investor's income and financial wealth (IFW) as a moderating effect, the PLS model was enhanced, and we found that the interaction term is negative and statistically significant, so higher IFW levels entail a weaker relationship between trading frequency and portfolio performance, and vice versa. Finally, the goodness-of-fit measures for the overall model showed that the model fits according to the SRMR and dG measures, so it is likely that the model is true.
Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

Science.gov (United States)

Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

2005-01-01

Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package in combination with E-state or ISIS keys have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model validated on a test set of 177 compounds not included in the training set has an r2 of 0.911 and an RMSE of 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation was done it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient lead optimization.
[Determination of Cu in Shell of Preserved Egg by LIBS Coupled with PLS].

Science.gov (United States)

Hu, Hui-qin; Xu, Xue-hong; Liu, Mu-hua; Tu, Jian-ping; Huang, Le; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; Yang, Ping

2015-12-01

In this work, the copper content in the shell of preserved eggs was determined directly by laser-induced breakdown spectroscopy (LIBS), and the characteristic lines of Cu were obtained. The eggshell samples were pretreated by wet acid digestion, and the true Cu content was obtained by atomic absorption spectrophotometry (AAS). The precision and accuracy of LIBS are influenced by a series of factors, for example the complex matrix effect of the sample, environmental noise, instrument system noise, and the stability of the laser energy. As a result, the conventional univariate linear calibration curve between LIBS intensity and element content, such as one based on the Scheibe-Lomakin equation, cannot meet the requirements of quantitative analysis, and a multivariate calibration method is needed. In this work, the LIBS spectral data were processed by partial least squares (PLS), and the precision and accuracy of the PLS models were compared across different smoothing treatments and five pretreatment methods. The results showed that 11-point smoothing with multiplicative scatter correction (MSC) pretreatment improved the correlation coefficient and accuracy of the PLS model and effectively reduced the root mean square error and the average relative error. The study shows that the heavy metal Cu in preserved eggshells can be detected directly and accurately by LIBS. In the next step, batch tests will be conducted to find the relationship between the Cu content of the eggshell, the egg white and the egg yolk, so that the heavy metal content of the egg white and yolk can be inferred by measuring the eggshell with LIBS, providing a new method for rapid non-destructive testing of the quality and safety of agricultural products.
Regression Techniques for Determining the Effective Impervious Area in Southern California Watersheds

Science.gov (United States)

Sultana, R.; Mroczek, M.; Dallman, S.; Sengupta, A.; Stein, E. D.

2016-12-01

The portion of the Total Impervious Area (TIA) that is hydraulically connected to the storm drainage network is called the Effective Impervious Area (EIA). The remaining fraction of impervious area, called the non-effective impervious area, drains onto pervious surfaces, which do not contribute to runoff for smaller events. Using the TIA instead of the EIA in models and calculations can lead to overestimates of runoff volumes and peak discharges and to oversizing of drainage systems, since it is assumed that all impervious areas produce urban runoff that is directly connected to storm drains. This makes the EIA a better predictor of actual runoff from urban catchments for hydraulic design of storm drain systems and for modeling non-point source pollution. Compared to the TIA, the EIA is considerably more difficult to determine, since it cannot be found by using remote sensing techniques, readily available EIA datasets, or aerial imagery interpretation alone. For this study, EIA percentages were calculated by two successive regression methods for five watersheds (with areas of 8.38 to 158 square miles) located in Southern California using rainfall-runoff event data for the years 2004 - 2007. Runoff generated from the smaller storm events is considered to emanate only from the effective impervious areas. Therefore, larger events that were considered to have runoff from both impervious and pervious surfaces were successively removed in the regression methods using a criterion of (1) 1mm and (2) a max (2 , 1mm) above the regression line. MSE is calculated from actual runoff and runoff predicted by the regression. Analysis of standard deviations showed that the criterion of max (2 , 1mm) better fit the regression line and is the preferred method for predicting the EIA percentage. The estimated EIAs were shown to be approximately 78% to 43% of the TIA, which shows that using the EIA instead of the TIA can have a significant impact on the cost of building urban hydraulic systems and stormwater capture devices.
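The successive removal of larger events described in this abstract can be sketched as follows. This is a simplified illustration and not the authors' procedure: it uses only the fixed 1 mm criterion, it forces the regression line through the origin (so initial losses are ignored), and it reads the final slope as the effective impervious fraction. All three choices, the function names and the rainfall-runoff numbers are assumptions made for this example.

```python
def slope_through_origin(rain, runoff):
    """Least-squares slope of runoff = slope * rain (no intercept; an assumption of this sketch)."""
    return sum(p * q for p, q in zip(rain, runoff)) / sum(p * p for p in rain)

def successive_regression(rain, runoff, threshold_mm=1.0):
    """Refit after dropping events that lie more than threshold_mm above the line."""
    events = list(zip(rain, runoff))
    while True:
        slope = slope_through_origin(*zip(*events))
        kept = [(p, q) for p, q in events if q - slope * p <= threshold_mm]
        # Stop when nothing was removed, or when too few events would remain.
        if len(kept) == len(events) or len(kept) < 3:
            return slope, events
        events = kept

# Hypothetical event data in mm: five small events with runoff = 0.3 * rainfall,
# plus two large events that also carry runoff from pervious surfaces.
rain = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0, 40.0]
runoff = [0.6, 1.2, 1.8, 2.4, 3.0, 19.0, 27.0]
eia_fraction, kept_events = successive_regression(rain, runoff)
print(round(eia_fraction, 3), len(kept_events))
```

With these numbers the two large events are dropped in two successive passes, after which the slope settles at the fraction used to generate the small events.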
A simple fabrication of plasmonic surface-enhanced Raman scattering (SERS) substrate for pesticide analysis via the immobilization of gold nanoparticles on UF membrane

Science.gov (United States)

Hong, Jangho; Kawashima, Ayato; Hamada, Noriaki

2017-06-01

In this study, we developed a facile fabrication method to access a highly reproducible plasmonic surface enhanced Raman scattering substrate via the immobilization of gold nanoparticles on an Ultrafiltration (UF) membrane using a suction technique. This was combined with a simple and rapid analyte concentration and detection method utilizing portable Raman spectroscopy. The minimum detectable concentrations for aqueous thiabendazole standard solution and thiabendazole in orange extract are 0.01 μg/mL and 0.125 μg/g, respectively. The partial least squares (PLS) regression plot shows a good linear relationship between 0.001 and 100 μg/mL of analyte, with a root mean square error of prediction (RMSEP) of 0.294 and a correlation coefficient (R2) of 0.976 for the thiabendazole standard solution. Meanwhile, the PLS plot also shows a good linear relationship between 0.0 and 2.5 μg/g of analyte, with an RMSEP value of 0.298 and an R2 value of 0.993 for the orange peel extract. In addition to the detection of other types of pesticides in agricultural products, this highly uniform plasmonic substrate has great potential for application in various environmentally-related areas.
PLS Torino: A way to discover semiconductors in a school lab

International Nuclear Information System (INIS)

Marzolla, F.

2015-01-01

Among the wide range of PLS activities, one on semiconductors was carried out with high-school 4th- and 5th-year students. After an introduction to semiconductor and electromagnetic radiation concepts, students assembled circuits, observed photoresistor and LED behavior and compared experimental and theoretical results. We paid particular attention to energy conversions and device applications. An important point of the project is that it can easily be carried out in our schools because low-cost devices are used. Moreover, by discussing the experimental results, it is possible to correct or complete the students' interpretation of the phenomena.
Characterisation of PDO olive oil Chianti Classico by non-selective (UV-visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques.

Science.gov (United States)

Casale, M; Oliveri, P; Casolino, C; Sinelli, N; Zunin, P; Armanino, C; Forina, M; Lanteri, S

2012-01-27

An authentication study of the Italian PDO (protected designation of origin) extra virgin olive oil Chianti Classico was performed; UV-visible (UV-vis), Near-Infrared (NIR) and Mid-Infrared (MIR) spectroscopies were applied to a set of samples representative of the whole Chianti Classico production area. The non-selective signals (fingerprints) provided by the three spectroscopic techniques were utilised both individually and jointly, after fusion of the respective profile vectors, in order to build a model for the Chianti Classico PDO olive oil. Moreover, these results were compared with those obtained by the gas chromatographic determination of the fatty acid composition. In order to characterise the olive oils produced in the Chianti Classico PDO area, UNEQ (unequal class models) and SIMCA (soft independent modelling of class analogy) were employed both on the MIR, NIR and UV-vis spectra, individually and jointly, and on the fatty acid composition. Finally, PLS (partial least squares) regression was applied to the UV-vis, NIR and MIR spectra in order to predict the content of oleic and linoleic acids in the extra virgin olive oils. UNEQ, SIMCA and PLS were performed after selection of the relevant predictors, in order to increase the efficiency of both classification and regression models. The non-selective information obtained from UV-vis, NIR and MIR spectroscopy made it possible to build reliable models for checking the authenticity of the Italian PDO extra virgin olive oil Chianti Classico. Copyright © 2011 Elsevier B.V. All rights reserved.
A study for lattice comparison for PLS 2 GeV storage ring

International Nuclear Information System (INIS)

Yoon, M.

1991-01-01

TBA and DBA lattices are compared for a 1.5-2.5 GeV synchrotron light source, with particular attention to the PLS 2 GeV electron storage ring currently being developed in Pohang, Korea. For the comparison study, the optimum electron energy was chosen to be 2 GeV, with the circumference of the ring less than 280.56 m and the natural beam emittance no greater than 13 nm. Results from various linear and nonlinear optics comparison studies are presented.
Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

Science.gov (United States)

Bulcock, J. W.

The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
Scaling model for prediction of radionuclide activity in cooling water using a regression triplet technique

International Nuclear Information System (INIS)

Silvia Dulanska; Lubomir Matel; Milan Meloun

2010-01-01

The decommissioning of the nuclear power plant (NPP) A1 Jaslovske Bohunice (Slovakia) is a complicated set of problems that is highly demanding both technically and financially. The basic goal of the decommissioning process is the total elimination of radioactive materials from the nuclear power plant area, and radwaste treatment to a form suitable for its safe disposal. The initial conditions of decommissioning also include elimination of the operational events, preparation and transport of the fuel from the plant territory, and radiochemical and physical-chemical characterization of the radioactive wastes. One of the problems was, and still is, the processing of the liquid radioactive wastes. One such medium is the cooling water of the long-term storage of spent fuel. A suitable scaling model for predicting the activity of the hard-to-detect radionuclides Pu-239,240 and Sr-90 and the summary beta activity in cooling water has been built using regression triplet analysis and regression diagnostics. (author)
The Occupancy Rate Modeling of Kendari Hotel Room using Mexican Hat Transformation and Partial Least Squares

Directory of Open Access Journals (Sweden)

Margaretha Ohyver

2016-12-01

The Partial Least Squares (PLS) method was developed in 1960 by Herman Wold. The method is particularly suited to constructing a regression model when the independent variables are numerous and highly collinear. PLS can be combined with other methods, one of which is the Continuous Wavelet Transformation (CWT). Because the presence of outliers can lead to a less reliable model, this kind of transformation may be required at the pre-processing stage so that the data are free of noise and outliers. According to a previous study, the Kendari hotel room occupancy rate was affected by outliers, and the model had a low R2 value. Therefore, this research aimed to obtain a good model by combining the PLS method with the CWT, using the Mexican Hat as the mother wavelet. The research concludes that merging PLS and the Mexican Hat transformation results in a better model than the one that combined PLS and the Haar wavelet transformation in the previous study. The research shows that changing the mother wavelet can improve the value of R2 significantly. The result provides information on how to increase the value of R2. A further benefit is the advice to hotel managers to pay attention to the age of the hotel, the maximum rates, the facilities, and the number of rooms in order to increase the number of visitors.
Quantitative analysis of bayberry juice acidity based on visible and near-infrared spectroscopy

International Nuclear Information System (INIS)

Shao Yongni; He Yong; Mao Jingyuan

2007-01-01

Visible and near-infrared (Vis/NIR) reflectance spectroscopy has been investigated for its ability to nondestructively detect acidity in bayberry juice. What we believe to be a new, better mathematical model is put forward, which we have named principal component analysis-stepwise regression analysis-backpropagation neural network (PCA-SRA-BPNN), to build a correlation between the spectral reflectivity data and the acidity of bayberry juice. In this model, the optimum network parameters, such as the number of input nodes, hidden nodes, learning rate, and momentum, are chosen by the value of the root-mean-square (rms) error. The results show that its prediction statistics are a correlation coefficient (r) of 0.9451 and a root-mean-square error of prediction (RMSEP) of 0.1168. A partial least-squares (PLS) regression model is also established for comparison. Before doing this, the influences of various spectral pretreatments (standard normal variate, multiplicative scatter correction, Savitzky-Golay first derivative, and wavelet packet transform) are compared. The PLS approach with wavelet packet transform preprocessing of the spectra is found to provide the best results, with a correlation coefficient (r) of 0.9061 and an RMSEP of 0.1564. Hence, both models are suitable for analyzing Vis/NIR spectroscopy data and for predicting the acidity of bayberry juice. This provides basic research toward ultimately realizing online measurement of the juice's internal quality with this Vis/NIR spectroscopy technique.
Classification of structurally related commercial contrast media by near infrared spectroscopy.

Science.gov (United States)

Yip, Wai Lam; Soosainather, Tom Collin; Dyrstad, Knut; Sande, Sverre Arne

2014-03-01

Near infrared spectroscopy (NIRS) is a non-destructive measurement technique with broad application in the pharmaceutical industry. Correct identification of pharmaceutical ingredients is an important task for quality control. Failure in this step can have several adverse consequences, ranging from economic loss to a negative impact on patient safety. We have compared different methods for classifying a set of commercially available, structurally related contrast media, Iodixanol (Visipaque®), Iohexol (Omnipaque®), Caldiamide Sodium and Gadodiamide (Omniscan®), by using NIR spectroscopy. The performance of classification models developed by soft independent modelling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA) and Main and Interactions of Individual Principal Components Regression (MIPCR) was compared. Different variable selection methods were applied to optimize the classification models. Models developed by backward variable elimination partial least squares regression (BVE-PLS) and MIPCR were found to be most effective for classification of the set of contrast media. Fewer than 1.5% of samples from the independent test set were not recognized by the BVE-PLS and MIPCR models, compared to up to 15% when models developed by other techniques were applied. Copyright © 2013 Elsevier B.V. All rights reserved.

Repeated Results Analysis for Middleware Regression Benchmarking

Czech Academy of Sciences Publication Activity Database

Bulej, Lubomír; Kalibera, T.; Tůma, P.

2005-01-01

Vol. 60 (2005), pp. 345-358. ISSN 0166-5316. R&D Projects: GA ČR GA102/03/0672. Institutional research plan: CEZ:AV0Z10300504. Keywords: middleware benchmarking; regression benchmarking; regression testing. Subject RIV: JD - Computer Applications, Robotics. Impact factor: 0.756 (2005).
Simultaneous spectrophotometric determination of crystal violet and malachite green in water samples using partial least squares regression and central composite design after preconcentration by dispersive solid-phase extraction.

Science.gov (United States)

Razi-Asrami, Mahboobeh; Ghasemi, Jahan B; Amiri, Nayereh; Sadeghi, Seyed Jamal

2017-04-01

In this paper, a simple, fast, and inexpensive method is introduced for the simultaneous spectrophotometric determination of crystal violet (CV) and malachite green (MG) in aquatic samples using partial least squares (PLS) regression as a multivariate calibration technique after preconcentration by graphene oxide (GO). The method was based on the sorption and desorption of the analytes on GO and direct determination by ultraviolet-visible spectrophotometry. GO was synthesized according to Hummers' method, and its shape and structure were characterized by FT-IR, SEM, and XRD. The factors affecting extraction efficiency, namely pH, extraction time, and amount of adsorbent, were optimized using a central composite design; their optimum values were 6, 15 min, and 12 mg, respectively. The maximum adsorption capacity of GO was 63.17 mg/g for CV and 77.02 mg/g for MG. The preconcentration factor and extraction recovery were 19.6 and 98% for CV and 20 and 100% for MG. The LOD and linear dynamic range were 0.009 and 0.03-0.3 μg/mL for CV and 0.015 and 0.05-0.5 μg/mL for MG. The intra-day and inter-day relative standard deviations were 1.99 and 0.58 for CV and 1.69 and 3.13 for MG at a concentration of 50 ng/mL. Finally, the proposed DSPE/PLS method was successfully applied to the simultaneous determination of trace amounts of CV and MG in real water samples.
Realtime control system for microprobe beamline at PLS

Energy Technology Data Exchange (ETDEWEB)

Yoon, J.C.; Lee, J.W.; Kim, K.H.; Ko, I.S. [Pohang Accelerator Laboratory, POSTECH, Pohang (Korea)

1998-11-01

The microprobe beamline of the Pohang Light Source (PLS) consists of main and second slits, a microprobe system, two ion chambers, a video-microscope, and a Si(Li) detector. These machine components must be controlled remotely through the computer system to make users' experiments precise and speedy. A real-time computer control system was developed to control and monitor these components. A VMEbus computer with an OS-9 real-time operating system was used for the low-level data acquisition and control. VME I/O modules were used for the step motor control and the scalar control. The software has a modular structure for maximum performance and easy maintenance. We developed the database, the I/O driver, and the control software. We used PC/Windows 95 for the data logging and the operator interface. Visual C++ was used for the graphical user interface programming. RS232C was used for the communication between the VME and the PC. (author)
Prediction of SOC content by Vis-NIR spectroscopy at European scale using a modified local PLS algorithm

Science.gov (United States)

Nocita, M.; Stevens, A.; Toth, G.; van Wesemael, B.; Montanarella, L.

2012-12-01

In the context of global environmental change, the estimation of carbon fluxes between soils and the atmosphere has been the object of a growing number of studies. This has been motivated notably by the possibility of sequestering CO2 in soils by increasing the soil organic carbon (SOC) stocks and by the role of SOC in maintaining soil quality. Spatial variability of SOC masks its slow accumulation or depletion, and the sampling density required to detect a change in SOC content is often very high and thus very expensive and labour intensive. Visible near infrared diffuse reflectance spectroscopy (Vis-NIR DRS) has been shown to be a fast, cheap and efficient tool for the prediction of SOC at fine scales. However, when applied at regional or country scales, Vis-NIR DRS did not provide sufficient accuracy as an alternative to standard laboratory soil analysis for SOC monitoring. Under the framework of the Land Use/Cover Area Frame Statistical Survey (LUCAS) project of the European Commission's Joint Research Centre (JRC), about 20,000 samples were collected across the European Union. Soil samples were analyzed for several physical and chemical parameters, and scanned with a Vis-NIR spectrometer in the same laboratory. The scope of our research was to predict SOC content at European scale using the LUCAS spectral library. We implemented a modified local partial least squares regression (l-PLS) including, in addition to spectral distance, other potentially useful covariates (geography, texture, etc.) to select for each unknown sample a group of predicting neighbours. The dataset was split into mineral soils under cropland, mineral soils under grassland, mineral soils under woodland, and organic soils because of the extremely diverse spectral response of the four classes. For every class, training (70%) and test (30%) sets were created to calibrate and validate the SOC prediction models. The results showed very good prediction ability for mineral soils under cropland and mineral soils
Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

Science.gov (United States)

Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

2015-11-01

One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock prices and their changes and react to them. In this regard, we used the multivariate adaptive regression splines (MARS) model and the semi-parametric splines technique to predict stock prices in this study. The MARS model is a nonparametric, adaptive regression method that is well suited to high-dimensional problems with several variables. The semi-parametric splines technique builds on smoothing splines, a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) to predict stock prices with both the MARS model and the semi-parametric splines technique. After investigating the models, we selected 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as the variables influencing stock price prediction with the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS forecast and P/E ratio) were selected as effective variables for forecasting stock prices.
Logistic regression applied to natural hazards: rare event logistic regression with replications

Science.gov (United States)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Partial Least Squares tutorial for analyzing neuroimaging data

Directory of Open Access Journals (Sweden)

Patricia Van Roon

2014-09-01

Partial least squares (PLS) has become a respected and meaningful soft modeling analysis technique that can be applied to very large datasets where the number of factors or variables is greater than the number of observations. Current biometric studies (e.g., eye movements, EKG, body movements, EEG) are often of this nature. PLS avoids the over-fitting issues of multiple linear regression by finding a few underlying or latent variables (factors) that account for most of the variation in the data. In real-world applications, where linear models do not always apply, PLS can model the non-linear relationship well. This tutorial introduces two PLS methods, PLS Correlation (PLSC) and PLS Regression (PLSR), and their applications in data analysis, which are illustrated with neuroimaging examples. Both methods provide straightforward and comprehensible techniques for determining and modeling relationships between two multivariate data blocks by finding latent variables that best describe the relationships. In the examples, PLSC will analyze the relationship between neuroimaging data, such as Event-Related Potential (ERP) amplitude averages from different locations on the scalp, and the corresponding behavioural data. Using the same data, PLSR will be used to model the relationship between neuroimaging and behavioural data. This model will be able to predict future behaviour solely from available neuroimaging data. To find latent variables, Singular Value Decomposition (SVD) for PLSC and Non-linear Iterative PArtial Least Squares (NIPALS) for PLSR are implemented in this tutorial. SVD decomposes the large data block into three manageable matrices containing a diagonal set of singular values, as well as left and right singular vectors. For PLSR, the NIPALS algorithm is used because it provides a more precise estimation of the latent variables. Mathematica notebooks are provided for each PLS method with clearly labeled sections and subsections. The
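As an illustration of the NIPALS idea named in this abstract, here is a minimal pure-Python sketch of PLS regression with a single response (PLS1). It is not taken from the tutorial's Mathematica notebooks: the function names, the deflation-based prediction and the toy data are assumptions chosen for brevity. Each pass extracts one latent variable (scores t) from the current residuals and then deflates both blocks.

```python
import math

def pls1_fit(X, y, n_components):
    """NIPALS-style PLS1: extract latent variables one at a time, deflating X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X residuals
    f = [v - y_mean for v in y]                                # centered y residuals
    comps = []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    """Predict by replaying the deflation steps on a new observation."""
    x_mean, y_mean, comps = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(ei * wi for ei, wi in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat

# Hypothetical toy data generated from y = 2*x1 - 3*x2 + 1 (no noise).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 7]]
y = [2 * a - 3 * b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6, 5])
print(prediction)
```

With as many components as X has independent columns, PLS1 reproduces the ordinary least-squares fit, so the noise-free toy relation is recovered here; using fewer components is what gives PLS its regularizing behaviour on collinear data.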
Comparison of Classical Linear Regression and Orthogonal Regression According to the Sum of Squares Perpendicular Distances

OpenAIRE

KELEŞ, Taliha; ALTUN, Murat

2016-01-01

Regression analysis is a statistical technique for investigating and modeling the relationship between variables. The purpose of this study was the trivial presentation of the equation for orthogonal regression (OR) and the comparison of classical linear regression (CLR) and OR techniques with respect to the sum of squared perpendicular distances. For that purpose, the analyses were shown by an example. It was found that the sum of squared perpendicular distances of OR is smaller. Thus, it wa...
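The comparison described in this abstract can be reproduced in a few lines. The sketch below uses the standard closed-form slopes for classical (ordinary least squares) regression and for orthogonal regression, and evaluates the sum of squared perpendicular distances for both lines. The function names and the data are hypothetical and not taken from the paper; the orthogonal slope formula assumes the sample covariance of x and y is non-zero.

```python
import math

def centered_sums(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def classical_line(xs, ys):
    """CLR: minimizes squared vertical distances."""
    mx, my, sxx, _, sxy = centered_sums(xs, ys)
    slope = sxy / sxx
    return slope, my - slope * mx

def orthogonal_line(xs, ys):
    """OR: minimizes squared perpendicular distances (requires sxy != 0)."""
    mx, my, sxx, syy, sxy = centered_sums(xs, ys)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx

def perpendicular_ss(xs, ys, slope, intercept):
    """Sum of squared perpendicular distances from the points to the line."""
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys)) / (1 + slope ** 2)

# Hypothetical example data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]
clr = classical_line(xs, ys)
orth = orthogonal_line(xs, ys)
ss_clr = perpendicular_ss(xs, ys, *clr)
ss_or = perpendicular_ss(xs, ys, *orth)
print(clr, orth, ss_clr, ss_or)
```

Because orthogonal regression minimizes exactly this perpendicular criterion, its value can never exceed the one obtained with the classical line, which matches the finding reported in the abstract.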
Post-processing through linear regression

Science.gov (United States)

van Schaeybroeck, B.; Vannitsem, S.

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Abstract Expression Grammar Symbolic Regression

Science.gov (United States)

Korns, Michael F.

This chapter examines the use of Abstract Expression Grammars to perform the entire Symbolic Regression process without the use of Genetic Programming per se. The techniques explored produce a symbolic regression engine which has absolutely no bloat, which allows total user control of the search space and output formulas, and which is faster and more accurate than the engines produced in our previous papers using Genetic Programming. The genome is an all-vector structure with four chromosomes plus additional epigenetic and constraint vectors, allowing total user control of the search space and the final output formulas. A combination of specialized compiler techniques, genetic algorithms, particle swarm, aged layered populations, plus discrete and continuous differential evolution is used to produce an improved symbolic regression system. Nine base test cases from the literature are used to test the improvement in speed and accuracy. The improved results indicate that these techniques move us a big step closer toward future industrial-strength symbolic regression systems.
Thermoluminescence dating of Chinese porcelain using a regression method of saturating exponential in pre-dose technique

International Nuclear Information System (INIS)

Wang Weida; Xia Junding; Zhou Zhixin; Leung, P.L.

2001-01-01

Thermoluminescence (TL) dating using a regression method of saturating exponential in the pre-dose technique is described. Twenty-three porcelain samples from past dynasties of China were dated by this method. The results show that the TL ages are in reasonable agreement with archaeological dates within a standard deviation of 27%. Such an error is acceptable in porcelain dating.
Study on feasibility of determination of glucosamine content of fermentation process using a micro NIR spectrometer.

Science.gov (United States)

Sun, Zhongyu; Li, Can; Li, Lian; Nie, Lei; Dong, Qin; Li, Danyang; Gao, Lingling; Zang, Hengchang

2018-08-05

N-acetyl-d-glucosamine (GlcNAc) is a microbial fermentation product, and NIR spectroscopy is an effective process analytical technology (PAT) tool for detecting its key quality attribute, the GlcNAc content. Meanwhile, the design of NIR spectrometers is trending toward miniaturization, portability and low cost. The aim of this study was to explore the use of a portable micro NIR spectrometer in the fermentation process. First, an FT-NIR spectrometer and the Micro-NIR 1700 spectrometer were compared using simulated fermentation process solutions. The Rc², Rp², RMSECV and RMSEP of the optimal FT-NIR and Micro-NIR 1700 models were 0.999, 0.999, 3.226 g/L, 1.388 g/L and 0.999, 0.999, 1.821 g/L, 0.967 g/L, respectively. The Passing-Bablok regression method and paired t-test results showed no significant differences between the two instruments. The Micro-NIR 1700 was then selected for the practical fermentation process, and 135 samples from 10 batches were collected. Spectral pretreatment methods and variable selection methods (BiPLS, FiPLS, MWPLS and CARS-PLS) for PLS modeling were discussed. The Rc², Rp², RMSECV and RMSEP of the optimal GlcNAc content PLS model of the practical fermentation process were 0.994, 0.995, 2.792 g/L and 1.946 g/L. The results provide a positive reference for application of the Micro-NIR spectrometer. To some extent, it could provide theoretical support in guiding microbial fermentation or the further assessment of bioprocesses. Copyright © 2018. Published by Elsevier B.V.
Comparative investigation of two different self-organizing map ...

African Journals Online (AJOL)

Purpose: To demonstrate the ability and investigate the performance of two different wavelength selection approaches based on self-organizing map (SOM) technique in partial least-squares (PLS) regression for analysis of pharmaceutical binary mixtures with strongly overlapping spectra. Methods: Two different variable ...
Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation

Directory of Open Access Journals (Sweden)

Ahmad Bilfarsah

2005-04-01

The Additive Spline Partial Least Squares (ASPLS) method is a generalization of the Partial Least Squares (PLS) method. ASPLS can accommodate nonlinearity and multicollinearity among the predictor variables. In principle, the ASPLS approach is characterized by two ideas. The first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, to preserve the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.
A consensus successive projections algorithm--multiple linear regression method for analyzing near infrared spectra.

Science.gov (United States)

Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin

2015-02-09

The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, SPA used only once may not obtain all the useful information of the full spectra, because the number of selected variables cannot exceed the number of calibration samples in the SPA algorithm. Therefore, the SPA-MLR method risks the loss of useful information. To make full use of the useful information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein. This method combines the consensus strategy with the SPA-MLR method. In the C-SPA-MLR method, SPA-MLR is used to construct member models with different subsets of variables, which are selected from the remaining variables iteratively. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near infrared (NIR) spectra of corn and diesel. The C-SPA-MLR method showed better prediction performance than the SPA-MLR and full-spectra PLS methods. Moreover, these results could serve as a reference for combining the consensus strategy with other variable selection methods when analyzing NIR spectra and other spectroscopic data. Copyright © 2014 Elsevier B.V. All rights reserved.
Logistic regression applied to natural hazards: rare event logistic regression with replications

Directory of Open Access Journals (Sweden)

M. Guns

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods, and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R

Directory of Open Access Journals (Sweden)

Paulino Pérez

2010-09-01

The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges and require specialized computer programs, not always available to the end user and not implemented in standard statistical software yet. The R-package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows including marker genotypes and pedigree data jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.
Potable NIR spectroscopy predicting soluble solids content of pears based on LEDs

Energy Technology Data Exchange (ETDEWEB)

Liu Yande; Liu Wei; Sun Xudong; Gao Rongjie; Pan Yuanyuan; Ouyang Aiguo, E-mail: jxliuyd@163.com [School of Mechatronics Engineering, East China Jiaotong University, Changbei Open and Developing District, Nanchang, 330013 (China)

2011-01-01

A portable near-infrared (NIR) instrument equipped with light emitting diodes (LEDs) was developed for predicting the soluble solids content (SSC) of pears. NIR spectra were collected on the calibration and prediction sets (145:45). Relationships between spectra and SSC were developed by multivariate linear regression (MLR), partial least squares (PLS) and artificial neural networks (ANNs) in the calibration set. The 45 pears of the prediction set were used to evaluate the performance of the models in terms of root mean square error of prediction (RMSEP) and correlation coefficient (r). The best result was obtained by PLS, with an RMSEP of 0.62 °Brix and an r of 0.82. The results showed that the SSC of pears could be predicted by the portable NIR instrument.
Potable NIR spectroscopy predicting soluble solids content of pears based on LEDs

International Nuclear Information System (INIS)

Liu Yande; Liu Wei; Sun Xudong; Gao Rongjie; Pan Yuanyuan; Ouyang Aiguo

2011-01-01

A portable near-infrared (NIR) instrument equipped with light emitting diodes (LEDs) was developed for predicting the soluble solids content (SSC) of pears. NIR spectra were collected on the calibration and prediction sets (145:45). Relationships between spectra and SSC were developed by multivariate linear regression (MLR), partial least squares (PLS) and artificial neural networks (ANNs) in the calibration set. The 45 pears of the prediction set were used to evaluate the performance of the models in terms of root mean square error of prediction (RMSEP) and correlation coefficient (r). The best result was obtained by PLS, with an RMSEP of 0.62 °Brix and an r of 0.82. The results showed that the SSC of pears could be predicted by the portable NIR instrument.
Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

Science.gov (United States)

Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing

2014-01-01

Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948

Railway Crossing Risk Area Detection Using Linear Regression and Terrain Drop Compensation Techniques

Directory of Open Access Journals (Sweden)

Wen-Yuan Chen

2014-06-01

Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.
Applied linear regression

CERN Document Server

Weisberg, Sanford

2013-01-01

Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute). The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illus
Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

Science.gov (United States)

Golmohammadi, Hassan

2009-11-30

A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of the molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. The results also revealed the superiority of the ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.
Collision cross section prediction of deprotonated phenolics in a travelling-wave ion mobility spectrometer using molecular descriptors and chemometrics

International Nuclear Information System (INIS)

Gonzales, Gerard Bryan; Smagghe, Guy; Coelus, Sofie; Adriaenssens, Dieter; De Winter, Karel; Desmet, Tom; Raes, Katleen; Van Camp, John

2016-01-01

The combination of ion mobility and mass spectrometry (MS) affords significant improvements over conventional MS/MS, especially in the characterization of isomeric metabolites due to the differences in their collision cross sections (CCS). Experimentally obtained CCS values are typically matched with theoretical CCS values from Trajectory Method (TM) and/or Projection Approximation (PA) calculations. In this paper, predictive models for CCS of deprotonated phenolics were developed using molecular descriptors and chemometric tools, stepwise multiple linear regression (SMLR), principal components regression (PCR), and partial least squares regression (PLS). A total of 102 molecular descriptors were generated and reduced to 28 after employing a feature selection tool, composed of mass, topological descriptors, Jurs descriptors and shadow indices. Therefore, the generated models considered the effects of mass, 3D conformation and partial charge distribution on CCS, which are the main parameters for either TM or PA (only 3D conformation) calculations. All three techniques yielded highly predictive models for both the training (R²(SMLR) = 0.9911; R²(PCR) = 0.9917; R²(PLS) = 0.9918) and validation datasets (R²(SMLR) = 0.9489; R²(PCR) = 0.9761; R²(PLS) = 0.9760). Also, the high cross-validated R² values indicate that the generated models are robust and highly predictive (Q²(SMLR) = 0.9859; Q²(PCR) = 0.9748; Q²(PLS) = 0.9760). The predictions were also very comparable to the results from TM calculations using modified mobcal (N2). Most importantly, this method offered a rapid (<10 min) alternative to TM calculations without compromising predictive ability. These methods could therefore be used in routine analysis and could be easily integrated into metabolite identification platforms. - Highlights: • CCS for deprotonated phenolics were measured using TWIMS. • Isomeric phenolics were separated in the IMS based on their CCS. • SMLR
Potential of a newly developed high-speed near-infrared (NIR) camera (Compovision) in polymer industrial analyses: monitoring crystallinity and crystal evolution of polylactic acid (PLA) and concentration of PLA in PLA/Poly-(R)-3-hydroxybutyrate (PHB) blends.

Science.gov (United States)

Ishikawa, Daitaro; Nishii, Takashi; Mizuno, Fumiaki; Sato, Harumi; Kazarian, Sergei G; Ozaki, Yukihiro

2013-12-01

This study was carried out to evaluate a new high-speed hyperspectral near-infrared (NIR) camera named Compovision. Quantitative analyses of the crystallinity and crystal evolution of a biodegradable polymer, polylactic acid (PLA), and its concentration in PLA/poly-(R)-3-hydroxybutyrate (PHB) blends were carried out using NIR imaging. This NIR camera can measure two-dimensional NIR spectral data in the 1000-2350 nm region, obtaining images with a wide field of view of 150 × 250 mm² (approximately 100 000 pixels) at high speed (in less than 5 s). PLA samples with crystallinities between 0 and 50%, blends with PHB in ratios of 80/20, 60/40, 40/60 and 20/80, and pure films of 100% PLA and PHB were prepared. Compovision was used to collect the respective NIR spectra in the 1000-2350 nm region and investigate the crystallinity of PLA and its concentration in the blends. The partial least squares (PLS) regression models for the crystallinity of PLA were developed using absorbance, second derivative, and standard normal variate (SNV) spectra from the most informative region of the spectra, between 1600 and 2000 nm. The predicted results of PLS models achieved using the absorbance and second derivative spectra were fairly good, with a root mean square error (RMSE) of less than 6.1% and a coefficient of determination (R²) of more than 0.88 for PLS factor 1. The results obtained using the SNV spectra yielded the best prediction, with the smallest RMSE of 2.93% and the highest R² of 0.976. Moreover, PLS models developed for estimating the concentration of PLA in the blend polymers using SNV spectra gave good predicted results, where the RMSE was 4.94% and R² was 0.98. The SNV-based models provided the best predicted results, since SNV can reduce the effects of the spectral changes induced by the inhomogeneity and the thickness of the samples. Wide area crystal evolution of PLA on a plate where a temperature slope of 70-105 °C had occurred was also
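The standard normal variate (SNV) pretreatment mentioned in this abstract centers each spectrum on its own mean and scales it by its own standard deviation. The sketch below is a minimal illustration with a hypothetical function name and made-up absorbance values; it is not the processing pipeline used in the study.

```python
import math

def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

# Hypothetical absorbance values at five wavelengths.
base = [0.20, 0.40, 0.90, 0.50, 0.30]
# The same sample with a multiplicative factor (e.g. thickness) and a baseline offset.
thicker = [2.0 * v + 0.3 for v in base]

snv_base = snv(base)
snv_thicker = snv(thicker)
print(snv_base)
print(snv_thicker)
```

The two transformed spectra coincide up to rounding, which is the sense in which SNV suppresses offset and scaling differences such as those caused by sample thickness.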
Utilization of Chemometric Technique to Determine the Quality of Fresh and Used Palm, Corn and Coconut Oil

International Nuclear Information System (INIS)

Hamizah Mat Agil; Mohd Zuli Jaafar; Suzeren Jamil; Azwan Mat Lazim

2014-01-01

This study was conducted to evaluate the quality of natural oil and the deterioration of frying oil. A total of 12 different oil samples from palm oil, corn oil and coconut oil were used. The frying process was repeated four times at 180 °C in order to observe the stability of the oil towards oxidation. Three main parameters were studied to determine oil quality: peroxide value, iodine value and acid value. This study emphasized the use of FTIR in the range of 4000-700 cm⁻¹. Alternatively, a chemometric method based on pattern recognition was used to determine the oil quality. Data analysis was conducted using the PCA and PLS methods in Matlab. PCA classified the data according to type of oil, while PLS predicted the oil quality for the parameters studied. For the classification of pure oil, the variance for PC1 was 70% while PC2 was 15%. For the fried/used oil, PC1 gave 57% while PC2 gave 25%. Using PLS, the iodine value model was the best for pure oils, with R²CV > 0.984, whereas for fried/used oils the peroxide value model was the best, with R²CV > 0.7423. (author)
Measurement of β/Λ ratio in IEA-R1 reactor using noise technique

International Nuclear Information System (INIS)

Moreira, J.M.L.; Kassar, E.

1986-01-01

The ratio β/Λ for the IEA-R1 reactor is obtained experimentally through the noise analysis technique. This technique is based on the determination of the power spectral density of the reactor neutron population, with the reactor in a subcritical state driven by a 'white' neutron source. A ratio β/Λ of 43.5 s⁻¹ is estimated from the break frequency of the measured transfer function of the IEA-R1 reactor. (Author)
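The link between a break frequency and β/Λ can be written out under the usual point-kinetics assumptions. The relations below are a textbook-style sketch, not the derivation used in the report: the symbols α (prompt-neutron decay constant), ρ (reactivity) and f_b (break frequency in Hz) are introduced here for illustration, and the numerical value of f_b is simply back-calculated from the quoted 43.5 s⁻¹.

```latex
% Above the delayed-neutron frequencies, the zero-power transfer function
% has a single pole at the prompt-neutron decay constant \alpha:
\[
  \lvert G(\omega) \rvert^{2} \;\propto\; \frac{1}{\omega^{2} + \alpha^{2}},
  \qquad
  \alpha = \frac{\beta - \rho}{\Lambda}
  \;\xrightarrow{\;\rho \to 0\;}\; \frac{\beta}{\Lambda}.
\]
% The break (corner) frequency in hertz follows from \alpha = 2\pi f_b:
\[
  f_b = \frac{\alpha}{2\pi}
  \quad\Longrightarrow\quad
  \frac{\beta}{\Lambda} \approx 43.5\ \mathrm{s^{-1}}
  \;\;\text{corresponds to}\;\;
  f_b \approx \frac{43.5}{2\pi} \approx 6.9\ \mathrm{Hz}.
\]
```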
Post-processing through linear regression

Directory of Open Access Journals (Sweden)

B. Van Schaeybroeck

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Time-of-flight secondary ion mass spectrometry of a range of coal samples: a chemometrics (PCA, cluster, and PLS) analysis

Energy Technology Data Exchange (ETDEWEB)

Lei Pei; Guilin Jiang; Bonnie J. Tyler; Larry L. Baxter; Matthew R. Linford [Brigham Young University, Provo, UT (United States). Department of Chemistry and Biochemistry

2008-03-15

This paper documents time-of-flight secondary ion mass spectrometry (ToF-SIMS) analyses of 34 different coal samples. In many cases, the inorganic Na⁺, Al⁺, Si⁺, and K⁺ ions dominate the spectra, eclipsing the organic peaks. A scores plot of principal component 1 (PC1) versus principal component 2 (PC2) in a principal components analysis (PCA) effectively separates the coal spectra into a triangular pattern, where the different vertices of this pattern come from (i) spectra that have a strong inorganic signature that is dominated by Na⁺, (ii) spectra that have a strong inorganic signature that is dominated by Al⁺, Si⁺, and K⁺, and (iii) spectra that have a strong organic signature. Loadings plots of PC1 and PC2 confirm these observations. The spectra with the more prominent inorganic signatures come from samples with higher ash contents. Cluster analysis with the K-means algorithm was also applied to the data. The progressive clustering revealed in the dendrogram correlates extremely well with the clustering of the data points found in the scores plot of PC1 versus PC2 from the PCA. In addition, this clustering often correlates with properties of the coal samples, as measured by traditional analyses. Partial least-squares (PLS), which included the use of interval PLS and a genetic algorithm for variable selection, shows a good correlation between ToF-SIMS spectra and some of the properties measured by traditional means. Thus, ToF-SIMS appears to be a promising technique for the analysis of this important fuel. 33 refs., 9 figs., 5 tabs.
Rapid Quantitative Analysis of Forest Biomass Using Fourier Transform Infrared Spectroscopy and Partial Least Squares Regression

Directory of Open Access Journals (Sweden)

Gifty E. Acquah

2016-01-01

Fourier transform infrared reflectance (FTIR) spectroscopy has been used to predict properties of forest logging residue, a very heterogeneous feedstock material. Properties studied included the chemical composition, thermal reactivity, and energy content. The ability to rapidly determine these properties is vital in the optimization of conversion technologies for the successful commercialization of biobased products. Partial least squares regression of first derivative treated FTIR spectra had good correlations with the conventionally measured properties. For the chemical composition, constructed models generally did a better job of predicting the extractives and lignin content than the carbohydrates. In predicting the thermochemical properties, models for volatile matter and fixed carbon performed very well (i.e., R² > 0.80, RPD > 2.0). The effect of reducing the wavenumber range to the fingerprint region for PLS modeling and the relationship between the chemical composition and higher heating value of logging residue were also explored. This study is new and different in that it is the first to use FTIR spectroscopy to quantitatively analyze forest logging residue, an abundant resource that can be used as a feedstock in the emerging low carbon economy. Furthermore, it provides a complete and systematic characterization of this heterogeneous raw material.
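The figures of merit quoted here (R², RPD) and the RMSEP used in neighbouring records can be computed from a set of reference and predicted values. The sketch below uses common chemometric definitions as an assumption, since the abstract does not state which variants were used: R² as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the reference values divided by RMSEP. All names and numbers are hypothetical.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sd = math.sqrt(sum((t - mean_true) ** 2 for t in y_true) / (n - 1))
    return sd / rmsep(y_true, y_pred)

# Hypothetical reference values and model predictions for a validation set.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.5, 11.5, 14.5, 15.5, 18.5]

print("RMSEP:", rmsep(reference, predicted))
print("R2:   ", r_squared(reference, predicted))
print("RPD:  ", rpd(reference, predicted))
```

With these made-up values the model would pass both thresholds cited in the abstract (R² > 0.80, RPD > 2.0); other conventions, such as the squared correlation coefficient for R², give slightly different numbers.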
Simultaneous determination of penicillin G salts by infrared spectroscopy: Evaluation of combining orthogonal signal correction with radial basis function-partial least squares regression

Science.gov (United States)

Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem

2010-09-01

In this study, a new method for the simultaneous determination of penicillin G salts in a pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to the similar analytical characteristics of its components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relations between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct unexpected information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both RBF-PLS and PLS calibration models. Also, in the case of some nonlinear relation between spectra and component, OSC-RBF-PLS gave more satisfactory results than the OSC-PLS model, which indicated that OSC was helpful in removing extrinsic deviations from linearity without eliminating nonlinear information related to the component. The chemometric models were tested on an external dataset and finally applied to the analysis of a commercial injection product of penicillin G salts.
Design of High Field Multipole Wiggler at PLS

International Nuclear Information System (INIS)

Kim, D. E.; Park, K. H.; Lee, H. G.; Suh, H. S.; Han, H. S.; Jung, Y. G.; Chung, C. W.

2007-01-01

Pohang Accelerator Laboratory (PAL) is developing a high field multipole wiggler for a new EXAFS beamline. The beamline is planned to utilize very high photon energy (∼40 keV) synchrotron radiation at the Pohang Light Source (PLS). To achieve a higher critical photon energy, the wiggler field needs to be maximized. A magnetic structure with wedged poles and blocks, with additional side blocks similar to those of the asymmetric wiggler of ESRF, was designed to achieve a higher flux density. The end structures were designed to be asymmetric along the beam direction to ensure a systematically zero first field integral. The thickness of the last magnets was adjusted to minimize the transition sequence to the fully developed periodic field. This approach is more convenient to control than adjusting the strength of the end magnets. The final design features a 140 mm period, a 2.5 Tesla peak flux density at a 12 mm pole gap, and a 1205 mm magnetic structure length with 16 full field poles. In this article, the design and engineering efforts for the HFMSII wiggler are described.
On weighted and locally polynomial directional quantile regression

Czech Academy of Sciences Publication Activity Database

Boček, Pavel; Šiman, Miroslav

2017-01-01

Vol. 32, No. 3 (2017), pp. 929-946. ISSN 0943-4062. R&D Projects: GA ČR GA14-07234S. Institutional support: RVO:67985556. Keywords: Quantile regression * Nonparametric regression. Subject RIV: IN - Informatics, Computer Science. OBOR OECD: Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8). Impact factor: 0.434, year: 2016. http://library.utia.cas.cz/separaty/2017/SI/bocek-0458380.pdf
riskRegression

DEFF Research Database (Denmark)

Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

2017-01-01

In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface...... for predicting the covariate specific absolute risks, their confidence intervals, and their confidence bands based on right censored time to event data. We provide explicit formulas for our implementation of the estimator of the (stratified) baseline hazard function in the presence of tied event times. As a by...... functionals. The software presented here is implemented in the riskRegression package....
Combined computational-experimental approach to predict blood-brain barrier (BBB) permeation based on "green" salting-out thin layer chromatography supported by simple molecular descriptors.

Science.gov (United States)

Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna

2017-09-05

The objective of this paper is to build QSRR/QSAR models for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods, SOTLC was chosen because its mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact than those of classical reversed-phase liquid chromatography (RPLC). During the study, three stationary phases (silica gel, cellulose plates and neutral aluminum oxide) were examined. The model set of solutes presents a wide range of log BB values, containing compounds which cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, three regression models were compared: multiple linear regression (MLR), partial least-squares (PLS) and orthogonal partial least squares (OPLS). The designed QSRR/QSAR models could be useful for predicting the BBB permeability of systematically synthesized new compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that significant differences in model performance, measured by R² and Q², can be obtained among regression techniques; hence it is strongly suggested to evaluate all available options, such as MLR, PLS and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
Prediction-oriented modeling in business research by means of PLS path modeling : Introduction to a JBR special section

NARCIS (Netherlands)

Cepeda Carrion, Gabriel; Henseler, Jörg; Ringle, Christian M.; Roldan, Jose Luis

2016-01-01

Under the main theme “prediction-oriented modeling in business research by means of partial least squares path modeling” (PLS), the special issue presents 17 papers. Most contributions include content from presentations at the 2nd International Symposium on Partial Least Squares Path Modeling: The
Boosted beta regression.

Directory of Open Access Journals (Sweden)

Matthias Schmid

Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
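To make the model behind beta regression concrete, here is a minimal sketch of its log-likelihood, assuming the usual mean-precision parameterization (shape parameters μφ and (1-μ)φ) with a logit link for the mean. It only evaluates the likelihood for given coefficients; it does not implement the boosting algorithm proposed in the paper, and the function and parameter names are illustrative.

```python
import math

def logistic(eta):
    """Inverse of the logit link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_loglik(y, x, b0, b1, phi):
    """Log-likelihood of a one-covariate beta regression.

    y   : responses strictly inside (0, 1)
    x   : covariate values
    b0, b1 : intercept and slope on the logit scale (assumed names)
    phi : precision parameter, held constant here for simplicity
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = logistic(b0 + b1 * xi)
        a, b = mu * phi, (1.0 - mu) * phi  # beta shape parameters
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total

# Hypothetical percentage responses (as proportions) and a covariate.
y = [0.12, 0.35, 0.50, 0.71, 0.90]
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(beta_loglik(y, x, b0=0.0, b1=1.0, phi=10.0))
```

Under this parameterization the response variance is μ(1-μ)/(1+φ), so letting φ depend on covariates is one way to model the variance as well as the mean.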
Data analysis of photon beam position at PLS-II

Energy Technology Data Exchange (ETDEWEB)

Ko, J.; Shin, S., E-mail: tlssh@postech.ac.kr; Huang, Jung-Yun; Kim, D.; Kim, C.; Kim, Ilyou; Lee, T.-Y.; Park, C.-D.; Kim, K. R. [Pohang Accelerator Laboratory, Pohang, Kyungbuk 790-834 (Korea, Republic of); Cho, Moohyun [Department of Physics, POSTECH, Pohang, Kyungbuk 790-834 (Korea, Republic of)

2016-07-27

In third-generation light sources, photon beam position stability is a critical issue for user experiments. Generally, photon beam position monitors have been developed to detect the real photon beam position, and the position is controlled by a feedback system in order to maintain the reference photon beam position. At PLS-II, the photon beam position stability at the front end of a particular beamline, in which a photon beam position monitor is installed, has been kept below 1 μm rms during the user service period. Nevertheless, a detailed analysis of the photon beam position data is necessary to demonstrate the performance of the photon beam position monitor, since it can suffer from various unknown noise sources (for instance, background contamination due to upstream or downstream dipole radiation, undulator gap dependence, etc.). In this paper, we describe the start-to-end study of photon beam position stability and the Singular Value Decomposition (SVD) analysis used to demonstrate the reliability of the photon beam position data.
45 CFR 303.15 - Agreements to use the Federal Parent Locator Service (PLS) in parental kidnapping and child...

Science.gov (United States)

2010-10-01

... Service (PLS) in parental kidnapping and child custody or visitation cases. 303.15 Section 303.15 Public... parental kidnapping and child custody or visitation cases. (a) Definitions. The following definitions apply... responsibilities require access in connection with child custody and parental kidnapping cases; (ii) Store the...
A regression approach for Zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper the methodology of multiple regression, as applied to Zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation, is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor Zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) When more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the parameters are estimated by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as the R² and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weight of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regression, together with residual plots, etc., provide diagnostic tools for model selection. Multiple regression analysis performed on a set of carefully selected Zircaloy-2 in-reactor creep data leads to a model which provides excellent correlations for the data. (Auth.)

Repetitive transarterial chemoembolization (rTACE) of hepatocellular carcinoma: comparisons between an arterial port system and conventional angiographic technique

International Nuclear Information System (INIS)

Hidajat, Nico; Griesshaber, Volker; Hildebrandt, Bert; Hosten, Norbert; Schroeder, Ralf-Juergen; Felix, Roland

2004-01-01

Purpose: To compare the cost and radiation exposure of repetitive transarterial chemoembolization (rTACE) using a percutaneously implantable port system with rTACE using the conventional catheterization technique. Materials and methods: In five patients with unresectable hepatocellular carcinoma, three cycles of TACE were performed using the conventional technique and six cycles using the port. The cumulative cost of material and contrast agent and the dose area product (DAP) were compared with the cost and DAP that would be expected if the rTACE were performed conventionally. Results: The cost of material and contrast agent was 1002.6 Euro after three cycles of TACE using the conventional technique and six cycles using the port, but would be 1111.8 Euro if the nine cycles were performed using the conventional technique alone. The rTACE with three cycles using the conventional technique and six cycles using the port led to ∼63% of the cumulative DAP that would be expected in rTACE using the conventional technique alone. Conclusion: In rTACE, the use of a percutaneously implantable port system might enable a reduction of cost and radiation exposure
A comparison of multiple regression and neural network techniques for mapping in situ pCO2 data

International Nuclear Information System (INIS)

Lefevre, Nathalie; Watson, Andrew J.; Watson, Adam R.

2005-01-01

Using about 138,000 measurements of surface pCO2 in the Atlantic subpolar gyre (50-70 deg N, 60-10 deg W) during 1995-1997, we compare two methods of interpolation in space and time: a monthly distribution of surface pCO2 constructed using multiple linear regressions on position and temperature, and a self-organizing neural network approach. Both methods confirm characteristics of the region found in previous work, i.e. the subpolar gyre is a sink for atmospheric CO2 throughout the year, and exhibits a strong seasonal variability with the highest undersaturations occurring in spring and summer due to biological activity. As an annual average the surface pCO2 is higher than estimates based on available syntheses of surface pCO2. This supports earlier suggestions that the sink of CO2 in the Atlantic subpolar gyre has decreased over the last decade instead of increasing as previously assumed. The neural network is able to capture a more complex distribution than can be well represented by linear regressions, but both techniques agree relatively well on the average values of pCO2 and derived fluxes. However, when both techniques are used with a subset of the data, the neural network predicts the remaining data to a much better accuracy than the regressions, with a residual standard deviation ranging from 3 to 11 μatm. The subpolar gyre is a net sink of CO2 of 0.13 Gt-C/yr using the multiple linear regressions and 0.15 Gt-C/yr using the neural network, on average between 1995 and 1997. Both calculations were made with the NCEP monthly wind speeds converted to 10 m height and averaged between 1995 and 1997, and using the gas exchange coefficient of Wanninkhof
In vivo diagnosis of cervical precancer using Raman spectroscopy and genetic algorithm techniques.

Science.gov (United States)

Duraipandian, Shiyamala; Zheng, Wei; Ng, Joseph; Low, Jeffrey J H; Ilancheran, A; Huang, Zhiwei

2011-10-21

This study aimed to evaluate the clinical utility of applying near-infrared (NIR) Raman spectroscopy and genetic algorithm-partial least squares-discriminant analysis (GA-PLS-DA) to identify biomolecular changes of cervical tissues associated with dysplastic transformation during colposcopic examination. A total of 105 in vivo Raman spectra were measured from 57 cervical sites (35 normal and 22 precancer sites) of 29 patients recruited, in which 65 spectra were from normal sites, while 40 spectra were from cervical precancerous lesions (i.e., 7 low-grade CIN and 33 high-grade CIN). The GA feature selection technique incorporated with PLS was utilized to study the significant biochemical Raman bands for differentiation between normal and precancer cervical tissues. The GA-PLS-DA algorithm with double cross-validation (dCV) identified seven diagnostically significant Raman bands in the ranges of 925-935, 979-999, 1080-1090, 1240-1260, 1320-1340, 1400-1420, and 1625-1645 cm⁻¹ related to proteins, nucleic acids and lipids in tissue, and yielded a diagnostic accuracy of 82.9% (sensitivity of 72.5% (29/40) and specificity of 89.2% (58/65)) for precancer detection. The results of this exploratory study suggest that Raman spectroscopy in conjunction with GA-PLS-DA and dCV methods has the potential to provide clinically significant discrimination between normal and precancer cervical tissues at the molecular level.
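The reported accuracy, sensitivity and specificity follow directly from the spectrum counts given in the abstract (29 of 40 precancer spectra and 58 of 65 normal spectra classified correctly). A short check, with variable names chosen here for illustration:

```python
# Counts taken from the abstract above.
true_pos, precancer_total = 29, 40   # precancer spectra correctly detected
true_neg, normal_total = 58, 65      # normal spectra correctly classified

sensitivity = true_pos / precancer_total
specificity = true_neg / normal_total
accuracy = (true_pos + true_neg) / (precancer_total + normal_total)

print(f"sensitivity = {100 * sensitivity:.1f}%")
print(f"specificity = {100 * specificity:.1f}%")
print(f"accuracy    = {100 * accuracy:.1f}%")
```

The overall accuracy is the count-weighted combination of the two rates, which is why it sits closer to the specificity: normal spectra outnumber precancer spectra 65 to 40.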
Predicting heavy metal concentrations in soils and plants using field spectrophotometry

Science.gov (United States)

Muradyan, V.; Tepanosyan, G.; Asmaryan, Sh.; Sahakyan, L.; Saghatelyan, A.; Warner, T. A.

2017-09-01

The aim of this study is to predict heavy metal (HM) concentrations in soils and plants using field remote sensing methods. The studied sites were the industrial town of Kajaran and the city of Yerevan. The research also included sampling of soils and leaves of two tree species exposed to different pollution levels and determination of HM contents under laboratory conditions. The obtained spectral values were then collated with the HM contents of Kajaran soils and of the tree leaves sampled in Yerevan, and statistical analysis was done. Consequently, Zn and Pb have a negative correlation coefficient (p regression models and artificial neural network (ANN) for HM prediction were developed. Good results were obtained for the best stress sensitive spectral band ANN (R2 0.9, RPD 2.0), Simple Linear Regression (SLR) and Partial Least Squares Regression (PLSR) (R2 0.7, RPD 1.4) models. The Multiple Linear Regression (MLR) model was not applicable for predicting Pb and Zn concentrations in soils in this research. Almost all full spectrum PLS models provide good calibration and validation results (RPD>1.4). Full spectrum ANN models are characterized by excellent calibration R2, rRMSE and RPD (0.9; 0.1 and >2.5 respectively). For prediction of Pb and Ni contents in plants, SLR and PLS models were used; they provide almost the same results. Our findings indicate that it is possible to make a coarse direct estimation of HM content in soils and plants using rapid and economic reflectance spectroscopy.
Testing Heteroscedasticity in Robust Regression

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2011-01-01

Vol. 1, No. 4 (2011), pp. 25-28 ISSN 2045-3345 Grant - others: GA ČR(CZ) GA402/09/0557 Institutional research plan: CEZ:AV0Z10300504 Keywords: robust regression * heteroscedasticity * regression quantiles * diagnostics Subject RIV: BB - Applied Statistics, Operational Research http://www.researchjournals.co.uk/documents/Vol4/06%20Kalina.pdf
Rapid classification of pharmaceutical ingredients with Raman spectroscopy using compressive detection strategy with PLS-DA multivariate filters.

Science.gov (United States)

Cebeci Maltaş, Derya; Kwok, Kaho; Wang, Ping; Taylor, Lynne S; Ben-Amotz, Dor

2013-06-01

Identifying pharmaceutical ingredients is a routine procedure required during industrial manufacturing. Here we show that a recently developed Raman compressive detection strategy can be employed to classify various widely used pharmaceutical materials using a hybrid supervised/unsupervised strategy in which only two ingredients are used for training and yet six other ingredients can also be distinguished. More specifically, our liquid crystal spatial light modulator (LC-SLM) based compressive detection instrument is trained using only the active ingredient, tadalafil, and the excipient, lactose, but is tested using these and various other excipients: microcrystalline cellulose, magnesium stearate, titanium (IV) oxide, talc, sodium lauryl sulfate and hydroxypropyl cellulose. Partial least squares discriminant analysis (PLS-DA) is used to generate the compressive detection filters necessary for fast chemical classification. Although the filters used in this study are trained on only lactose and tadalafil, we show that all the pharmaceutical ingredients mentioned above can be differentiated and classified using PLS-DA compressive detection filters with an accumulation time of 10 ms per filter. Copyright © 2013 Elsevier B.V. All rights reserved.
Comparison of several measure-correlate-predict models using support vector regression techniques to estimate wind power densities. A case study

International Nuclear Information System (INIS)

Díaz, Santiago; Carta, José A.; Matías, José M.

2017-01-01

Highlights:
• Eight measure-correlate-predict (MCP) models used to estimate the wind power densities (WPDs) at a target site are compared.
• Support vector regressions are used as the main prediction techniques in the proposed MCPs.
• The most precise MCP uses two sub-models which predict wind speed and air density in an unlinked manner.
• The most precise model allows the construction of a bivariable (wind speed and air density) WPD probability density function.
• MCP models trained to minimise wind speed prediction error do not minimise WPD prediction error.
Abstract: The long-term annual mean wind power density (WPD) is an important indicator of wind as a power source and is usually included in regional wind resource maps as useful prior information for identifying potentially attractive sites for the installation of wind projects. In this paper, a comparison is made of eight proposed Measure-Correlate-Predict (MCP) models to estimate the WPDs at a target site. Seven of these models use Support Vector Regression (SVR) and the eighth uses the Multiple Linear Regression (MLR) technique, which serves as a basis for comparing the performance of the other models. In addition, a wrapper technique with 10-fold cross-validation has been used to select the optimal set of input features for the SVR and MLR models. Some of the eight models were trained to directly estimate the mean hourly WPDs at a target site. Others, however, were first trained to estimate the parameters on which the WPD depends (i.e. wind speed and air density) and then, using these parameters, the target site mean hourly WPDs. The explanatory features considered are different combinations of the mean hourly wind speeds, wind directions and air densities recorded in 2014 at ten weather stations in the Canary Archipelago (Spain). The conclusions that can be drawn from the study undertaken include the argument that the most accurate method for the long-term estimation of WPDs requires the execution of a
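The point that minimising wind speed error does not minimise WPD error follows from the cubic dependence of power density on speed. A minimal sketch, assuming the standard relation WPD = ½ρv³ (in W/m²) and using made-up hourly speeds with a constant, assumed air density:

```python
def wind_power_density(speed, rho):
    """Wind power density in W/m^2 for speed in m/s and air density in kg/m^3."""
    return 0.5 * rho * speed ** 3

def mean_wpd(speeds, densities):
    """Mean of the hourly power densities (not the density of the mean speed)."""
    hourly = [wind_power_density(v, r) for v, r in zip(speeds, densities)]
    return sum(hourly) / len(hourly)

# Hypothetical mean hourly wind speeds and air densities.
speeds = [4.0, 6.0, 8.0, 10.0]
densities = [1.225] * len(speeds)

exact = mean_wpd(speeds, densities)
naive = wind_power_density(sum(speeds) / len(speeds), 1.225)

print(f"mean of hourly WPDs: {exact:.1f} W/m^2")
print(f"WPD of mean speed:   {naive:.1f} W/m^2")
```

Because the cube is convex, the WPD computed from the mean speed underestimates the mean WPD, so a model that predicts the average speed well can still predict the power density poorly.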
Exploratory regression analysis: a tool for selecting models and determining predictor importance.

Science.gov (United States)

Braun, Michael T; Oswald, Frederick L

2011-06-01

Linear regression analysis is one of the most important tools in a researcher's toolbox for creating and testing predictive models. Although linear regression analysis indicates how strongly a set of predictor variables, taken together, will predict a relevant criterion (i.e., the multiple R), the analysis cannot indicate which predictors are the most important. Although there is no definitive or unambiguous method for establishing predictor variable importance, there are several accepted methods. This article reviews those methods for establishing predictor importance and provides a program (in Excel) for implementing them (available for direct download at http://dl.dropbox.com/u/2480715/ERA.xlsm?dl=1). The program investigates all 2^p - 1 submodels and produces several indices of predictor importance. This exploratory approach to linear regression, similar to other exploratory data analysis techniques, has the potential to yield both theoretical and practical benefits.
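The count of 2^p - 1 submodels is the number of non-empty subsets of p predictors. The enumeration can be sketched as follows; this is an illustration in Python, not the authors' Excel program, and the predictor names are hypothetical.

```python
from itertools import combinations

def all_submodels(predictors):
    """Yield every non-empty subset of the predictors, smallest first."""
    for size in range(1, len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset

predictors = ["x1", "x2", "x3"]  # hypothetical predictor names
submodels = list(all_submodels(predictors))

for subset in submodels:
    print(subset)
print(len(submodels), "submodels")
```

Each subset would then be fitted as its own regression. The count doubles with every added predictor, which is why all-subsets approaches are practical only for a modest number of predictors.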
Short term load forecasting technique based on the seasonal exponential adjustment method and the regression model

International Nuclear Information System (INIS)

Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao

2013-01-01

Highlights:
► The seasonal and trend items of the data series are forecasted separately.
► The seasonal item in the data series is verified by Kendall τ correlation testing.
► Different regression models are applied to trend item forecasting.
► We examine the superiority of the combined models by quartile value comparison.
► A paired-sample t test is used to confirm the superiority of the combined models.
Abstract: For an energy-limited economic system, it is crucial to forecast load demand accurately. This paper is devoted to a 1-week-ahead daily load forecasting approach in which load demand series are predicted using information from earlier days that are similar to the forecast day. As in many nonlinear systems, seasonal and trend items coexist in load demand datasets. In this paper, the existence of the seasonal item in the load demand data series is first verified with the Kendall τ correlation testing method. Then, in the belief that forecasting the seasonal item and the trend item separately would improve forecasting accuracy, hybrid models combining the seasonal exponential adjustment method (SEAM) with regression methods are proposed, where SEAM and the regression models are used to forecast the seasonal and trend items, respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate that this forecasting technique can significantly improve accuracy, even though eleven different models were applied to trend item forecasting. The superior performance of this separate forecasting technique is further confirmed by paired-sample t tests
Elliptical multiple-output quantile regression and convex optimization

Czech Academy of Sciences Publication Activity Database

Hallin, M.; Šiman, Miroslav

2016-01-01

Vol. 109, No. 1 (2016), pp. 232-237 ISSN 0167-7152 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: quantile regression * elliptical quantile * multivariate quantile * multiple-output regression Subject RIV: BA - General Mathematics Impact factor: 0.540, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/siman-0458243.pdf
Near and mid infrared spectroscopy and multivariate data analysis in studies of oxidation of edible oils.

Science.gov (United States)

Wójcicki, Krzysztof; Khmelinskii, Igor; Sikorski, Marek; Sikorska, Ewa

2015-11-15

Infrared spectroscopic techniques and chemometric methods were used to study oxidation of olive, sunflower and rapeseed oils. Accelerated oxidative degradation of oils at 60°C was monitored using peroxide values and FT-MIR ATR and FT-NIR transmittance spectroscopy. Principal component analysis (PCA) facilitated visualization and interpretation of spectral changes occurring during oxidation. The multivariate curve resolution (MCR) method found three spectral components in the NIR and MIR spectral matrix, corresponding to the oxidation products, and saturated and unsaturated structures. A good quantitative relation was found between peroxide value and the contribution of oxidation products evaluated using MCR, based on NIR (R² = 0.890), MIR (R² = 0.707) and combined NIR and MIR (R² = 0.747) data. Calibration models for predicting peroxide value, established using partial least squares (PLS) regression, were characterized for MIR (R² = 0.701, RPD = 1.7), NIR (R² = 0.970, RPD = 5.3), and combined NIR and MIR data (R² = 0.954, RPD = 3.1). Copyright © 2015 Elsevier Ltd. All rights reserved.
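The calibration figures of merit quoted above can be computed from reference and predicted values as sketched below. This assumes one common convention, RPD = standard deviation of the reference values divided by the root mean square error of prediction; conventions differ (for example, sample versus population standard deviation, or a bias-corrected standard error), and the peroxide values here are made up.

```python
from math import sqrt
from statistics import mean, stdev

def rmse(y_true, y_pred):
    """Root mean square error of prediction."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination of the predictions."""
    centre = mean(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - centre) ** 2 for t in y_true)
    return 1 - rss / tss

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    return stdev(y_true) / rmse(y_true, y_pred)

# Hypothetical reference and predicted peroxide values.
reference = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

print(f"R2 = {r_squared(reference, predicted):.2f}, "
      f"RMSE = {rmse(reference, predicted):.2f}, "
      f"RPD = {rpd(reference, predicted):.2f}")
```

An RPD near 1 means the model predicts no better than the mean of the reference values, which is why the NIR model (RPD = 5.3) is considered much stronger than the MIR model (RPD = 1.7).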
Evaluation of platelet thromboxane radioimmunoassay method to measure platelet life-span: Comparison with ¹¹¹indium-platelet method

International Nuclear Information System (INIS)

Vallabhajosula, S.; Machac, J.; Badimon, L.; Lipszyc, H.; Goldsmith, S.J.; Fuster, V.

1985-01-01

Platelet activation during radiolabeling in vitro with Cr-51 and In-111 may affect the platelet life-span (PLS) in vivo. A new RIA method to measure PLS is being evaluated. Aspirin inhibits platelet thromboxane (TxA₂) by acetylating cyclooxygenase. The time required for the TxA₂ levels to return towards control values depends on the rate of new platelets entering circulation and is a measure of PLS. A single dose of aspirin (150 mg) was given to 5 normal human subjects. Blood samples were collected for 2 days before aspirin and daily for 10 days. TxA₂ production in response to endogenous thrombin was studied by allowing a 1 ml blood sample to clot at 37 °C for 90 min. Serum TxB₂ (stable breakdown product of TxA₂) levels were determined by the RIA technique. The plot of TxB₂ levels (% control) against time showed a gradual increase. The PLS calculated by linear regression analysis, assuming a 2-day lag period before cyclooxygenase recovery, is 9.7 ± 2.37. In the same 5 subjects, platelets from a 50 ml blood sample were labeled with ¹¹¹In-tropolone in 2 ml autologous plasma. Starting at 1 hr after injection of labeled platelets, 10 blood samples were obtained over an 8-day period. The PLS calculated based on a linear regression analysis is 10.2 ± 1.4. The PLS measured from the rate of platelet disappearance from circulation and the rate of platelet regeneration into circulation are quite comparable in normal subjects. TxA₂ regeneration RIA may provide a method to measure PLS without administering radioactivity to the patient
Fish mercury levels in lakes - adjusting for Hg and fish-size covariation

International Nuclear Information System (INIS)

Sonesten, Lars

2003-01-01

Fish-size covariation can be circumvented by regression intercepts of Hg vs. fish length as lake-specific Hg levels. - Accurate estimates of lake-specific mercury levels are vital in assessing the environmental impact on the mercury content in fish. The intercepts of lake-specific regressions of Hg concentration in fish vs. fish length provide accurate estimates when there is a prominent Hg and fish-size covariation. Commonly used regression methods, such as analysis of covariance (ANCOVA) and various standardization techniques are less suitable, since they do not completely remove the fish-size covariation when regression slopes are not parallel. Partial least squares (PLS) regression analysis reveals that catchment area and water chemistry have the strongest influence on the Hg level in fish in circumneutral lakes. PLS is a multivariate projection method that allows biased linear regression analysis of multicollinear data. The method is applicable to statistical and visual exploration of large data sets, even if there are more variables than observations. Environmental descriptors have no significant impact on the slopes of linear regressions of the Hg concentration in perch (Perca fluviatilis L.) vs. fish length, suggesting that the slopes mainly reflect ontogenetic dietary shifts during the perch life span
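The idea of using lake-specific regression intercepts as size-adjusted Hg levels can be sketched as one simple linear regression per lake. The data below are made up, and whether fish length is centred or otherwise transformed before fitting is not stated in the abstract, so the raw-length intercept used here is an assumption.

```python
from statistics import mean

def intercept_and_slope(lengths, hg):
    """Least-squares fit of hg = intercept + slope * length."""
    ml, mh = mean(lengths), mean(hg)
    sxx = sum((l - ml) ** 2 for l in lengths)
    sxy = sum((l - ml) * (h - mh) for l, h in zip(lengths, hg))
    slope = sxy / sxx
    return mh - slope * ml, slope

def lake_hg_levels(samples):
    """Map each lake to the intercept of its own Hg-versus-length regression."""
    return {lake: intercept_and_slope(*zip(*pairs))[0]
            for lake, pairs in samples.items()}

# Hypothetical (fish length in cm, Hg in mg/kg) pairs for two lakes.
samples = {
    "lake_A": [(10.0, 0.30), (15.0, 0.40), (20.0, 0.50)],
    "lake_B": [(10.0, 0.35), (20.0, 0.45), (30.0, 0.55)],
}

levels = lake_hg_levels(samples)
for lake, level in levels.items():
    print(f"{lake}: {level:.2f}")
```

Because each lake gets its own slope, the comparison does not rely on the parallel-slopes assumption that the abstract identifies as the weakness of ANCOVA.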
Combined wavelet transform-artificial neural network use in tablet active content determination by near-infrared spectroscopy.

Science.gov (United States)

Chalus, Pascal; Walter, Serge; Ulmschneider, Michel

2007-05-22

The pharmaceutical industry faces increasing regulatory pressure to optimize quality control. Content uniformity is a basic release test for solid dosage forms. To accelerate test throughput and comply with the Food and Drug Administration's process analytical technology initiative, attention is increasingly turning to nondestructive spectroscopic techniques, notably near-infrared (NIR) spectroscopy (NIRS). However, validation of NIRS using requisite linearity and standard error of prediction (SEP) criteria remains a challenge. This study applied wavelet transformation of the NIR spectra of a commercial tablet to build a model using conventional partial least squares (PLS) regression and an artificial neural network (ANN). Wavelet coefficients in the PLS and ANN models reduced SEP by up to 60% compared to PLS models using mathematical spectra pretreatment. ANN modeling yielded high-linearity calibration and a correlation coefficient exceeding 0.996.
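To illustrate what wavelet coefficients of a spectrum are, the sketch below applies one level of the Haar transform, the simplest wavelet. The abstract does not say which wavelet family or decomposition depth was used, so the choice of Haar and the made-up absorbance values are assumptions for illustration only.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input, which must have an even number of points.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]
    detail = [(a - b) / scale for a, b in pairs]
    return approx, detail

# Hypothetical absorbances at four adjacent wavelengths.
spectrum = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_step(spectrum)

print("approximation:", approx)
print("detail:       ", detail)
```

The approximation coefficients carry the smooth shape of the spectrum in half as many values, while the detail coefficients carry local differences; a regression model such as PLS or an ANN can then be fed a selected subset of coefficients instead of the raw spectrum.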
Study on Development of Non-Destructive Measurement Technique for Viability of Lettuce Seed (Lactuca sativa L) Using Hyperspectral Reflectance Imaging

Energy Technology Data Exchange (ETDEWEB)

Ahn, Chi Kook; Cho, Byoung Kwan [College of Agriculture and Life Science, Chungnam National University, Daejeon (Korea, Republic of); Mo, Chang Yeon [National Academy of Agricultural Science, Daejeon (Korea, Republic of); Kim, Moon S. [Environmental Microbial and Food Safety Laboratory, Animal and Natural Resources Institute, Agricultural Research Service, United States Department of Agriculture, Washington (United States)

2012-10-15

In this study, the feasibility of the hyperspectral reflectance imaging technique was investigated for the discrimination of viable and non-viable lettuce seeds. The spectral data of hyperspectral reflectance images in the spectral range between 750 nm and 1000 nm were used to develop a PLS-DA model for the classification of viable and non-viable lettuce seeds. The discrimination accuracy of the calibration set was 81.6% and that of the test set was 81.2%. An image analysis method was developed to construct the discriminant images of non-viable seeds with the developed PLS-DA model. The discrimination accuracy obtained from the resultant image was 91%, which showed the feasibility of the hyperspectral reflectance imaging technique for the mass discrimination of non-viable lettuce seeds from viable ones.
Study on Development of Non-Destructive Measurement Technique for Viability of Lettuce Seed (Lactuca sativa L) Using Hyperspectral Reflectance Imaging

International Nuclear Information System (INIS)

Ahn, Chi Kook; Cho, Byoung Kwan; Mo, Chang Yeon; Kim, Moon S.

2012-01-01

In this study, the feasibility of the hyperspectral reflectance imaging technique was investigated for the discrimination of viable and non-viable lettuce seeds. The spectral data of hyperspectral reflectance images in the spectral range between 750 nm and 1000 nm were used to develop a PLS-DA model for the classification of viable and non-viable lettuce seeds. The discrimination accuracy of the calibration set was 81.6% and that of the test set was 81.2%. An image analysis method was developed to construct the discriminant images of non-viable seeds with the developed PLS-DA model. The discrimination accuracy obtained from the resultant image was 91%, which showed the feasibility of the hyperspectral reflectance imaging technique for the mass discrimination of non-viable lettuce seeds from viable ones.
A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

Science.gov (United States)

Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

2017-07-01

Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others; selection of those wavelengths that contribute useful information; and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant
A comparison of random forest regression and multiple linear regression for prediction in neuroscience.

Science.gov (United States)

Smith, Paul F; Ganesh, Siva; Liu, Ping

2013-10-30

Regression is a common statistical tool for prediction in neuroscience. However, linear regression is by far the most common form of regression used, with regression trees receiving comparatively little attention. In this study, the results of conventional multiple linear regression (MLR) were compared with those of random forest regression (RFR), in the prediction of the concentrations of 9 neurochemicals in the vestibular nucleus complex and cerebellum that are part of the l-arginine biochemical pathway (agmatine, putrescine, spermidine, spermine, l-arginine, l-ornithine, l-citrulline, glutamate and γ-aminobutyric acid (GABA)). The R² values for the MLRs were higher than the proportion of variance explained values for the RFRs: 6/9 of them were ≥ 0.70 compared to 4/9 for RFRs. Even the variables that had the lowest R² values for the MLRs, e.g. ornithine (0.50) and glutamate (0.61), had much lower proportion of variance explained values for the RFRs (0.27 and 0.49, respectively). The RSE values for the MLRs were lower than those for the RFRs in all but two cases. In general, MLRs seemed to be superior to the RFRs in terms of predictive value and error. In the case of this data set, MLR appeared to be superior to RFR in terms of its explanatory value and error. This result suggests that MLR may have advantages over RFR for prediction in neuroscience with this kind of data set, but that RFR can still have good predictive value in some cases. Copyright © 2013 Elsevier B.V. All rights reserved.
Modeling soil organic matter (SOM) from satellite data using VISNIR-SWIR spectroscopy and PLS regression with step-down variable selection algorithm: case study of Campos Amazonicos National Park savanna enclave, Brazil

Science.gov (United States)

Rosero-Vlasova, O.; Borini Alves, D.; Vlassova, L.; Perez-Cabello, F.; Montorio Lloveria, R.

2017-10-01

Deforestation in the Amazon basin due, among other factors, to frequent wildfires demands continuous post-fire monitoring of soil and vegetation. Thus, the study posed two objectives: (1) evaluate the capacity of Visible - Near InfraRed - ShortWave InfraRed (VIS-NIR-SWIR) spectroscopy to estimate soil organic matter (SOM) in fire-affected soils, and (2) assess the feasibility of SOM mapping from satellite images. For this purpose, 30 soil samples (surface layer) were collected in 2016 in areas of grass and riparian vegetation of Campos Amazonicos National Park, Brazil, repeatedly affected by wildfires. Standard laboratory procedures were applied to determine SOM. Reflectance spectra of soils were obtained in controlled laboratory conditions using a Fieldspec4 spectroradiometer (spectral range 350-2500 nm). Measured spectra were resampled to simulate reflectances for Landsat-8, Sentinel-2 and EnMap spectral bands, used as predictors in SOM models developed using Partial Least Squares regression and a step-down variable selection algorithm (PLSR-SD). The best fit was achieved with models based on reflectances simulated for EnMap bands (R2=0.93; R2cv=0.82 and NMSE=0.07; NMSEcv=0.19). The model uses only 8 out of 244 predictors (bands) chosen by the step-down variable selection algorithm. The least reliable estimates (R2=0.55 and R2cv=0.40 and NMSE=0.43; NMSEcv=0.60) resulted from the Landsat model, while the Sentinel-2 model showed R2=0.68 and R2cv=0.63; NMSE=0.31 and NMSEcv=0.38. The results confirm the high potential of VIS-NIR-SWIR spectroscopy for SOM estimation. Application of step-down produces sparser and better-fit models. Finally, SOM can be estimated with an acceptable accuracy (NMSE 0.35) from EnMap and Sentinel-2 data enabling mapping and analysis of impacts of repeated wildfires on soils in the study area.
A combined technique using SEM and TOPSIS for the commercialization capability of R&D project evaluation

Directory of Open Access Journals (Sweden)

Charttirot Karaveg

2015-07-01

Commercializing R&D-based innovation carries a high risk, especially in the innovation transfer process, which is a concern to many entrepreneurs and researchers. The purpose of this research is to develop the criteria of R&D commercialization capability and to propose a combined technique of Structural Equation Modelling (SEM) and Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) for R&D project evaluation. The research utilized a mixed-method approach. The first phase comprised a qualitative study on commercialization criteria development through survey research of 272 successful entrepreneurs and researchers in all industrial sectors in Thailand. The data was collected with a structured questionnaire and analyzed by SEM. The second phase involved SEM-TOPSIS technique development and a case study of 45 R&D projects in research institutes and incubators for technique validation. The research results reveal six criteria for R&D project commercialization capability, arranged in order of significance: marketing, technology, finance, non-financial impact, intellectual property, and human resource. The holistic criteria are presented in decreasing order on the ambiguous subjectivity of the fuzzy-expert system, to help with effectively funding R&D and to prevent a resource meltdown. This study applies SEM to the relative weighting of hierarchical criteria. The TOPSIS approach is employed to rank the alternative performance. An integrated SEM-TOPSIS is proposed for the first time and applied to present R&D projects, and is shown to be effective and feasible in evaluating R&D commercialization capacity.
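The TOPSIS ranking step named in this abstract can be sketched in a few lines. This is a minimal illustration of the textbook TOPSIS procedure, not the authors' implementation: the three projects, their scores, and the criterion weights below are hypothetical (in the study the weights come from SEM), and the function name `topsis` is introduced here for illustration only.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) with textbook TOPSIS.

    matrix  : list of rows, one row of criterion scores per alternative
    weights : one weight per criterion (hypothetical here; SEM-derived in the study)
    benefit : True where larger is better, False for cost criteria
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and anti-ideal points, taken column by column.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))  # closeness coefficient in [0, 1]
    return scores


# Hypothetical example: three projects scored on marketing, technology, finance.
projects = [[7, 8, 6], [9, 6, 7], [5, 5, 5]]
scores = topsis(projects, weights=[0.5, 0.3, 0.2], benefit=[True, True, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

A higher closeness coefficient means the alternative lies nearer the ideal point and farther from the anti-ideal point, which is what the ranking sorts on.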

A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis

Science.gov (United States)

2013-01-01

Background: Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining block clustering with partial least squares (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results: Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of the pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. Conclusion: Applying block clustering with PLS to process the obtained data had three major benefits: first, it helped in grouping sites depending on the type of contamination. Second, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic
A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis.

Science.gov (United States)

Nica, Dragos V; Bordean, Despina Maria; Pet, Ioan; Pet, Elena; Alda, Simion; Gergen, Iosif

2013-08-30

Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining block clustering with partial least squares (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of the pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. Applying block clustering with PLS to process the obtained data had three major benefits: first, it helped in grouping sites depending on the type of contamination. Second, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic contamination in terrestrial ecosystems
Direct-on-Filter α-Quartz Estimation in Respirable Coal Mine Dust Using Transmission Fourier Transform Infrared Spectrometry and Partial Least Squares Regression.

Science.gov (United States)

Miller, Arthur L; Weakley, Andrew Todd; Griffiths, Peter R; Cauda, Emanuele G; Bayman, Sean

2017-05-01

In order to help reduce silicosis in miners, the National Institute for Occupational Safety and Health (NIOSH) is developing field-portable methods for measuring airborne respirable crystalline silica (RCS), specifically the polymorph α-quartz, in mine dusts. In this study we demonstrate the feasibility of end-of-shift measurement of α-quartz using a direct-on-filter (DoF) method to analyze coal mine dust samples deposited onto polyvinyl chloride filters. The DoF method is potentially amenable to on-site analyses, but deviates from the current regulatory determination of RCS for coal mines by eliminating two sample preparation steps: ashing the sampling filter and redepositing the ash prior to quantification by Fourier transform infrared (FT-IR) spectrometry. In this study, the FT-IR spectra of 66 coal dust samples from active mines were used, and the RCS was quantified by using: (1) an ordinary least squares (OLS) calibration approach that utilizes standard silica material as done in the Mine Safety and Health Administration's P7 method; and (2) a partial least squares (PLS) regression approach. Both were capable of accounting for kaolinite, which can confound the IR analysis of silica. The OLS method utilized analytical standards for silica calibration and kaolin correction, resulting in a good linear correlation with P7 results and minimal bias but with the accuracy limited by the presence of kaolinite. The PLS approach also produced predictions well-correlated to the P7 method, as well as better accuracy in RCS prediction, and no bias due to variable kaolinite mass. Besides decreased sensitivity to mineral or substrate confounders, PLS has the advantage that the analyst is not required to correct for the presence of kaolinite or background interferences related to the substrate, making the method potentially viable for automated RCS prediction in the field. This study demonstrated the efficacy of FT-IR transmission spectrometry for silica determination in
The number of subjects per variable required in linear regression analyses.

Science.gov (United States)

Austin, Peter C; Steyerberg, Ewout W

2015-06-01

To determine the number of independent variables that can be included in a linear regression model. We used a series of Monte Carlo simulations to examine the impact of the number of subjects per variable (SPV) on the accuracy of estimated regression coefficients and standard errors, on the empirical coverage of estimated confidence intervals, and on the accuracy of the estimated R(2) of the fitted model. A minimum of approximately two SPV tended to result in estimation of regression coefficients with relative bias of less than 10%. Furthermore, with this minimum number of SPV, the standard errors of the regression coefficients were accurately estimated and estimated confidence intervals had approximately the advertised coverage rates. A much higher number of SPV were necessary to minimize bias in estimating the model R(2), although adjusted R(2) estimates behaved well. The bias in estimating the model R(2) statistic was inversely proportional to the magnitude of the proportion of variation explained by the population regression model. Linear regression models require only two SPV for adequate estimation of regression coefficients, standard errors, and confidence intervals. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
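The contrast this abstract draws between R(2) and adjusted R(2) can be made concrete with the standard adjustment formula, 1 - (1 - R2)(n - 1)/(n - p - 1). The sketch below is only an illustration of that textbook formula, not the simulation design of the study; the sample sizes and the R2 value of 0.70 are hypothetical, as is the function name.

```python
def adjusted_r2(r2, n_subjects, n_predictors):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_subjects - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more subjects than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_subjects - 1) / dof


# Hypothetical: the same apparent R^2 = 0.70 with 10 predictors,
# at 2 subjects per variable (n = 20) and at 20 subjects per variable (n = 200).
low_spv = adjusted_r2(0.70, n_subjects=20, n_predictors=10)
high_spv = adjusted_r2(0.70, n_subjects=200, n_predictors=10)
print(round(low_spv, 4), round(high_spv, 4))
```

With few subjects per variable the adjustment is large, which is consistent with the abstract's point that the unadjusted R(2) needs many more subjects per variable than the coefficients do.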
Prediction of ethanol in bottled Chinese rice wine by NIR spectroscopy

Science.gov (United States)

Ying, Yibin; Yu, Haiyan; Pan, Xingxiang; Lin, Tao

2006-10-01

To evaluate the applicability of non-invasive visible and near infrared (VIS-NIR) spectroscopy for determining the ethanol concentration of Chinese rice wine in square brown glass bottles, transmission spectra of 100 bottled Chinese rice wine samples were collected in the spectral range of 350-1200 nm. Statistical equations were established between the reference data and VIS-NIR spectra by the partial least squares (PLS) regression method. The performance of three kinds of mathematical treatment of spectra (original spectra, first derivative spectra and second derivative spectra) was also discussed. The PLS models of original spectra gave better results, with a higher correlation coefficient in calibration (R cal) of 0.89, lower root mean square error of calibration (RMSEC) of 0.165, and lower root mean square error of cross validation (RMSECV) of 0.179. Using original spectra, PLS models for ethanol concentration prediction were developed. The R cal and the correlation coefficient in validation (R val) were 0.928 and 0.875, respectively; and the RMSEC and the root mean square error of prediction (RMSEP) were 0.135% (v/v) and 0.177% (v/v), respectively. The results demonstrated that VIS-NIR spectroscopy could be used to predict ethanol concentration in bottled Chinese rice wine.
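The difference between a calibration error (RMSEC) and a cross-validation error (RMSECV) can be sketched as follows. To stay self-contained, this illustration swaps the PLS model for a one-variable least-squares line and uses leave-one-out cross-validation; the data are hypothetical, the residuals are averaged over n (some chemometrics texts divide by degrees of freedom instead), and none of the names come from the paper.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(y_true, y_pred):
    """Root mean square error, averaging over the number of samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def rmsec_and_rmsecv(x, y):
    """RMSEC from a fit on all samples; RMSECV from leave-one-out refits."""
    slope, intercept = fit_line(x, y)
    rmsec = rmse(y, [slope * xi + intercept for xi in x])
    held_out = []
    for i in range(len(x)):
        s, c = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        held_out.append(s * x[i] + c)  # predict the sample that was left out
    return rmsec, rmse(y, held_out)


# Hypothetical absorbance readings and reference ethanol values (% v/v).
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
ethanol = [10.2, 11.9, 14.1, 15.8, 18.3, 19.7]
rmsec, rmsecv = rmsec_and_rmsecv(absorbance, ethanol)
print(rmsec, rmsecv)
```

Because each held-out sample is predicted by a model that never saw it, RMSECV is never smaller than RMSEC for a least-squares fit, which matches the pattern of the values quoted in the abstract.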
A Technique for Estimating Intensity of Emotional Expressions and Speaking Styles in Speech Based on Multiple-Regression HSMM

Science.gov (United States)

Nose, Takashi; Kobayashi, Takao

In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. We derive an algorithm for estimating explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in acoustic features of speech, based on a maximum likelihood criterion. We show experimental results to demonstrate the ability of the proposed technique using two types of speech data, simulated emotional speech and spontaneous speech with different speaking styles. It is found that the estimated values have correlation with human perception.
Regression modeling methods, theory, and computation with SAS

CERN Document Server

Panik, Michael

2009-01-01

Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Pengaruh Competitor Accounting Sebagai Strategic Management Accounting Techniques Terhadap Competitive Advantage Dan Organization Performance

OpenAIRE

Alan, Hartanto

2015-01-01

The purpose of this study was to determine the effect of Competitor Accounting as a Strategic Management Accounting Technique on Competitive Advantage and Organization Performance in manufacturing companies in Surabaya and Sidoarjo. Primary data were collected using a questionnaire distributed to manufacturing companies in Surabaya and Sidoarjo. This study used the path modeling analysis technique with PLS tools. The examination showed that there were positive and significant effect...
Using the partial least squares (PLS) method to establish critical success factor interdependence in ERP implementation projects

OpenAIRE

Esteves, José; Pastor Collado, Juan Antonio; Casanovas Garcia, Josep

2002-01-01

This technical research report proposes the usage of a statistical approach named Partial Least squares (PLS) to define the relationships between critical success factors for ERP implementation projects. In previous research work, we developed a unified model of critical success factors for ERP implementation projects. Some researchers have evidenced the relationships between these critical success factors, however no one has defined in a form...
A QSAR, Pharmacokinetic and Toxicological Study of New Artemisinin Compounds with Anticancer Activity

Directory of Open Access Journals (Sweden)

Josinete B. Vieira

2014-07-01

The Density Functional Theory (DFT) method and the 6-31G** basis set were employed to calculate the molecular properties of artemisinin and 20 derivatives with different degrees of cytotoxicity against the human hepatocellular carcinoma HepG2 line. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were employed to select the most important descriptors related to anticancer activity. The significant molecular descriptors related to the compounds with anticancer activity were the ALOGPS_log, Mor29m, IC5 and GAP energy. The Pearson correlations between activity and the most important descriptors were used to build the partial least squares (PLS) and principal component regression (PCR) models. The PLS and PCR regressions were very close, with variation between PLS and PCR of R2 = ±0.0106, R2adjusted = ±0.0125, s = ±0.0234, F(4,11) = ±12.7802, Q2 = ±0.0088, SEV = ±0.0132, PRESS = ±0.4808 and SPRESS = ±0.0057. These models were used to predict the anticancer activity of eight new artemisinin compounds (test set) with unknown activity. For these new compounds, pharmacokinetic properties were predicted: human intestinal absorption (HIA), cellular permeability (PCaCO2), cell permeability in Madin-Darby Canine Kidney cells (PMDCK), skin permeability (PSkin), plasma protein binding (PPB) and penetration of the blood-brain barrier (CBrain/Blood); toxicological properties were also predicted: mutagenicity and carcinogenicity. Two new artemisinin compounds in the test set showed satisfactory results for anticancer activity and pharmacokinetic and toxicological properties. Consequently, further studies need to be done to evaluate the different proposals as well as their actions, toxicity, and potential use for treatment of cancers.
Phantom and animal imaging studies using PLS synchrotron X-rays

CERN Document Server

Hee Joung Kim; Kyu Ho Lee; Hai Jo Jung; Eun Kyung Kim; Jung Ho Je; In Woo Kim; Yeukuang, Hwu; Wen Li Tsai; Je Kyung Seong; Seung Won Lee; Hyung Sik Yoo

2001-01-01

Ultra-high resolution radiographs can be obtained using synchrotron X-rays. A collaboration team consisting of K-JIST, POSTECH and YUMC has recently commissioned a new beamline (5C1) at Pohang Light Source (PLS) in Korea for medical applications using phase contrast radiology. Relatively simple image acquisition systems were set up on the 5C1 beamline, and imaging studies were performed for resolution test patterns, a mammographic phantom, and animals. Resolution test patterns and mammographic phantom images showed much better image resolution and quality with the 5C1 imaging system than with the mammography system. Both fish and mouse images with the 5C1 imaging system also showed much better image resolution, with great detail of organs and anatomy, compared to those obtained with a conventional mammography system. A simple and inexpensive ultra-high resolution imaging system on the 5C1 beamline was successfully implemented. The authors were able to acquire ultra-high resolution images for resolution test patterns, mammograph...
A course in statistics with R

CERN Document Server

Tattar, Prabhanjan N; Manjunath, B G

2016-01-01

Integrates the theory and applications of statistics using R. A Course in Statistics with R has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, inclusive of classical, nonparametric, and Bayesian schools, is developed with definitions, motivations, mathematical expression and R programs in a way which will help the reader to understand the mathematical development as well as R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated in a way which makes effective use of visualization techniques and...
Logistic regression applied to natural hazards: rare event logistic regression with replications

OpenAIRE

Guns, M.; Vanacker, Veerle

2012-01-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logisti...
Local bilinear multiple-output quantile/depth regression

Czech Academy of Sciences Publication Activity Database

Hallin, M.; Lu, Z.; Paindaveine, D.; Šiman, Miroslav

2015-01-01

Vol. 21, No. 3 (2015), pp. 1435-1466 ISSN 1350-7265 R&D Projects: GA MŠk(CZ) 1M06047 Institutional support: RVO:67985556 Keywords: conditional depth * growth chart * halfspace depth * local bilinear regression * multivariate quantile * quantile regression * regression depth Subject RIV: BA - General Mathematics Impact factor: 1.372, year: 2015 http://library.utia.cas.cz/separaty/2015/SI/siman-0446857.pdf
tgp: An R Package for Bayesian Nonstationary, Semiparametric Nonlinear Regression and Design by Treed Gaussian Process Models

Directory of Open Access Journals (Sweden)

Robert B. Gramacy

2007-06-01

The tgp package for R is a tool for fully Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes with jumps to the limiting linear model. Special cases also implemented include Bayesian linear models, linear CART, stationary separable and isotropic Gaussian processes. In addition to inference and posterior prediction, the package supports the (sequential) design of experiments under these models paired with several objective criteria. 1-d and 2-d plotting, with higher dimension projection and slice capabilities, and tree drawing functions (requiring maptree and combinat packages), are also provided for visualization of tgp objects.
Prediction of valid acidity in intact apples with Fourier transform near infrared spectroscopy.

Science.gov (United States)

Liu, Yan-De; Ying, Yi-Bin; Fu, Xia-Ping

2005-03-01

To develop nondestructive acidity prediction for intact Fuji apples, the potential of the Fourier transform near infrared (FT-NIR) method with fiber optics in interactance mode was investigated. Interactance in the 800 nm to 2619 nm region was measured for intact apples, harvested from early to late maturity stages. Spectral data were analyzed by two multivariate calibration techniques: partial least squares (PLS) and principal component regression (PCR). A total of 120 Fuji apples were tested and 80 of them were used to form a calibration data set. The influences of different data preprocessing and spectra treatments were also quantified. Calibration models based on smoothed spectra were slightly worse than those based on derivative spectra, and the best result was obtained when the segment length was 5 nm and the gap size was 10 points. Depending on data preprocessing and PLS method, the best prediction model yielded a coefficient of determination (r2) of 0.759, a low root mean square error of prediction (RMSEP) of 0.0677, and a low root mean square error of calibration (RMSEC) of 0.0562. The results indicated the feasibility of FT-NIR spectral analysis for predicting apple valid acidity in a nondestructive way.
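For readers unfamiliar with how a PLS calibration of this kind is computed, the following is a minimal sketch of single-response PLS (PLS1) using NIPALS-style deflation. It is a textbook formulation written in plain Python, not the software or settings used in the study; the tiny three-channel data set is hypothetical and deliberately collinear (the third column is the sum of the first two), which is the situation where PLS is preferred over ordinary least squares.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [yi - y_mean for yi in y]                              # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:  # nothing left in y that X can explain
            break
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                    # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one sample by replaying the deflation steps on it."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical collinear "spectra": channel 3 = channel 1 + channel 2.
X = [[1, 2, 3], [2, 1, 3], [3, 4, 7], [4, 3, 7], [5, 6, 11]]
y = [1, 4, 3, 6, 5]  # generated as 2*ch1 - ch2 + 1
model = pls1_fit(X, y, n_components=2)
fitted = [pls1_predict(model, row) for row in X]
print(fitted)
```

Prediction replays the deflation on the new sample instead of forming an explicit coefficient vector, which avoids a matrix inverse; in practice the number of components would be chosen by cross-validation rather than fixed at two.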
Development and Validation of a Near-Infrared Spectroscopy Method for the Prediction of Acrylamide Content in French-Fried Potato.

Science.gov (United States)

Adedipe, Oluwatosin E; Johanningsmeier, Suzanne D; Truong, Van-Den; Yencho, G Craig

2016-03-02

This study investigated the ability of near-infrared spectroscopy (NIRS) to predict acrylamide content in French-fried potato. Potato flour spiked with acrylamide (50-8000 μg/kg) was used to determine if acrylamide could be accurately predicted in a potato matrix. French fries produced with various pretreatments and cook times (n = 84) and obtained from quick-service restaurants (n = 64) were used for model development and validation. Acrylamide was quantified using gas chromatography-mass spectrometry, and reflectance spectra (400-2500 nm) of each freeze-dried sample were captured on a Foss XDS Rapid Content Analyzer-NIR spectrometer. Partial least-squares (PLS) discriminant analysis and PLS regression modeling demonstrated that NIRS could accurately detect acrylamide content as low as 50 μg/kg in the model potato matrix. Prediction errors of 135 μg/kg (R(2) = 0.98) and 255 μg/kg (R(2) = 0.93) were achieved with the best PLS models for acrylamide prediction in Russet Norkotah French-fried potato and multiple samples of unknown varieties, respectively. The findings indicate that NIRS can be used as a screening tool in potato breeding and potato processing research to reduce acrylamide in the food supply.
Computing multiple-output regression quantile regions

Czech Academy of Sciences Publication Activity Database

Paindaveine, D.; Šiman, Miroslav

2012-01-01

Vol. 56, No. 4 (2012), pp. 840-853 ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords: halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf
Recognition of Orobanche cumana Below-Ground Parasitism Through Physiological and Hyper Spectral Measurements in Sunflower (Helianthus annuus L.).

Science.gov (United States)

Cochavi, Amnon; Rapaport, Tal; Gendler, Tania; Karnieli, Arnon; Eizenberg, Hanan; Rachmilevitch, Shimon; Ephrath, Jhonathan E

2017-01-01

Broomrape (Orobanche and Phelipanche spp.) parasitism is a severe problem in many crops worldwide, including in the Mediterranean basin. Most of the damage occurs during the sub-soil developmental stage of the parasite; by the time the parasite emerges from the ground, damage to the crop has already been done. One feasible method for sensing early, below-ground parasitism is through physiological measurements, which provide preliminary indications of slight changes in plant vitality and productivity. However, a complete physiological field survey is slow, costly and requires skilled manpower. In recent decades, visible to shortwave infrared (VIS-SWIR) hyperspectral tools have exhibited great potential for faster, cheaper, simpler and non-destructive tracking of physiological changes. The advantage of VIS-SWIR is even greater when narrow-band signatures are analyzed with an advanced statistical technique, like a partial least squares regression (PLS-R). The technique can pinpoint the most physiologically sensitive wavebands across an entire spectrum, even in the presence of high levels of noise and collinearity. The current study evaluated a method for early detection of Orobanche cumana parasitism in sunflower that combines plant physiology, hyperspectral readings and PLS-R. Seeds of susceptible and resistant O. cumana sunflower varieties were planted in infested (15 mg kg-1 seeds) and non-infested soil. The plants were examined weekly to detect any physiological or structural changes; the examinations were accompanied by hyperspectral readings. During the early stage of the parasitism, significant differences between infected and non-infected sunflower plants were found in the reflectance of near and shortwave infrared areas. Physiological measurements revealed no differences between treatments until O. cumana inflorescences emerged. However, levels of several macro- and microelements tended to decrease during the early stage of O. cumana parasitism. Analysis of
Recognition of Orobanche cumana Below-Ground Parasitism Through Physiological and Hyper Spectral Measurements in Sunflower (Helianthus annuus L.)

Directory of Open Access Journals (Sweden)

Amnon Cochavi

2017-06-01

Broomrape (Orobanche and Phelipanche spp.) parasitism is a severe problem in many crops worldwide, including in the Mediterranean basin. Most of the damage occurs during the sub-soil developmental stage of the parasite; by the time the parasite emerges from the ground, damage to the crop has already been done. One feasible method for sensing early, below-ground parasitism is through physiological measurements, which provide preliminary indications of slight changes in plant vitality and productivity. However, a complete physiological field survey is slow, costly and requires skilled manpower. In recent decades, visible to shortwave infrared (VIS-SWIR) hyperspectral tools have exhibited great potential for faster, cheaper, simpler and non-destructive tracking of physiological changes. The advantage of VIS-SWIR is even greater when narrow-band signatures are analyzed with an advanced statistical technique, like a partial least squares regression (PLS-R). The technique can pinpoint the most physiologically sensitive wavebands across an entire spectrum, even in the presence of high levels of noise and collinearity. The current study evaluated a method for early detection of Orobanche cumana parasitism in sunflower that combines plant physiology, hyperspectral readings and PLS-R. Seeds of susceptible and resistant O. cumana sunflower varieties were planted in infested (15 mg kg-1 seeds) and non-infested soil. The plants were examined weekly to detect any physiological or structural changes; the examinations were accompanied by hyperspectral readings. During the early stage of the parasitism, significant differences between infected and non-infected sunflower plants were found in the reflectance of near and shortwave infrared areas. Physiological measurements revealed no differences between treatments until O. cumana inflorescences emerged. However, levels of several macro- and microelements tended to decrease during the early stage of O. cumana

Understanding logistic regression analysis.

Science.gov (United States)

Sperandei, Sandro

2014-01-01

Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
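The link between a logistic regression coefficient and an odds ratio can be shown with a small worked example. This sketch assumes the simplest case, one binary explanatory variable, where the fitted coefficient is the log of the odds ratio from the 2x2 table; the counts are hypothetical and the function names are introduced here for illustration.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


def odds_ratio_2x2(exp_event, exp_no_event, unexp_event, unexp_no_event):
    """Odds ratio of the event for exposed versus unexposed subjects."""
    return (exp_event * unexp_no_event) / (exp_no_event * unexp_event)


# Hypothetical counts: 30/70 events among exposed, 10/90 among unexposed.
or_table = odds_ratio_2x2(30, 70, 10, 90)

# With a single binary predictor, the logistic model that reproduces the table has
# intercept = log odds in the unexposed group and slope = log odds ratio.
b0 = math.log(10 / 90)
b1 = math.log(or_table)

# The odds ratio implied by the model is exp(b1), whatever the intercept.
or_model = odds(logistic(b0 + b1)) / odds(logistic(b0))
print(round(or_table, 3), round(or_model, 3))
```

With several explanatory variables the same reading applies to each coefficient: its exponential is the odds ratio for a one-unit change in that variable with the others held fixed.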
Distributed Monitoring of the R2 Statistic for Linear Regression

Data.gov (United States)

National Aeronautics and Space Administration — The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and...
Comparing Kriging and Regression Approaches for Mapping Soil Clay Content in a diverse Danish Landscape

DEFF Research Database (Denmark)

Adhikari, Kabindra; Bou Kheir, Rania; Greve, Mette Balslev

2013-01-01

Information on the spatial variability of soil texture including soil clay content in a landscape is very important for agricultural and environmental use. Different prediction techniques are available to assess and map spatial variability of soil properties, but selecting the most suitable techn...... the prediction in OKst compared with that in OK, whereas RT showed the lowest performance of all (R2 = 0.52; RMSE = 0.52; and RPD = 1.17). We found RKrr to be an effective prediction method and recommend this method for any future soil mapping activities in Denmark....... technique at a given site has always been a major issue in all soil mapping applications. We studied the prediction performance of ordinary kriging (OK), stratified OK (OKst), regression trees (RT), and rule-based regression kriging (RKrr) for digital mapping of soil clay content at 30.4-m grid size using 6...
Logistic regression for risk factor modelling in stuttering research.

Science.gov (United States)

Reed, Phil; Wu, Yaqionq

2013-06-01

To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be taken when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Linear regression

CERN Document Server

Olive, David J

2017-01-01

This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Multivariate regression models for the simultaneous quantitative analysis of calcium and magnesium carbonates and magnesium oxide through drifts data

Directory of Open Access Journals (Sweden)

Marder Luciano

2006-01-01

Full Text Available In the present work multivariate regression models were developed for the quantitative analysis of ternary systems using Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) to determine the concentration in weight of calcium carbonate, magnesium carbonate and magnesium oxide. Nineteen standard samples, previously defined in a ternary diagram by mixture design, were prepared and their mid-infrared diffuse reflectance spectra were recorded. The partial least squares (PLS) regression method was applied to the model. The spectra set was preprocessed by either mean-centering and variance-scaling (model 2) or mean-centering only (model 1). The results based on the prediction performance of the external validation set, expressed by RMSEP (root mean square error of prediction), demonstrated that it is possible to develop good models to simultaneously determine calcium carbonate, magnesium carbonate and magnesium oxide content in powdered samples that can be used in the study of the thermal decomposition of dolomite rocks.
Using Reflectance Spectroscopy and Artificial Neural Network to Assess Water Infiltration Rate into the Soil Profile

Directory of Open Access Journals (Sweden)

Naftali Goldshleger

2012-01-01

Full Text Available We explored the effect of raindrop energy on both water infiltration into soil and the soil's NIR-SWIR spectral reflectance (1200–2400 nm). Seven soils with different physical and morphological properties from Israel and the US were subjected to an artificial rainstorm. The spectral properties of the crust formed on the soil surface were analyzed using an artificial neural network (ANN). Results were compared to a study with the same population in which partial least-squares (PLS) regression was applied. It was concluded that both models (PLS regression and ANN) are generic as they are based on properties that correlate with the physical crust, such as clay content, water content and organic matter. Nonetheless, better results for the connection between infiltration rate and spectral properties were achieved with the non-linear ANN technique in terms of statistical values (RMSE of 17.3% for PLS regression and 10% for ANN). Furthermore, although both models were run at the selected wavelengths and their accuracy was assessed with an independent external group of samples, no pre-processing procedure was applied to the reflectance data when using ANN. As the relationship between infiltration rate and soil reflectance is not linear, ANN methods have the advantage for examining this relationship when many soils are being analyzed.
Soil moisture estimation using multi linear regression with terraSAR-X data

Directory of Open Access Journals (Sweden)

G. García

2016-06-01

Full Text Available The first five centimeters of soil form an interface where the main heat flux exchanges between the land surface and the atmosphere occur. Besides ground measurements, remote sensing has proven to be an excellent tool for the monitoring of spatially and temporally distributed data of the most relevant Earth surface parameters, including soil parameters. Indeed, active microwave sensors (Synthetic Aperture Radar - SAR) offer the opportunity to monitor soil moisture (HS) at global, regional and local scales by monitoring the processes involved. Several inversion algorithms that derive geophysical information such as HS from SAR data have been developed. Many of them use electromagnetic models for simulating the backscattering coefficient and are based on statistical techniques, such as neural networks, inversion methods and regression models. Recent studies have shown that simple multiple regression techniques yield satisfactory results. The geophysical variables involved in these methodologies are descriptive of the soil structure, microwave characteristics and land use. Therefore, in this paper we aim at developing a multiple linear regression model to estimate HS on flat agricultural regions using TerraSAR-X satellite data and data from a ground weather station. The results show that the backscatter, the precipitation and the relative humidity are the explanatory variables of HS. The results obtained presented an RMSE of 5.4 and an R2 of about 0.6.
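A multiple linear regression of the kind summarized above, together with the RMSE and R2 statistics it reports, might be sketched as follows. The three predictors are named after the explanatory variables in the abstract, but every number here is synthetic and the function names are hypothetical; none of it reproduces the study's data or coefficients.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-ins (assumptions): backscatter [dB], precipitation [mm],
# relative humidity [%]. The "true" coefficients are invented for the demo.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([
    rng.normal(-10.0, 2.0, n),
    rng.uniform(0.0, 30.0, n),
    rng.uniform(30.0, 90.0, n),
])
true_coef = np.array([20.0, 1.5, 0.2, 0.1])
y = predict_mlr(true_coef, X) + rng.normal(0.0, 2.0, n)   # simulated soil moisture

coef = fit_mlr(X, y)
y_hat = predict_mlr(coef, X)
print(rmse(y, y_hat), r_squared(y, y_hat))
```

The statistics printed here are in-sample; an assessment comparable to the one in the abstract would normally evaluate them on observations held out from the fit.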
Multi-Response Optimization and Regression Analysis of Process Parameters for Wire-EDMed HCHCr Steel Using Taguchi’s Technique

Directory of Open Access Journals (Sweden)

K. Srujay Varma

2017-04-01

Full Text Available In this study, the effect of machining process parameters, viz. pulse-on time, pulse-off time, current and servo-voltage, for machining High Carbon High Chromium Steel (HCHCr) using a copper electrode in wire EDM was investigated. High Carbon High Chromium Steel is a difficult-to-machine alloy, which has many applications in low temperature manufacturing, and copper is chosen as the electrode as it has good electrical conductivity and is the most frequently used electrode all over the world. The tool-making culture of copper has led many shops in Europe and Japan to use copper electrodes. Experiments were conducted according to Taguchi’s technique by varying the machining process parameters at three levels. Taguchi’s method based on the L9 orthogonal array was followed and the number of experiments was limited to 9. Experimental cost and time consumption were reduced by following this statistical technique. Targeted output parameters are Material Removal Rate (MRR), Vickers Hardness (HV) and Surface Roughness (SR). Analysis of Variance (ANOVA) and Regression Analysis were performed using Minitab 17 software to optimize the parameters and draw the relationship between input and output process parameters. Regression models were developed relating input and output parameters. It was observed that the most influential factors for MRR, Hardness and SR are Ton, Toff and SV.
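The L9 design mentioned above lets four three-level factors be screened in nine runs because every pair of columns contains each level combination exactly once. The sketch below writes out the standard L9(3^4) array, checks that property, and computes per-level mean responses (main effects). Assigning the columns to Ton, Toff, current and servo voltage in that order is an assumption, and the response values are invented placeholders rather than measurements from the study.

```python
import numpy as np
from itertools import combinations

# Standard L9(3^4) orthogonal array; levels coded 1..3.
# Assumed column order: Ton, Toff, current, servo voltage.
L9 = np.array([
    [1, 1, 1, 1],
    [1, 2, 2, 2],
    [1, 3, 3, 3],
    [2, 1, 2, 3],
    [2, 2, 3, 1],
    [2, 3, 1, 2],
    [3, 1, 3, 2],
    [3, 2, 1, 3],
    [3, 3, 2, 1],
])

def pairs_are_balanced(array):
    """True if every pair of columns shows each level combination exactly once."""
    n_runs = array.shape[0]
    for i, j in combinations(range(array.shape[1]), 2):
        seen = {(int(a), int(b)) for a, b in zip(array[:, i], array[:, j])}
        if len(seen) != n_runs:   # 9 runs, 9 possible pairs: all must be distinct
            return False
    return True

def main_effects(array, response):
    """Mean response at each level of each factor (rows: factors, cols: levels)."""
    levels = np.unique(array)
    return np.array([
        [response[array[:, f] == lv].mean() for lv in levels]
        for f in range(array.shape[1])
    ])

# Placeholder response that depends only on the first factor (hypothetical).
response = 10.0 * L9[:, 0]
effects = main_effects(L9, response)
print(pairs_are_balanced(L9))
print(effects)
```

Because the design is balanced, the placeholder response produces a trend only in the first row of `effects`, while the other three factors show a flat profile. The same separation is what allows main effects to be read off real measurements.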
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

Science.gov (United States)

Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

2016-01-01

When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
Implementing the Fundamental Principle of Islamic Finance PLS in Order to Reduce Moral Hazard on the Financial Services Market

OpenAIRE

Dariusz Piotrowski

2014-01-01

Moral hazard is a situation where an agent takes risky actions, knowing that potential costs will be borne by the principal. In finance, moral hazard arises when advisers who take risky decisions come to believe that they will not have to carry the full burden of potential losses. Implementing the fundamental principle of Islamic finance, PLS, could reduce moral hazard on the financial services market.
Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP

Directory of Open Access Journals (Sweden)

Jeffrey B. Endelman

2011-11-01

Full Text Available Many important traits in plant breeding are polygenic and therefore recalcitrant to traditional marker-assisted selection. Genomic selection addresses this complexity by including all markers in the prediction model. A key method for the genomic prediction of breeding values is ridge regression (RR), which is equivalent to best linear unbiased prediction (BLUP) when the genetic covariance between lines is proportional to their similarity in genotype space. This additive model can be broadened to include epistatic effects by using other kernels, such as the Gaussian, which represent inner products in a complex feature space. To facilitate the use of RR and nonadditive kernels in plant breeding, a new software package for R called rrBLUP has been developed. At its core is a fast maximum-likelihood algorithm for mixed models with a single variance component besides the residual error, which allows for efficient prediction with unreplicated training data. Use of the rrBLUP software is demonstrated through several examples, including the identification of optimal crosses based on superior progeny value. In cross-validation tests, the prediction accuracy with nonadditive kernels was significantly higher than RR for wheat (Triticum aestivum L.) grain yield but equivalent for several maize (Zea mays L.) traits.
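The equivalence between ridge regression on markers and a kernel-based prediction can be checked numerically. The sketch below is written in plain NumPy and does not use or reproduce the rrBLUP package: the marker matrix, phenotypes and penalty `lam` are all hypothetical, and `lam` is simply fixed rather than estimated by maximum likelihood as the package is described as doing.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Primal ridge solution: (Z'Z + lam*I)^-1 Z'y."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

def kernel_marker_effects(Z, y, lam):
    """Same solution via the linear kernel K = ZZ': Z'(K + lam*I)^-1 y."""
    n = Z.shape[0]
    K = Z @ Z.T                    # similarity of lines in genotype space
    return Z.T @ np.linalg.solve(K + lam * np.eye(n), y)

# Hypothetical data: 20 lines, 50 markers coded -1/0/1, centered phenotypes.
rng = np.random.default_rng(1)
Z = rng.choice([-1.0, 0.0, 1.0], size=(20, 50))
y = rng.normal(size=20)
y = y - y.mean()
lam = 5.0                          # assumed penalty (residual / marker variance ratio)

u_primal = ridge_marker_effects(Z, y, lam)
u_dual = kernel_marker_effects(Z, y, lam)
print(np.allclose(u_primal, u_dual))
```

The kernel form only needs the n x n matrix `K`, which is why it stays cheap when markers far outnumber lines, and replacing `K` with another kernel (for example a Gaussian one) is the kind of extension to nonadditive effects that the abstract mentions.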
Spatial stochastic regression modelling of urban land use

International Nuclear Information System (INIS)

Arshad, S H M; Jaafar, J; Abiden, M Z Z; Latif, Z A; Rasam, A R A

2014-01-01

Urbanization is very closely linked to industrialization, commercialization or overall economic growth and development. This results in innumerable benefits to the quantity and quality of the urban environment and lifestyle but on the other hand contributes to unbounded development, urban sprawl, overcrowding and a decreasing standard of living. Regulation and observation of urban development activities is crucial. An understanding of the urban systems that promote urban growth is also essential for the purpose of policy making, formulating development strategies as well as development plan preparation. This study aims to compare two different stochastic regression modeling techniques for spatial structure models of urban growth in the same specific study area. Both techniques will utilize the same datasets and their results will be analyzed. The work starts by producing an urban growth model by using stochastic regression modeling techniques, namely Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). The two techniques are compared, and it is found that GWR seems to be a more significant stochastic regression model than OLS: it gives a smaller AICc (corrected Akaike Information Criterion) value and its output is more spatially explainable.
Monthly streamflow forecasting with auto-regressive integrated moving average

Science.gov (United States)

Nasir, Najah; Samsudin, Ruhaidah; Shabri, Ani

2017-09-01

Forecasting of streamflow is one of the many ways that can contribute to better decision making for water resource management. The auto-regressive integrated moving average (ARIMA) model was selected in this research for monthly streamflow forecasting with enhancement made by pre-processing the data using singular spectrum analysis (SSA). This study also proposed an extension of the SSA technique to include a step where clustering was performed on the eigenvector pairs before reconstruction of the time series. The monthly streamflow data of Sungai Muda at Jeniang, Sungai Muda at Jambatan Syed Omar and Sungai Ketil at Kuala Pegang was gathered from the Department of Irrigation and Drainage Malaysia. A ratio of 9:1 was used to divide the data into training and testing sets. The ARIMA, SSA-ARIMA and Clustered SSA-ARIMA models were all developed in R software. Results from the proposed model are then compared to a conventional auto-regressive integrated moving average model using the root-mean-square error and mean absolute error values. It was found that the proposed model can outperform the conventional model.
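The evaluation protocol in this abstract, a 9:1 chronological split scored by root-mean-square error and mean absolute error, can be sketched independently of the forecasting model. In the sketch below the series is synthetic, and a seasonal-naive forecast (repeating the last observed year) stands in for ARIMA purely to have something to score; it is not the authors' model, and their work was done in R rather than Python.

```python
import numpy as np

def chronological_split(series, train_parts=9, total_parts=10):
    """Split a time series in order, without shuffling (9:1 by default)."""
    n_train = (len(series) * train_parts) // total_parts
    return series[:n_train], series[n_train:]

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mae(actual, forecast):
    return float(np.mean(np.abs(actual - forecast)))

# Synthetic monthly "streamflow" (hypothetical): 10 years with an annual cycle.
rng = np.random.default_rng(0)
months = np.arange(120)
flow = 50.0 + 10.0 * np.sin(2.0 * np.pi * months / 12.0) + rng.normal(0.0, 2.0, 120)

train, test = chronological_split(flow)

# Stand-in forecast (assumption): repeat the last 12 training months.
forecast = train[-12:][: len(test)]

print(len(train), len(test))
print(rmse(test, forecast), mae(test, forecast))
```

Any candidate model, with or without SSA pre-processing, could be swapped in for `forecast` and compared on the same held-out months using the same two error measures.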
Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

Science.gov (United States)

Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

2011-12-15

Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) which utilizes a newly defined similarity between samples is proposed to estimate active pharmaceutical ingredient (API) content in granules for tableting. In addition, a statistical wavelength selection method which quantifies the effect of API content and other factors on NIR spectra is proposed. LW-PLS and the proposed wavelength selection method were applied to real process data provided by Daiichi Sankyo Co., Ltd., and the estimation accuracy was improved by 38.6% in root mean square error of prediction (RMSEP) compared to the conventional PLS using wavelengths selected on the basis of variable importance on the projection (VIP). The results clearly show that the proposed calibration modeling technique is useful for API content estimation and is superior to the conventional one. Copyright © 2011 Elsevier B.V. All rights reserved.
Quantitative determination of polyphosphate in sediments using Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy and partial least squares regression.

Science.gov (United States)

Khoshmanesh, Aazam; Cook, Perran L M; Wood, Bayden R

2012-08-21

Phosphorus (P) is a major cause of eutrophication and subsequent loss of water quality in freshwater ecosystems. A major part of the flux of P to eutrophic lake sediments is organically bound or of biogenic origin. Despite the broad relevance of polyphosphate (Poly-P) in bioremediation and P release processes in the environment, its quantification is not yet well developed for sediment samples. Current methods possess significant disadvantages because of the difficulties associated with using a single extractant to extract a specific P compound without altering others. A fast and reliable method to estimate the quantitative contribution of microorganisms to sediment P release processes is needed, especially when an excessive P accumulation in the form of polyphosphate (Poly-P) occurs. Development of novel approaches for application of emerging spectroscopic techniques to complex environmental matrices such as sediments significantly contributes to the speciation models of P mobilization, biogeochemical nutrient cycling and development of nutrient models. In this study, for the first time Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy in combination with partial least squares (PLS) was used to quantify Poly-P in sediments. To reduce the high absorption matrix components in sediments such as silica, a physical extraction method was developed to separate sediment biological materials from abiotic particles. The aim was to achieve optimal separation of the biological materials from sediment abiotic particles with minimum chemical change in the sample matrix prior to ATR-FTIR analysis. Using a calibration set of 60 samples for the PLS prediction models in the Poly-P concentration range of 0-1 mg g(-1) d.w. (dry weight of sediment) (R(2) = 0.984 and root mean square error of prediction RMSEP = 0.041 at Factor-1) Poly-P could be detected at less than 50 μg g(-1) d.w. Using this technique, there is no solvent extraction or chemical
riskRegression

DEFF Research Database (Denmark)

Ozenne, Brice; Sørensen, Anne Lyngholm; Scheike, Thomas

2017-01-01

In the presence of competing risks a prediction of the time-dynamic absolute risk of an event can be based on cause-specific Cox regression models for the event and the competing risks (Benichou and Gail, 1990). We present computationally fast and memory optimized C++ functions with an R interface......-product we obtain fast access to the baseline hazards (compared to survival::basehaz()) and predictions of survival probabilities, their confidence intervals and confidence bands. Confidence intervals and confidence bands are based on point-wise asymptotic expansions of the corresponding statistical...
Regression and regression analysis time series prediction modeling on climate data of Quetta, Pakistan

International Nuclear Information System (INIS)

Jafri, Y.Z.; Kamal, L.

2007-01-01

Various statistical techniques were used on five-year data from 1998-2002 of average humidity, rainfall, maximum and minimum temperatures, respectively. The relationships to regression analysis time series (RATS) were developed for determining the overall trend of these climate parameters on the basis of which forecast models can be corrected and modified. We computed the coefficient of determination as a measure of goodness of fit, to our polynomial regression analysis time series (PRATS). The correlation to multiple linear regression (MLR) and multiple linear regression analysis time series (MLRATS) were also developed for deciphering the interdependence of weather parameters. Spearman's rank correlation and Goldfeld-Quandt test were used to check the uniformity or non-uniformity of variances in our fit to polynomial regression (PR). The Breusch-Pagan test was applied to MLR and MLRATS, respectively, which yielded homoscedasticity. We also employed Bartlett's test for homogeneity of variances on five-year data of rainfall and humidity, respectively, which showed that the variances in rainfall data were not homogeneous while, in the case of humidity, they were homogeneous. Our results on regression and regression analysis time series show the best fit to prediction modeling on climatic data of Quetta, Pakistan. (author)
Ridge Regression Signal Processing

Science.gov (United States)

Kuhl, Mark R.

1990-01-01

The introduction of the Global Positioning System (GPS) into the National Airspace System (NAS) necessitates the development of Receiver Autonomous Integrity Monitoring (RAIM) techniques. In order to guarantee a certain level of integrity, a thorough understanding of modern estimation techniques applied to navigational problems is required. The extended Kalman filter (EKF) is derived and analyzed under poor geometry conditions. It was found that the performance of the EKF is difficult to predict, since the EKF is designed for a Gaussian environment. A novel approach is implemented which incorporates ridge regression to explain the behavior of an EKF in the presence of dynamics under poor geometry conditions. The basic principles of ridge regression theory are presented, followed by the derivation of a linearized recursive ridge estimator. Computer simulations are performed to confirm the underlying theory and to provide a comparative analysis of the EKF and the recursive ridge estimator.
Improved variable reduction in partial least squares modelling based on predictive-property-ranked variables and adaptation of partial least squares complexity.

Science.gov (United States)

Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C

2011-10-31

The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared sources (NIR) and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test. The three newly developed PPRVR-CAM methods were able to retain
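The stepwise loop described in this abstract (fit PLS1, rank variables by absolute regression coefficient, drop the smallest, refit) can be sketched with a small NIPALS-style PLS1. This is only an illustration of the constant-complexity variant under stated assumptions: the function names, the simulated data and the rule of capping the number of components at the number of remaining variables are all hypothetical, and the complexity-adapting step of the proposed PPRVR-CAM methods is not implemented.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """PLS1 regression coefficients (NIPALS with deflation) on centered data."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)       # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        qk = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - qk * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W = np.column_stack(W)
    P = np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

def stepwise_reduction(X, y, n_components, n_keep):
    """Repeatedly drop the variable with the smallest |PLS coefficient|."""
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        a = min(n_components, len(kept))          # cap complexity (assumption)
        b = pls1_coefficients(X[:, kept], y, a)
        kept.pop(int(np.argmin(np.abs(b))))       # eliminate, then re-rank
    return kept

# Simulated data (hypothetical): only variables 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 5.0 * X[:, 0] - 4.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

kept = stepwise_reduction(X, y, n_components=2, n_keep=2)
print(sorted(kept))
```

In practice the stopping point and the number of components would be chosen by cross-validation rather than fixed in advance as they are here.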

Using Dominance Analysis to Determine Predictor Importance in Logistic Regression

Science.gov (United States)

Azen, Razia; Traxel, Nicole

2009-01-01

This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R² analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Soft sensor design by multivariate fusion of image features and process measurements

DEFF Research Database (Denmark)

Lin, Bao; Jørgensen, Sten Bay

2011-01-01

This paper presents a multivariate data fusion procedure for design of dynamic soft sensors where suitably selected image features are combined with traditional process measurements to enhance the performance of data-driven soft sensors. A key issue of fusing multiple sensor data, i.e. to determine...... with a multivariate analysis technique from RGB pictures. The color information is also transformed to hue, saturation and intensity components. Both sets of image features are combined with traditional process measurements to obtain an inferential model by partial least squares (PLS) regression. A dynamic PLS model...... oxides (NOx) emission of cement kilns. On-site tests demonstrate improved performance over soft sensors based on conventional process measurements only....
On directional multiple-output quantile regression

Czech Academy of Sciences Publication Activity Database

Paindaveine, D.; Šiman, Miroslav

2011-01-01

Vol. 102, No. 2 (2011), pp. 193-212 ISSN 0047-259X R&D Projects: GA MŠk(CZ) 1M06047 Grant - others: Commission EC(BE) Fonds National de la Recherche Scientifique Institutional research plan: CEZ:AV0Z10750506 Keywords: multivariate quantile * quantile regression * multiple-output regression * halfspace depth * portfolio optimization * value-at-risk Subject RIV: BA - General Mathematics Impact factor: 0.879, year: 2011 http://library.utia.cas.cz/separaty/2011/SI/siman-0364128.pdf
Combining Partial Least Squares and the Gradient-Boosting Method for Soil Property Retrieval Using Visible Near-Infrared Shortwave Infrared Spectra

Directory of Open Access Journals (Sweden)

Lanfa Liu

2017-12-01

Full Text Available Soil spectroscopy has experienced a tremendous increase in use for soil property characterisation, and can be used not only in the laboratory but also from space (imaging spectroscopy). Partial least squares (PLS) regression is one of the most common approaches for the calibration of soil properties using soil spectra. Besides functioning as a calibration method, PLS can also be used as a dimension reduction tool, which has scarcely been studied in soil spectroscopy. PLS components retained from high-dimensional spectral data can further be explored with the gradient-boosted decision tree (GBDT) method. Three soil sample categories were extracted from the Land Use/Land Cover Area Frame Survey (LUCAS) soil library according to the type of land cover (woodland, grassland, and cropland). First, PLS regression and GBDT were separately applied to build the spectroscopic models for soil organic carbon (OC), total nitrogen content (N), and clay for each soil category. Then, PLS-derived components were used as input variables for the GBDT model. The results demonstrate that the combined PLS-GBDT approach has better performance than PLS or GBDT alone. The relatively important variables for soil property estimation revealed by the proposed method demonstrated that the PLS method is a useful dimension reduction tool for soil spectra to retain target-related information.
Directional quantile regression in Octave (and MATLAB)

Czech Academy of Sciences Publication Activity Database

Boček, Pavel; Šiman, Miroslav

2016-01-01

Vol. 52, No. 1 (2016), pp. 28-51 ISSN 0023-5954 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords: quantile regression * multivariate quantile * depth contour * Matlab Subject RIV: IN - Informatics, Computer Science Impact factor: 0.379, year: 2016 http://library.utia.cas.cz/separaty/2016/SI/bocek-0458380.pdf
Quantification of extra virgin olive oil in dressing and edible oil blends using the representative TMS-4,4'-desmethylsterols gas-chromatographic-normalized fingerprint.

Science.gov (United States)

Pérez-Castaño, Estefanía; Sánchez-Viñas, Mercedes; Gázquez-Evangelista, Domingo; Bagur-González, M Gracia

2018-01-15

This paper describes and discusses the application of trimethylsilyl (TMS)-4,4'-desmethylsterols derivatives chromatographic fingerprints (obtained from an off-line HPLC-GC-FID system) for the quantification of extra virgin olive oil in commercial vinaigrettes, salad dressings and in-house reference materials (i-HRM) using two different Partial Least Square-Regression (PLS-R) multivariate quantification methods. Different data pre-processing strategies were carried out, the complete sequence being: (i) internal normalization; (ii) sampling based on the Nyquist theorem; (iii) internal correlation optimized shifting, icoshift; (iv) baseline correction; (v) mean centering; and (vi) selecting zones. The first model corresponds to a matrix of dimensions 'n×911' variables and the second one to a matrix of dimensions 'n×431' variables. It has to be highlighted that the two proposed PLS-R models allow the quantification of extra virgin olive oil in binary blends, foodstuffs, etc., when the provided percentage is greater than 25%. Copyright © 2017 Elsevier Ltd. All rights reserved.
Applied Regression Modeling A Business Approach

CERN Document Server

Pardoe, Iain

2012-01-01

An applied and concise treatment of statistical regression techniques for business students and professionals who have little or no background in calculus. Regression analysis is an invaluable statistical methodology in business settings and is vital to model the relationship between a response variable and one or more predictor variables, as well as the prediction of a response value given values of the predictors. In view of the inherent uncertainty of business processes, such as the volatility of consumer spending and the presence of market uncertainty, business professionals use regression a
Physical structure and genetic expression of the sulfonamide-resistance plasmid pLS80 and its derivatives in Streptococcus pneumoniae and Bacillus subtilis

Energy Technology Data Exchange (ETDEWEB)

Lopez, P.; Espinosa, M.; Lacks, S.A.

1984-01-01

The 10-kb chromosomal fragment of Streptococcus pneumoniae cloned in pLS80 contains the sul-d allele of the pneumococcal gene for dihydropteroate synthase. As a single copy in the chromosome this allele confers resistance to sulfanilamide at 0.2 mg/ml; in the multicopy plasmid it confers resistance to 2.0 mg/ml. The sul-d mutation was mapped by restriction analysis to a 0.4-kb region. A spontaneous deletion beginning approx. 1.5 kb to the right of the sul-d mutation prevented gene function, possibly by removing a promoter. This region could be restored by chromosomal facilitation and be demonstrated in the plasmid by selection for sulfonamide resistance. Under selection for a vector marker, tetracycline resistance, only the deleted plasmid was detectable, apparently as a result of plasmid segregation and the advantageous growth rates of cells with smaller plasmids. When such cells were selected for sulfonamide resistance, the deleted region returned to the plasmid, presumably by equilibration between the chromosome and the plasmid pool, to give a low frequency (approx. 10^-3) of cells resistant to sulfanilamide at 2.0 mg/ml. Models for the mechanisms of chromosomal facilitation and equilibration are proposed. Several derivatives of pLS80 could be transferred to Bacillus subtilis, where they conferred resistance to sulfanilamide at 2 mg/ml, thereby demonstrating cross-species expression of the pneumococcal gene. Transfer of the plasmids to B. subtilis gave rise to large deletions to the left of the sul-d marker, but these deletions did not interfere with the sul-d gene function. Restriction maps of pLS80 and its variously deleted derivatives are presented.
Applications of neutron activation analysis technique in the IPR-R1 research reactor

International Nuclear Information System (INIS)

Sabino, C.V.S.; Mansur, N.

1986-01-01

A review is made of the neutron activation analysis technique used in the IPR-R1 reactor of the Centro de Desenvolvimento da Tecnologia Nuclear - NUCLEBRAS. Some characteristics of the method are described, and the types of samples and elements analyzed are also mentioned. (Author)
Predictors of course in obsessive-compulsive disorder: logistic regression versus Cox regression for recurrent events.

Science.gov (United States)

Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M

2007-09-01

Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
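The remission rule stated in the abstract can be written as a small predicate. The sketch below is illustrative only: the function name, the default arguments and the example scores are hypothetical, and it assumes that "a reduction of seven points" means at least seven points below baseline.

```python
def is_remission(baseline: int, current: int,
                 cutoff: int = 13, min_drop: int = 7) -> bool:
    """Return True when a Y-BOCS score meets the remission rule.

    Assumed reading of the abstract: the current score is lower than
    `cutoff` AND it is at least `min_drop` points below the baseline score.
    """
    return current < cutoff and (baseline - current) >= min_drop


# Hypothetical patient: baseline score followed by later measurements.
baseline_score = 25
follow_up_scores = [20, 14, 12, 15, 10]

# One flag per measurement; a recurrent-events analysis would use the
# whole sequence of remissions and relapses rather than a single time point.
flags = [is_remission(baseline_score, s) for s in follow_up_scores]
print(flags)
```

Evaluating the rule at every measurement, rather than at one chosen time point, is what yields the sequence of remissions and relapses that a recurrent-events model uses.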
Raman spectroscopy: in vivo quick response code of skin physiological status

Science.gov (United States)

Vyumvuhore, Raoul; Tfayli, Ali; Piot, Olivier; Le Guillou, Maud; Guichard, Nathalie; Manfait, Michel; Baillet-Guffroy, Arlette

2014-11-01

Dermatologists need to combine different clinically relevant characteristics for a better understanding of skin health. These characteristics are usually measured by different techniques, and some of them are highly time consuming. Therefore, a predictive model based on Raman spectroscopy and partial least squares (PLS) regression was developed as a rapid multiparametric method. The Raman spectra collected from the five uppermost micrometers of the skin of 11 healthy volunteers were fitted to different skin characteristics measured by independent appropriate methods (transepidermal water loss, hydration, pH, relative amount of ceramides, fatty acids, and cholesterol). For each parameter, the obtained PLS model presented correlation coefficients higher than R2=0.9. This model enables us to obtain all the aforementioned parameters directly from the unique Raman signature. In addition, in-depth Raman analyses down to 20 μm showed different balances between partially bound water and unbound water with depth. In parallel, increasing depth was accompanied by an unfolding process of the proteins. The combination of all this information led to a multiparametric investigation, which better characterizes the skin status. The Raman signal can thus be used as a quick response code (QR code). This could help dermatologic diagnosis of physiological variations and presents a possible extension to pathological characterization.
Identification and characterization of a novel type of replication terminator with bidirectional activity on the Bacillus subtilis theta plasmid pLS20

NARCIS (Netherlands)

Meijer, WJJ; Smith, M; Wake, RG; deBoer, AL; Venema, G; Bron, S

We have sequenced and analysed a 3.1 kb fragment of the 55 kb endogenous Bacillus subtilis plasmid pLS20 containing its replication functions. Just outside the region required for autonomous replication, a segment of 18 bp was identified as being almost identical to part of the major B. subtilis
Instructional materials designer and technical writer (m/f ...

International Development Research Centre (IDRC) Digital Library (Canada)

The incumbent ensures that this content raises awareness of GI-TI services and tools, including recent changes. The incumbent keeps up to date with changes in the field and maintains relationships with the local and virtual learning and technical writing community in order to ...
Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

Science.gov (United States)

Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

2017-08-01

The aim of this study was to compare and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty in the projections of differing techniques. The study area, on a global scale, is the Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM) CSIRO-Mk3.0 (CS), using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain the projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation in Middle Eastern countries, for the present and the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than from MX. BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections, resulting from different techniques. The assessment and interpretation of model projections requires reservations
Formulating state space models in R with focus on longitudinal regression models

DEFF Research Database (Denmark)

Dethlefsen, Claus; Lundbye-Christensen, Søren

We provide a language for formulating a range of state space models. The described methodology is implemented in the R package sspir, available from cran.r-project.org. A state space model is specified similarly to a generalized linear model in R, by marking the time-varying terms in the form...
Evaluation of J-R curve testing of nuclear piping materials using the direct current potential drop technique

International Nuclear Information System (INIS)

Hackett, E.M.; Kirk, M.T.; Hays, R.A.

1986-08-01

A method is described for developing J-R curves for nuclear piping materials using the DC Potential Drop (DCPD) technique. Experimental calibration curves were developed for both three point bend and compact specimen geometries using ASTM A106 steel, a type 304 stainless steel and a high strength aluminum alloy. These curves were fit with a power law expression over the range of crack extension encountered during J-R curve tests (0.6 a/W to 0.8 a/W). The calibration curves were insensitive to both material and sidegrooving and depended solely on specimen geometry and lead attachment points. Crack initiation in J-R curve tests using DCPD was determined by a deviation from a linear region on a plot of COD vs. DCPD. The validity of this criterion for ASTM A106 steel was determined by a series of multispecimen tests that bracketed the initiation region. A statistical differential slope procedure for determination of the crack initiation point is presented and discussed. J-R curve tests were performed on ASTM A106 steel and type 304 stainless steel using both the elastic compliance and DCPD techniques to assess R-curve comparability. J-R curves determined using the two approaches were found to be in good agreement for ASTM A106 steel. The applicability of the DCPD technique to type 304 stainless steel and high rate loading of ferromagnetic materials is discussed. 15 refs., 33 figs
Stochastic search, optimization and regression with energy applications

Science.gov (United States)

Hannah, Lauren A.

Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage problem. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values, which depend on the selected portfolio, to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DP-GLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression
Screening method for rapid classification of psychoactive substances in illicit tablets using mid infrared spectroscopy and PLS-DA.

Science.gov (United States)

Pereira, Leandro S A; Lisboa, Fernanda L C; Coelho Neto, José; Valladão, Frederico N; Sena, Marcelo M

2018-05-09

Several new psychoactive substances (NPS) have reached the illegal drug market in recent years, and ecstasy-like tablets are one of the forms affected by this change. Cathinones and tryptamines have increasingly been found in ecstasy-like seized samples as well as other amphetamine type stimulants. A presumptive method for identifying different drugs in seized ecstasy tablets (n=92) using ATR-FTIR (attenuated total reflectance - Fourier transform infrared spectroscopy) and PLS-DA (partial least squares discriminant analysis) was developed. A hierarchical strategy of sequential modeling was performed with PLS-DA. The main model discriminated four classes: 5-MeO-MIPT, methylenedioxyamphetamines (MDMA and MDA), methamphetamine, and cathinones. Two submodels were built to identify drugs present in MDs and cathinones classes. Models were validated through the estimate of figures of merit. The average reliability rate (RLR) of the main model was 96.8% and accordance (ACC) was 100%. For the submodels, RLR and ACC were 100%. The reliability of the models was corroborated through their spectral interpretation. Thus, spectral assignments were performed by associating informative vectors of each specific modeled class to the respective drugs. The developed method is simple, fast, and can be applied to the forensic laboratory routine, leading to objective results reports useful for forensic scientists and law enforcement. Copyright © 2018 Elsevier B.V. All rights reserved.
Development of a method for the determination of caffeine anhydrate in various designed intact tablets [correction of tables] by near-infrared spectroscopy: a comparison between reflectance and transmittance technique.

Science.gov (United States)

Ito, Masatomo; Suzuki, Tatsuya; Yada, Shuichi; Kusai, Akira; Nakagami, Hiroaki; Yonemochi, Etsuo; Terada, Katsuhide

2008-08-05

Using near-infrared (NIR) spectroscopy, an assay method which is not affected by such elements of tablet design as thickness, shape, embossing and scored line was developed. Tablets containing caffeine anhydrate were prepared by direct compression at various compression force levels using different shaped punches. NIR spectra were obtained from these intact tablets using the reflectance and transmittance techniques. A reference assay was performed by high-performance liquid chromatography (HPLC). Calibration models were generated by the partial least-squares (PLS) regression. Changes in the tablet thickness, shape, embossing and scored line caused NIR spectral changes in different ways, depending on the technique used. As a result, noticeable errors in drug content prediction occurred using calibration models generated according to the conventional method. On the other hand, when the various tablet design elements which caused the NIR spectral changes were included in the model, the prediction of the drug content in the tablets was scarcely affected by those elements when using either of the techniques. A comparison of these techniques resulted in higher predictability under the tablet design variations using the transmittance technique with preferable linearity and accuracy. This is probably attributed to the transmittance spectra which sensitively reflect the differences in tablet thickness or shape as a result of obtaining information inside the tablets.
PLS Method: Analysis of Employee Performance through Job Satisfaction and Employee Commitment

Directory of Open Access Journals (Sweden)

Saskia Yuanita

2012-11-01

Current global challenges are increasing competition among national and international businesses. Under these conditions, the company realizes the importance of quality and of efforts to enhance competitiveness through consistent and continuous improvement in order to meet customer and market needs. This study aims to examine the effect of the implementation of the ISO 9001 quality management system on employee performance, and the moderating effects of job satisfaction and employee commitment on the relationship between the application of the ISO 9001:2008 quality management system and employee performance. The method used for the analysis is Partial Least Squares (PLS). The results show that the application of the ISO 9001:2008 quality management system affects employee performance, with employee satisfaction and commitment acting as moderating variables that affect the relationship between the application of the ISO 9001:2008 quality management system and employee performance. Both moderating variables, employee satisfaction and commitment, have positive parameter estimates, so that when employee satisfaction and commitment increase, the effect of implementing the ISO 9001:2008 quality management system on employee performance is strengthened.

Changes in persistence, spurious regressions and the Fisher hypothesis

DEFF Research Database (Denmark)

Kruse, Robinson; Ventosa-Santaulària, Daniel; Noriega, Antonio E.

Declining inflation persistence has been documented in numerous studies. When such series are analyzed in a regression framework in conjunction with other persistent time series, spurious regressions are likely to occur. We propose to use the coefficient of determination R2 as a test statistic to...
Corrosion Inhibition of Q235A Steel in Acid Medium Using Isatin Derivatives: A Qsar Study

International Nuclear Information System (INIS)

Abdo M Al-Fakih; Madzlan Aziz; Abdo M Al-Fakih; Abdallah, H.H.; Hasmerya Maarof; Rosmahaida Jamaludin; Bishir Usman

2016-01-01

A Quantitative Structure-Activity Relationship (QSAR) study was performed on 10 isatin derivatives which were reportedly used as corrosion inhibitors. Dragon software was used to calculate the molecular descriptors. The partial least squares (PLS) method was used to run the regression analysis between the descriptors and the corrosion inhibition efficiencies (IE) of the inhibitors. A predictive QSAR model was developed with a correlation coefficient (r²_cal) of 0.9676. The model validity was assessed through internal and external validation. The results show that the cross-validation regression coefficient (r²_cv) and the prediction regression coefficient (r²_pred) are 0.8163 and 0.9189, respectively. The model was used to predict the IE for ten isatin derivatives. The results confirm the good stability and predictive ability of the model. Dragon-based descriptors provide a very good description of the corrosion inhibition properties of the inhibitors. The results of the QSAR study were found to be consistent with the experimental data. (author)
Tools to support interpreting multiple regression in the face of multicollinearity.

Science.gov (United States)

Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

2012-01-01

While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
[From clinical judgment to linear regression model].

Science.gov (United States)

Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

2013-01-01

When we think about mathematical models, such as the linear regression model, we assume that these terms are used only by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bX, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when "X" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
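To make the quantities named in this abstract concrete, here is a minimal sketch of an ordinary least squares fit of the line Y = a + bX together with the coefficient of determination. The function name `fit_simple_regression` and the data values are made up for illustration and do not come from the article.

```python
def fit_simple_regression(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: change in y per unit change in x
    a = mean_y - b * mean_x    # intercept: value of y when x equals 0
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return a, b, r_squared


# Hypothetical measurements (not from the article).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r2 = fit_simple_regression(x_values, y_values)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}, R^2 = {r2:.4f}")
```

The slope is the ratio of the x-y cross-product sum to the sum of squares of x, and R² is the share of the variation in Y that the fitted line accounts for.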
Regression modeling of ground-water flow

Science.gov (United States)

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
A regression approach for zircaloy-2 in-reactor creep constitutive equations

International Nuclear Information System (INIS)

Yung Liu, Y.; Bement, A.L.

1977-01-01

In this paper the methodology of multiple regression as applied to zircaloy-2 in-reactor creep data analysis and the construction of a constitutive equation is illustrated. While the resulting constitutive equation can be used in creep analysis of in-reactor zircaloy structural components, the methodology itself is entirely general and can be applied to any creep data analysis. From the data analysis and model development points of view, both the assumption of independence and prior commitment to specific model forms are unacceptable. One would desire means which can not only estimate the required parameters directly from data but also provide a basis for model selection, viz., one model against others. A basic understanding of the physics of deformation is important in choosing the forms of the starting physical model equations, but the justification must rely on their ability to correlate the overall data. The promising aspects of multiple regression creep data analysis are briefly outlined as follows: (1) when more than one variable is involved, there is no need to assume that each variable affects the response independently. No separate normalizations are required either, and the parameters are estimated by solving many simultaneous equations. The number of simultaneous equations is equal to the number of data sets. (2) Regression statistics such as the R²- and F-statistics provide measures of the significance of the regression creep equation in correlating the overall data. The relative weights of each variable on the response can also be obtained. (3) Special regression techniques such as step-wise, ridge, and robust regressions and residual plots, etc., provide diagnostic tools for model selection
Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

Science.gov (United States)

Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

2018-02-12

Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses are of both methods. In this article, we focus on the extents to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) in this regard is successful more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.
Establishment of regression dependences. Linear and nonlinear dependences

International Nuclear Information System (INIS)

Onishchenko, A.M.

1994-01-01

The main problems in determining linear regression dependences and 19 types of nonlinear regression dependences are discussed in full. It is taken into account that the total dispersions are the sum of the measurement dispersions and the dispersions of the parameter variations themselves. Approaches to determining all of the dispersions are described. It is shown that the least squares fit gives inconsistent estimates for industrial objects and processes. Correction methods that take into account comparable measurement errors in both variables make it possible to obtain consistent estimates of the regression equation parameters. The condition under which applying the correction technique is expedient is given. The technique for determining nonlinear regression dependences, taking into account the form of the dependence and comparable errors in both variables, is described. 6 refs., 1 tab
Liquid Chromatographic-Chemometric Techniques for the Simultaneous HPLC Determination of Lansoprazole, Amoxicillin and Clarithromycin in Commercial Preparation.

Science.gov (United States)

Aktas, A Hakan; Saridag, Ayse Mine

2017-09-01

Two multivariate calibration-prediction techniques, principal component regression (PCR) and partial least-squares regression (PLSR), were applied to the chromatographic multicomponent analysis of a drug containing lansoprazole (LAN), clarithromycin (CLA) and amoxicillin (AMO). Optimum chromatographic separation of LAN, CLA and AMO with atorvastatin as the internal standard (IS) was obtained by using Xterra® RP18 column 5 μm 4.6 × 250 mm2, and 25 mM ammonium chloride buffer prepared ammonium chloride, acetonitrile and bidistilled water (45:45:10 v/v) as the mobile phase at flow rate 1.0 mL/min. The high pressure liquid chromatography data sets, consisting of the ratios of analyte peak areas to the IS peak area, were obtained by using diode array detection at five wavelengths (205, 210, 215, 220 and 225 nm). LC-chemometric calibrations for LAN, CLA and AMO were constructed separately by using the relationship between the peak-area ratio and the training sets for each analyte. A series of synthetic solutions containing different concentrations of LAN, CLA and AMO was used to check the prediction ability of PCR and PLSR. Both chemometric methods in this study can be satisfactorily used for the quantitative analysis and for dissolution tests of the multicomponent commercial drug. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
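As a rough illustration of how a PLS calibration of this kind maps several measured signals onto a concentration, the sketch below implements the textbook single-response PLS (PLS1) algorithm with NIPALS-style deflation in plain Python. It is not the authors' software or data: the function names, the two-signal training set and the concentrations are all hypothetical, and a real application would use more signals (for example, peak-area ratios at five wavelengths) and validated chemometrics software.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model; returns (x_mean, y_mean, components)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    # Mean-centre predictors (E) and response (f).
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [yi - y_mean for yi in y]
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [wj / norm for wj in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt  # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by repeating the deflation."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical training set: two signals per sample, and concentrations
# generated here as 2*s1 - s2 + 3 purely for illustration.
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]]
y_train = [3.0, 6.0, 4.0, 8.0, 7.0]

model = pls1_fit(X_train, y_train, n_components=2)
prediction = pls1_predict(model, [6.0, 4.0])
print(f"predicted concentration: {prediction:.3f}")
```

With as many components as the predictor matrix has rank, PLS1 reproduces the ordinary least squares fit; in practice fewer components are kept, usually chosen by cross-validation, and that is where PLSR and PCR differ from plain regression.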
Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation.

Science.gov (United States)

Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J

2013-12-15

The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. Extracting the essential oil from the wood requires cutting down the tree, so the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Squares Discriminant Analysis (PLS-DA) as a means to classify the essential oil extracted from different parts (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of the training samples. This value was calculated so that the classes are divided with the lowest probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, and predictive accuracy and efficiency of 100%. These results give an overall view of the behavior of the model, but do not give information about individual samples; for this reason, the confidence interval for the classification of each sample was also calculated using the bootstrap resampling technique. The methodology developed has the potential to be an alternative to standard procedures used for oil analysis and can be employed as a screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.
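The idea of attaching a bootstrap confidence interval to the prediction for an individual sample can be sketched as follows. This is a generic percentile bootstrap, not the authors' procedure: a one-variable least squares line on 0/1 class codes stands in for the PLS-DA model, and the scores, labels, threshold and function names are hypothetical.

```python
import random


def fit_line(x, y):
    """Least squares line y = a + b*x; returns None if x has no spread."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b


def bootstrap_interval(x, y, x_new, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the predicted class value of x_new."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(x, y))
    predictions = []
    for _ in range(n_boot):
        # Resample the training samples with replacement and refit.
        sample = rng.choices(pairs, k=len(pairs))
        model = fit_line([p[0] for p in sample], [p[1] for p in sample])
        if model is None:  # degenerate resample, skip it
            continue
        a, b = model
        predictions.append(a + b * x_new)
    predictions.sort()
    lower = predictions[int((alpha / 2) * (len(predictions) - 1))]
    upper = predictions[int((1 - alpha / 2) * (len(predictions) - 1))]
    return lower, upper


# Hypothetical one-dimensional scores with class codes 0 and 1.
scores = [0.10, 0.20, 0.15, 0.30, 0.80, 0.90, 0.75, 0.85]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold = 0.5  # assumed class boundary on the predicted value

lo, hi = bootstrap_interval(scores, labels, x_new=0.70)
print(f"95% interval for the new sample: [{lo:.3f}, {hi:.3f}]")
print("confidently class 1" if lo > threshold else "uncertain or class 0")
```

A sample whose whole interval lies on one side of the class boundary can be assigned with more confidence than one whose interval straddles it, which is the per-sample information that overall sensitivity and specificity figures do not provide.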
Preface to Berk's "Regression Analysis: A Constructive Critique"

OpenAIRE

de Leeuw, Jan

2003-01-01

It is a pleasure to write a preface for the book "Regression Analysis" of my fellow series editor Dick Berk. And it is a pleasure in particular because the book is about regression analysis, the most popular and the most fundamental technique in applied statistics. And because it is critical of the way regression analysis is used in the sciences, in particular in the social and behavioral sciences. Although the book can be read as an introduction to regression analysis, it can also be read as a...
Application of FTIR Spectrometry Using Multivariate Analysis For Prediction Fuel in Engine Oil

Directory of Open Access Journals (Sweden)

Marie Sejkorová

2017-01-01

This work presents the potential of partial least squares (PLS) regression combined with Fourier transform infrared (FTIR) spectrometry for detecting penetration of diesel fuel into the mineral engine oil SAE 15W‑40 in the concentration range from 0 % to 9.5 % (w/w). The best-performing FTIR‑PLS model used the spectral range 835 – 688 cm−1. The quality of the model was evaluated using the root mean square error of calibration (RMSEC) and of cross validation (RMSECV). A correlation coefficient R = 0.999 was obtained, with RMSEC and RMSECV values of 0.11 % and 0.38 %, respectively. After calibration of the FTIR spectrometer, the contamination of engine oil with diesel fuel could be determined in 1 – 2 min per sample.
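RMSEC and RMSECV differ only in which samples are used to fit the model before the residual is taken. The sketch below illustrates the two quantities under simplifying assumptions: a single-variable straight-line calibration stands in for the PLS model, cross validation is leave-one-out, the error sum is divided by n, and the absorbance and fuel values are invented for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(residuals):
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def rmsec(x, y):
    """Error on the same samples that were used to fit the model."""
    slope, intercept = fit_line(x, y)
    return rmse([yi - (slope * xi + intercept) for xi, yi in zip(x, y)])


def rmsecv(x, y):
    """Leave-one-out error: each sample is predicted by a model fitted without it."""
    residuals = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        residuals.append(y[i] - (slope * x[i] + intercept))
    return rmse(residuals)


# Hypothetical calibration data: band absorbance vs. diesel fuel content (% w/w).
absorbance = [0.10, 0.21, 0.29, 0.42, 0.50, 0.61]
fuel_pct = [0.0, 2.0, 4.0, 6.0, 8.0, 9.5]

print(f"RMSEC={rmsec(absorbance, fuel_pct):.3f} RMSECV={rmsecv(absorbance, fuel_pct):.3f}")
```

For a least-squares fit the leave-one-out error is never smaller than the calibration error, which is consistent with the reported RMSECV (0.38 %) exceeding the RMSEC (0.11 %).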
Preliminary Discrimination of Cheese Adulteration by FT-IR Spectroscopy

Directory of Open Access Journals (Sweden)

Lucian Cuibus

2014-11-01

The present work describes a preliminary study to compare some traditional Romanian cheeses and adulterated cheeses using attenuated total reflectance Fourier transform infrared spectroscopy (ATR-FTIR). For the PLS model, calibration (6 concentration levels) and validation (5 concentration levels) sets were prepared from commercial Dalia cheese from different manufacturers by spiking it with palm oil at concentrations ranging over 2-50 % and 5-40 %, respectively. Fifteen Dalia cheeses were evaluated as an external set. The spectra of each sample, after homogenization, were acquired in triplicate using a Shimadzu Prestige 21 FTIR spectrophotometer with a horizontal diamond ATR accessory in the MIR region 4000-600 cm-1. PLS was applied using MVC1 routines written for Matlab R2010a. As a first step, the optimal conditions for the PLS model were obtained using cross-validation on the calibration set. The spectral region 3873-652 cm-1 and 3 PLS factors were identified as the best conditions and gave an R2 value of 0.9338 and a relative error in calibration of 17.2%. The validation set was then evaluated, giving good recovery rates (108%) and acceptable dispersion of the data (20%). The curve of actual vs. predicted values shows a slope near 1 and an intercept close to 0, with an R2 of 0.9695. When the external sample set was evaluated, samples F19, F21, F22 and F24 showed detectable levels of palm fats. The results proved that FTIR-PLS is a reliable non-destructive technique for rapid quantification of the level of adulteration in cheese. The spectroscopic methods could assist quality control authorities, traders and producers in discriminating cheeses adulterated with palm oil.
General Nature of Multicollinearity in Multiple Regression Analysis.

Science.gov (United States)

Liu, Richard

1981-01-01

Discusses multiple regression, a very popular statistical technique in the field of education. One of the basic assumptions in regression analysis requires that independent variables in the equation should not be highly correlated. The problem of multicollinearity and some of the solutions to it are discussed. (Author)
Comparison of infrared spectroscopy techniques: developing an efficient method for high resolution analysis of sediment properties from long records

Science.gov (United States)

Hahn, Annette; Rosén, Peter; Kliem, Pierre; Ohlendorf, Christian; Persson, Per; Zolitschka, Bernd; Pasado Science Team

2010-05-01

The analysis of sediment samples in visible to mid-infrared spectra is ideal for high-resolution records. It requires only small amounts (0.01-0.1g dry weight) of sample material and facilitates rapid and cost efficient analysis of a wide variety of biogeochemical properties on minerogenic and organic substances (Kellner et al. 1998). One of these techniques, the Diffuse Reflectance Fourier Transform Infrared Spectrometry (DRIFTS), has already been successfully applied to lake sediment from very different settings and has been shown to be a promising technique for high resolution analyses of long sedimentary records on glacial-interglacial timescales (Rosén et al. 2009). However, the DRIFTS technique includes a time-consuming step where sediment samples are mixed with KBr. To assess if alternative and more rapid infrared (IR) techniques can be used, four different IR spectroscopy techniques are compared for core catcher sediment samples from Laguna Potrok Aike - an ICDP site located in southernmost South America. Partial least squares (PLS) calibration models were developed using the DRIFTS technique. The correlation coefficients (R) for correlations between DRIFTS-inferred and conventionally measured biogeochemical properties show values of 0.80 for biogenic silica (BSi), 0.95 for total organic carbon (TOC), 0.91 for total nitrogen (TN), and 0.92 for total inorganic carbon (TIC). Good statistical performance was also obtained by using the Attenuated Total Reflectance Fourier Transform Infrared Spectroscopy (ATR-FTIRS) technique, which requires less sample preparation. Two devices were used, the full-sized Bruker Equinox 252 and the smaller and less expensive Bruker Alpha. R for ATR-FTIRS-inferred and conventionally measured biogeochemical properties were 0.87 (BSi), 0.93 (TOC), 0.90 (TN), and 0.91 (TIC) for the Alpha, and 0.78 (TOC), 0.85 (TN), 0.79 (TIC) for the Equinox 252 device. As the penetration depth of the IR beam is frequency dependent, a firm surface contact of
Computerized modeling techniques predict the 3D structure of H₄R: facts and fiction.

Science.gov (United States)

Zaid, Hilal; Ismael-Shanak, Siba; Michaeli, Amit; Rayan, Anwar

2012-01-01

The functional characterization of proteins presents a daily challenge for biochemical, medical and computational sciences, especially when the structures are undetermined empirically, as in the case of the Histamine H4 Receptor (H₄R). H₄R is a member of the GPCR superfamily that plays a vital role in immune and inflammatory responses. To date, the concept of GPCR modeling is highlighted in textbooks and pharmaceutical pamphlets, and this group of proteins has been the subject of almost 3500 publications in the scientific literature. The dynamic nature of determining GPCR structures was elucidated through elegant and creative modeling methodologies, implemented by many groups around the world. H₄R, which belongs to the GPCR family, was cloned in 2000; understandably, its biological activity has been reported only 65 times in PubMed. Here we attempt to cover the fundamental concepts of H₄R structure modeling and its implementation in drug discovery, especially those that have been experimentally tested, and to highlight some ideas that are currently being discussed on the dynamic nature of H₄R and on computerized techniques for 3D structure modeling of GPCRs.
Regression formulae for predicting hematologic and liver functions ...

African Journals Online (AJOL)

African Journal of Biomedical Research ... On the other hand platelet and white blood cell (WBC) counts in these workers correlated positively with years of service [r = 0.342 (P <0.001) and r = 0.130 (P<0.0001) ... The regression equation defining this relationship is: ALP concentration = 33.68 – 0.075 x years of service.
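As a worked illustration, the regression equation quoted in this record can be evaluated directly. The function name is hypothetical, and the record does not state the units of ALP concentration, so none are assumed here.

```python
def predicted_alp(years_of_service):
    """ALP concentration from the quoted equation: 33.68 - 0.075 x years of service."""
    return 33.68 - 0.075 * years_of_service


# The negative slope means each additional year lowers the predicted value by 0.075.
for years in (0, 10, 20):
    print(years, round(predicted_alp(years), 3))
```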
Parametric vs. Nonparametric Regression Modelling within Clinical Decision Support

Czech Academy of Sciences Publication Activity Database

Kalina, Jan; Zvárová, Jana

2017-01-01

Vol. 5, No. 1 (2017), pp. 21-27 ISSN 1805-8698 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : decision support systems * decision rules * statistical analysis * nonparametric regression Subject RIV: IN - Informatics, Computer Science OECD field: Statistics and probability
Prediction of soluble solids content and ph in red wine by visible and near infrared spectroscopy

Science.gov (United States)

Wang, Li; He, Yong; Wang, Yanyan

2008-02-01

Soluble solids content (SSC) and pH are two major characteristics used for assessing the quality of red wine, and they are also two important quality indexes in the manufacture of red wine. For rapid detection of SSC and pH in red wine, visible and near infrared (Vis/NIR) transmittance spectroscopy combined with partial least squares (PLS) and least squares support vector machines (LS-SVM) was used in this study. First, the near infrared transmittance spectra of 175 red wine samples were obtained using a Vis/NIR spectroradiometer. Then PLS was applied to reduce the dimensionality of the original spectra, so that the latent variables (LVs) selected by PLS could replace the complex spectral data. All samples were randomly separated into a calibration set and a validation set. The LVs of each sample in the calibration set were used as the inputs to train the LS-SVM model; the optimal model was then used to predict the SSC and pH values of samples in the validation set based on their LVs. Standard error of prediction (SEP) and determination coefficient (r2) were used as the evaluation standards, and the results indicated that the SEP and r2 were 0.2313 and 0.9348 for the prediction of SSC, and 0.0071 and 0.9986 for pH. This prediction model was more accurate than those reported in related research.
An Electrochemical Impedance Spectroscopy System for Monitoring Pineapple Waste Saccharification

Directory of Open Access Journals (Sweden)

Claudia Conesa

2016-02-01

Electrochemical impedance spectroscopy (EIS) has been used for monitoring the enzymatic pineapple waste hydrolysis process. The system employed consists of a device called Advanced Voltammetry, Impedance Spectroscopy & Potentiometry Analyzer (AVISPA) equipped with a specific software application and a stainless steel double needle electrode. EIS measurements were conducted at different saccharification time intervals: 0, 0.75, 1.5, 6, 12 and 24 h. Partial least squares (PLS) regression was used to model the relationship between the EIS measurements and the sugar determination by HPAEC-PAD. On the other hand, artificial neural networks (multilayer feed-forward architecture with quick propagation training algorithm and logistic-type transfer functions) gave the best results as predictive models for glucose, fructose, sucrose and total sugars. Coefficients of determination (R2) and root mean square errors of prediction (RMSEP) were determined as R2 > 0.944 and RMSEP < 1.782 for PLS and R2 > 0.973 and RMSEP < 0.486 for artificial neural networks (ANNs), respectively. Therefore, a combination of both an EIS-based technique and ANN models is suggested as a promising alternative to the traditional laboratory techniques for monitoring the pineapple waste saccharification step.

Circulating levels of miR-133a predict the regression potential of left ventricular hypertrophy after valve replacement surgery in patients with aortic stenosis.

Science.gov (United States)

García, Raquel; Villar, Ana V; Cobo, Manuel; Llano, Miguel; Martín-Durán, Rafael; Hurlé, María A; Nistal, J Francisco

2013-08-15

Myocardial microRNA-133a (miR-133a) is directly related to reverse remodeling after pressure overload release in aortic stenosis patients. Herein, we assessed the significance of plasma miR-133a as an accessible biomarker with prognostic value in predicting the reversibility potential of LV hypertrophy after aortic valve replacement (AVR) in these patients. The expressions of miR-133a and its targets were measured in LV biopsies from 74 aortic stenosis patients. Circulating miR-133a was measured in peripheral and coronary sinus blood. LV mass reduction was determined echocardiographically. Myocardial and plasma levels of miR-133a correlated directly (r=0.46, P ... regression analysis identified plasma miR-133a as a positive predictor of the hypertrophy reversibility after surgery. The discrimination of the model yielded an area under the receiver operator characteristic curve of 0.89 (P ... regression analysis revealed plasma miR-133a and its myocardial target Wolf-Hirschhorn syndrome candidate 2/Negative elongation factor A as opposite predictors of the LV mass loss (g) after AVR. Preoperative plasma levels of miR-133a reflect their myocardial expression and predict the regression potential of LV hypertrophy after AVR. The value of this bedside information for the surgical timing, particularly in asymptomatic aortic stenosis patients, deserves confirmation in further clinical studies.
Significance testing in ridge regression for genetic data

Directory of Open Access Journals (Sweden)

De Iorio Maria

2011-09-01

Background: Technological developments have increased the feasibility of large scale genetic association studies. Densely typed genetic markers are obtained using SNP arrays, next-generation sequencing technologies and imputation. However, SNPs typed using these methods can be highly correlated due to linkage disequilibrium among them, and standard multiple regression techniques fail with these data sets due to their high dimensionality and correlation structure. There has been increasing interest in using penalised regression in the analysis of high dimensional data. Ridge regression is one such penalised regression technique which does not perform variable selection, instead estimating a regression coefficient for each predictor variable. It is therefore desirable to obtain an estimate of the significance of each ridge regression coefficient. Results: We develop and evaluate a test of significance for ridge regression coefficients. Using simulation studies, we demonstrate that the performance of the test is comparable to that of a permutation test, with the advantage of a much-reduced computational cost. We introduce the p-value trace, a plot of the negative logarithm of the p-values of ridge regression coefficients with increasing shrinkage parameter, which enables the visualisation of the change in p-value of the regression coefficients with increasing penalisation. We apply the proposed method to a lung cancer case-control data set from EPIC, the European Prospective Investigation into Cancer and Nutrition. Conclusions: The proposed test is a useful alternative to a permutation test for the estimation of the significance of ridge regression coefficients, at a much-reduced computational cost. The p-value trace is an informative graphical tool for evaluating the results of a test of significance of ridge regression coefficients as the shrinkage parameter increases, and the proposed test makes its production computationally feasible.
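The effect of the shrinkage parameter on a ridge coefficient can be seen in the simplest possible case. The sketch below assumes a single centred predictor, for which the ridge estimate has the closed form sum(xy) / (sum(x^2) + lambda); it does not implement the authors' significance test or the p-value trace, and the data are invented.

```python
def ridge_coefficient(x, y, lam):
    """Ridge slope for one predictor after centring x and y (lam >= 0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / (sxx + lam)


# Hypothetical genotype codes (0, 1, 2) and a continuous trait.
x = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
y = [0.1, 0.9, 2.1, 1.2, -0.2, 1.8]

# Coefficient trace: the estimate shrinks toward zero as the penalty grows.
trace = [(lam, ridge_coefficient(x, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
for lam, beta in trace:
    print(f"lambda={lam:6.1f} beta={beta:.4f}")
```

With lambda = 0 the estimate equals the ordinary least-squares slope; a p-value trace as described in the abstract would plot the significance of each coefficient along the same lambda axis.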
Biomechanical determinants of elite rowing technique and performance.

Science.gov (United States)

Buckeridge, E M; Bull, A M J; McGregor, A H

2015-04-01

In rowing, the parameters of injury, performance, and technique are all interrelated and in dynamic equilibrium. Whilst rowing requires extreme physical strength and endurance, a high level of skill and technique is essential to enable an effective transfer of power through the rowing sequence. This study aimed to determine discrete aspects of rowing technique, which strongly influence foot force production and asymmetries at the foot-stretchers, as these are biomechanical parameters often associated with performance and injury risk. Twenty elite female rowers performed an incremental rowing test on an instrumented rowing ergometer, which measured force at the handle and foot-stretchers, while three-dimensional kinematic recordings of the ankle, knee, hip, and lumbar-pelvic joints were made. Multiple regression analyses identified hip kinematics as a key predictor of foot force output (R² = 0.48), whereas knee and lumbar-pelvic kinematics were the main determinants in optimizing the horizontal foot force component (R² = 0.41). Bilateral asymmetries of the foot-stretchers were also seen to significantly influence lumbar-pelvic kinematics (R² = 0.43) and pelvic twisting (R² = 0.32) during the rowing stroke. These results provide biomechanical evidence toward aspects of technique that can be modified to optimize force output and performance, which can be of direct benefit to coaches and athletes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Combined robotic transanal total mesorectal excision (R-taTME) and single-site plus one-port (R-SSPO) technique for ultra-low rectal surgery-initial experience with a new operation approach.

Science.gov (United States)

Kuo, Li-Jen; Ngu, James Chi-Yong; Tong, Yiu-Shun; Chen, Chia-Che

2017-02-01

Robot-assisted rectal surgery is gaining popularity, and robotic single-site surgery is also being explored clinically. We report our initial experience with robotic transanal total mesorectal excision (R-taTME) and radical proctectomy using the robotic single-site plus one-port (R-SSPO) technique for low rectal surgery. Between July 2015 and March 2016, 15 consecutive patients with ultra-low rectal lesions underwent R-taTME followed by radical proctectomy using the R-SSPO technique by a single surgeon. The clinical and pathological results were retrospectively analyzed. The median operative time was 473 (range, 335-569) min, and the estimated blood loss was 33 (range, 30-50) mL. The median number of lymph nodes harvested was 12 (range, 8-18). The median distal resection margin was 1.4 (range, 0.4-3.5) cm, and all patients had clear circumferential resection margins. We encountered a left ureteric transection intraoperatively in one patient, and another patient required reoperation for postoperative adhesive intestinal obstruction. There was no 30-day mortality. R-taTME followed by radical proctectomy using the R-SSPO technique for patients with low rectal lesions is technically feasible and safe without compromising oncologic outcomes. However, there were considerable limitations and a steep learning curve using current robotic technology.
Handbook of Partial Least Squares Concepts, Methods and Applications

CERN Document Server

Vinzi, Vincenzo Esposito; Henseler, Jörg

2010-01-01

This handbook provides a comprehensive overview of Partial Least Squares (PLS) methods with specific reference to their use in marketing and with a discussion of the directions of current research and perspectives. It covers the broad area of PLS methods, from regression to structural equation modeling applications, software and interpretation of results. The handbook serves both as an introduction for those without prior knowledge of PLS and as a comprehensive reference for researchers and practitioners interested in the most recent advances in PLS methodology.
Robust median estimator in logistic regression

Czech Academy of Sciences Publication Activity Database

Hobza, T.; Pardo, L.; Vajda, Igor

2008-01-01

Vol. 138, No. 12 (2008), pp. 3822-3840 ISSN 0378-3758 R&D Projects: GA MŠk 1M0572 Grant - others:Instituto Nacional de Estadistica (ES) MPO FI - IM3/136; GA MŠk(CZ) MTM 2006-06872 Institutional research plan: CEZ:AV0Z10750506 Keywords : Logistic regression * Median * Robustness * Consistency and asymptotic normality * Morgenthaler * Bianco and Yohai * Croux and Hasellbroeck Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.679, year: 2008 http://library.utia.cas.cz/separaty/2008/SI/vajda-robust%20median%20estimator%20in%20logistic%20regression.pdf
Correlation and simple linear regression.

Science.gov (United States)

Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

2003-06-01

In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
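The difference between the two coefficients reviewed above is easiest to see on data that are monotonic but not linear. The following sketch implements both from their textbook definitions (Spearman rho as the Pearson coefficient of average ranks); the function names and the cubic example data are illustrative and not taken from the tutorial.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but nonlinear relationship

print(f"Pearson r = {pearson(x, y):.3f}, Spearman rho = {spearman(x, y):.3f}")
```

Because y increases whenever x does, the rank-based coefficient is exactly 1, while the Pearson coefficient falls below 1 because the points do not lie on a straight line.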
Partial least squares methods for spectrally estimating lunar soil FeO abundance: A stratified approach to revealing nonlinear effect and qualitative interpretation

Science.gov (United States)

Li, Lin

2008-12-01

Partial least squares (PLS) regressions were applied to lunar highland and mare soil data characterized by the Lunar Soil Characterization Consortium (LSCC) for spectral estimation of the abundance of lunar soil chemical constituents FeO and Al2O3. The LSCC data set was split into a number of subsets including the total highland, Apollo 16, Apollo 14, and total mare soils, and then PLS was applied to each to investigate the effect of nonlinearity on the performance of the PLS method. The weight-loading vectors resulting from PLS were analyzed to identify mineral species responsible for spectral estimation of the soil chemicals. The results from PLS modeling indicate that the PLS performance depends on the correlation of constituents of interest to their major mineral carriers, and the Apollo 16 soils are responsible for the large errors of FeO and Al2O3 estimates when the soils were modeled along with other types of soils. These large errors are primarily attributed to the degraded correlation of FeO to pyroxene for the relatively mature Apollo 16 soils as a result of space weathering, and secondarily to the interference of olivine. PLS consistently yields very accurate fits to the two soil chemicals when applied to mare soils. Although Al2O3 has no spectrally diagnostic characteristics, this chemical can be predicted for all subset data by PLS modeling at high accuracies because of its correlation to FeO. This correlation is reflected in the symmetry of the PLS weight-loading vectors for FeO and Al2O3, which prove to be very useful for qualitative interpretation of the PLS results. However, this qualitative interpretation of PLS modeling cannot be achieved using principal component regression loading vectors.
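The reason PLS weight vectors lend themselves to interpretation is that the first weight vector is simply the normalised covariance between each spectral variable and the response. The sketch below computes a one-component PLS1 model from that definition; it is an illustration under stated assumptions (single response, one latent variable, mean-centred data), and the tiny data set is invented rather than drawn from the LSCC measurements.

```python
import math


def center(column):
    m = sum(column) / len(column)
    return [v - m for v in column]


def pls1_first_component(X, y):
    """One-component PLS1. X is a list of sample rows, y a list of responses.

    Returns the unit-length weight vector w and the regression coefficients
    (for centred data) implied by a single latent variable.
    """
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    # Weights: covariance direction X^T y, scaled to unit length.
    w = [sum(col[i] * yc[i] for i in range(n)) for col in cols]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on t.
    t = [sum(cols[j][i] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return w, [wj * q for wj in w]


# Hypothetical data: variable 0 tracks y, variable 1 weakly, variable 2 is constant.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 2.0], [3.0, 0.0, 2.0], [4.0, 1.0, 2.0]]
y = [1.0, 2.0, 3.0, 4.0]

w, coefficients = pls1_first_component(X, y)
print("weights:", [round(v, 3) for v in w])
```

Variables that covary strongly with the response receive large weights and uninformative ones receive weights near zero, which is the property the abstract exploits when it reads mineral contributions from the weight-loading vectors.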
Development and validation of a Partial Least Squares-Discriminant Analysis (PLS-DA) model based on the determination of ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs) in hair for the diagnosis of chronic alcohol abuse.

Science.gov (United States)

Alladio, E; Giacomelli, L; Biosa, G; Corcia, D Di; Gerace, E; Salomone, A; Vincenti, M

2018-01-01

The chronic intake of an excessive amount of alcohol is currently ascertained by determining the concentration of direct alcohol metabolites in the hair samples of the alleged abusers, including ethyl glucuronide (EtG) and, less frequently, fatty acid ethyl esters (FAEEs). Indirect blood biomarkers of alcohol abuse are still determined to support hair EtG results and diagnose a consequent liver impairment. In the present study, the supporting role of hair FAEEs is compared with indirect blood biomarkers with respect to the contexts in which hair EtG interpretation is uncertain. Receiver Operating Characteristics (ROC) curves and multivariate Principal Component Analysis (PCA) demonstrated much stronger correlation of EtG results with FAEEs than with any single indirect biomarker or their combinations. Partial Least Squares Discriminant Analysis (PLS-DA) models based on hair EtG and FAEEs were developed to maximize the biomarkers' information content on a multivariate background. The final PLS-DA model yielded 100% correct classification on a training/evaluation dataset of 155 subjects, including both chronic alcohol abusers and social drinkers. Then, the PLS-DA model was validated on an external dataset of 81 individuals, providing optimal discrimination between chronic alcohol abusers and social drinkers in terms of specificity and sensitivity. The PLS-DA scores obtained for each subject, with respect to the PLS-DA model threshold that separates the probabilistic distributions for the two classes, furnished a likelihood ratio value, which in turn conveys the strength of the experimental data support to the classification decision, within a Bayesian logic. Typical boundary real cases from daily work are discussed, too. Copyright © 2017 Elsevier B.V. All rights reserved.
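A likelihood ratio of the kind described above compares how probable a subject's score is under each class distribution. The sketch below is one possible reading, not the authors' implementation: it assumes each class's PLS-DA scores follow a normal distribution fitted to training scores, and all score values are hypothetical.

```python
from statistics import NormalDist


def likelihood_ratio(score, abuser_scores, drinker_scores):
    """Density of the score under the abuser class divided by its density
    under the social-drinker class (Gaussian assumption)."""
    abusers = NormalDist.from_samples(abuser_scores)
    drinkers = NormalDist.from_samples(drinker_scores)
    return abusers.pdf(score) / drinkers.pdf(score)


# Hypothetical training scores for the two classes.
abuser_scores = [0.8, 1.0, 1.2, 0.9, 1.1]
drinker_scores = [-0.8, -1.0, -1.2, -0.9, -1.1]

for score in (-0.2, 0.0, 0.2):
    lr = likelihood_ratio(score, abuser_scores, drinker_scores)
    print(f"score={score:+.1f} LR={lr:.3g}")
```

A ratio above 1 supports the abuser class and a ratio below 1 supports the social-drinker class; values close to 1 correspond to the boundary cases the abstract mentions, where the data lend little support to either decision.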
[Rapid determination of componential contents and calorific value of selected agricultural biomass feedstocks using spectroscopic technology].

Science.gov (United States)

Sheng, Kui-Chuan; Shen, Ying-Ying; Yang, Hai-Qing; Wang, Wen-Jin; Luo, Wei-Qiang

2012-10-01

Rapid determination of biomass feedstock properties is of value for the production of high-quality densified biomass briquette fuel. In the present study, visible and near-infrared (Vis-NIR) spectroscopy was employed to build prediction models of componential contents, i.e. moisture, ash, volatile matter and fixed carbon, and calorific value of three selected species of agricultural biomass feedstock, i.e. pine wood, cedar wood, and cotton stalk. The partial least squares (PLS) cross validation results showed that, compared with the original reflection spectra, PLS regression models developed for first derivative spectra produced higher prediction accuracy, with coefficients of determination (R2) of 0.97, 0.94 and 0.90, and residual prediction deviation (RPD) of 6.57, 4.00 and 3.01 for ash, volatile matter and moisture, respectively. Good prediction accuracy was achieved with R2 of 0.85 and RPD of 2.55 for fixed carbon, and R2 of 0.87 and RPD of 2.73 for calorific value. It is concluded that Vis-NIR spectroscopy is promising as an alternative to traditional proximate analysis for rapid determination of componential contents and calorific value of agricultural biomass feedstock.
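Two quantities in this abstract are simple to state in code. The sketch below assumes the common definition of RPD as the standard deviation of the reference values divided by the root mean square error of prediction, and uses a plain first difference as a stand-in for the first derivative spectrum (the study does not say which derivative algorithm was used). All numbers are invented.

```python
import math


def first_difference(spectrum):
    """Finite-difference approximation of a first derivative spectrum."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rpd(reference, predicted):
    """Residual prediction deviation: SD of reference values / RMSE of prediction."""
    n = len(reference)
    mean_ref = sum(reference) / n
    sd = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)
    return sd / rmse


# Hypothetical ash contents (%) measured by proximate analysis and predicted by PLS.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]

print("derivative:", first_difference([0.20, 0.25, 0.35, 0.30]))
print(f"RPD = {rpd(reference, predicted):.2f}")
```

A larger RPD means the prediction error is small relative to the natural spread of the property, which is why it is reported alongside R2.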
Characterisation of olive fruit for the milling process by using visible/near infrared spectroscopy

Directory of Open Access Journals (Sweden)

Roberto Beghi

2013-10-01

Increasing consumption of olive oil and table olives has recently led to an expansion of olive tree cultivation in the world. This trend is supported by the documented nutritional value of the Mediterranean diet. The aim of this work was to test a portable visible/near infrared (vis/NIR) system (400-1000 nm) for the analysis of physical-chemical parameters, such as olive soluble solid content (SSC) and texture, before the olive oil extraction process. The final goal is to provide the sector with post-harvest methods and sorting systems for a quick evaluation of important properties of olive fruit. In the present study, a total of 109 olives for oil production were analysed. Olive spectra registered with the optical device and values obtained with destructive analysis in the laboratory were analysed. Specific statistical models were elaborated to study correlations between optical and laboratory analysis, and to evaluate predictions of reference parameters obtained through the analysis of the visible-near infrared range. Statistical models were processed using chemometric techniques to extract maximum data information. Principal component analysis (PCA) was performed on vis/NIR spectra to examine sample groupings and identify outliers, while the partial least squares (PLS) regression algorithm was used to correlate sample spectra and physical-chemical properties. Results are encouraging. PCA showed a significant sample grouping among different ranges of SSC and texture. PLS models gave fairly good predictive capabilities in validation for SSC (R2=0.67 and RMSECV%=7.5%) and texture (R2=0.68 and RMSECV%=8.2%).
Identification of chilling and heat requirements of cherry trees--a statistical approach.

Science.gov (United States)

Luedeling, Eike; Kunz, Achim; Blanke, Michael M

2013-09-01

Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. 'Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. 'Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package ('chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. 'Payne') at Davis, California.
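The 11-day running means used as predictors in this analysis are a smoothing step applied to the daily temperature series. The sketch below shows one way to compute them, assuming a centred window and dropping the days at either end that lack a full window; the abstract does not state how the window was aligned, and the chillR package is not used here.

```python
def running_mean(values, window=11):
    """Centred running mean; only positions with a full window are returned."""
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Hypothetical daily mean temperatures (degrees C) for 15 consecutive days.
daily_mean_temp = [float(day) for day in range(15)]

smoothed = running_mean(daily_mean_temp, window=11)
print(smoothed)
```

Each smoothed value would then serve as one predictor variable in the PLS regression of bloom date on temperature.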
Pengaruh Strategic Costing Sebagai Strategic Management Accounting Techniques Terhadap Competitive Advantage Dan Organizational Performance

OpenAIRE

Cynthia, Cynthia

2015-01-01

The purpose of this study was to test the effect of Strategic Costing on Organizational Performance through Competitive Advantage, which acted as the intervening variable, in manufacturing companies in Surabaya and Sidoarjo. The sample of this study was 50 manufacturing companies in Surabaya and Sidoarjo. The data were collected by distributing questionnaires to the companies. This study used a path modeling analysis technique with PLS tools. The results from this study showed that there were posi...
Applied survival analysis using R

CERN Document Server

Moore, Dirk F

2016-01-01

Applied Survival Analysis Using R covers the main principles of survival analysis, gives examples of how it is applied, and teaches how to put those principles to use to analyze data using R as a vehicle. Survival data, where the primary outcome is time to a specific event, arise in many areas of biomedical research, including clinical trials, epidemiological studies, and studies of animals. Many survival methods are extensions of techniques used in linear regression and categorical data, while other aspects of this field are unique to survival data. This text employs numerous actual examples to illustrate survival curve estimation, comparison of survivals of different groups, proper accounting for censoring and truncation, model variable selection, and residual analysis. Because explaining survival analysis requires more advanced mathematics than many other statistical topics, this book is organized with basic concepts and most frequently used procedures covered in earlier chapters, with more advanced topics...
On two flexible methods of 2-dimensional regression analysis

Czech Academy of Sciences Publication Activity Database

Volf, Petr

2012-01-01

Vol. 18, No. 4 (2012), pp. 154-164 ISSN 1803-9782 Grant - others: GA ČR(CZ) GAP209/10/2045 Institutional support: RVO:67985556 Keywords: regression analysis * Gordon surface * prediction error * projection pursuit Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf
Non-destructive technique for determining the viability of soybean (Glycine max) seeds using FT-NIR spectroscopy.

Science.gov (United States)

Kusumaningrum, Dewi; Lee, Hoonsoo; Lohumi, Santosh; Mo, Changyeun; Kim, Moon S; Cho, Byoung-Kwan

2018-03-01

The viability of seeds is important for determining their quality. A high-quality seed is one with a high germination capability, which is necessary to ensure high productivity. Hence, developing technology for the detection of seed viability is a high priority in agriculture. Fourier transform near-infrared (FT-NIR) spectroscopy is one of the most popular vibrational spectroscopy techniques. This study aims to use FT-NIR spectroscopy to determine the viability of soybean seeds. Viable seeds and artificially aged seeds (serving as non-viable soybeans) were used in this research. The FT-NIR spectra of soybean seeds were collected and analysed using partial least-squares discriminant analysis (PLS-DA) to classify viable and non-viable soybean seeds. Moreover, the variable importance in projection (VIP) method for variable selection was combined with the PLS-DA. The VIP method selected 146 optimal variables from the full set of 1557 variables as the most effective wavelengths. The results demonstrated that FT-NIR spectral analysis with the PLS-DA method, using either all variables or the selected variables, performed well, with a prediction accuracy for soybean viability close to 100%. Hence, FT-NIR techniques with chemometric analysis have the potential for rapidly measuring soybean seed viability. © 2017 Society of Chemical Industry.
Housing price forecastability: A factor analysis

DEFF Research Database (Denmark)

Møller, Stig Vinther; Bork, Lasse

2017-01-01

We examine U.S. housing price forecastability using principal component analysis (PCA), partial least squares (PLS), and sparse PLS (SPLS). We incorporate information from a large panel of 128 economic time series and show that macroeconomic fundamentals have strong predictive power for future...... movements in housing prices. We find that (S)PLS models systematically dominate PCA models. (S)PLS models also generate significant out-of-sample predictive power over and above the predictive power contained by the price-rent ratio, autoregressive benchmarks, and regression models based on small datasets....
A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

Science.gov (United States)

Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

2016-04-01

Existing evidence suggests that ambient ultrafine particles (UFPs) (regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R² = 0.58 vs. 0.55) or a cross-validation procedure (R² = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Semiparametric nonlinear quantile regression model for financial returns

Czech Academy of Sciences Publication Activity Database

Avdulaj, Krenar; Baruník, Jozef

2017-01-01

Vol. 21, No. 1 (2017), pp. 81-97 ISSN 1081-1826 R&D Projects: GA ČR(CZ) GBP402/12/G097 Institutional support: RVO:67985556 Keywords: copula quantile regression * realized volatility * value-at-risk Subject RIV: AH - Economics OECD field: Applied Economics, Econometrics Impact factor: 0.649, year: 2016 http://library.utia.cas.cz/separaty/2017/E/avdulaj-0472346.pdf
On-line mixture-based alternative to logistic regression

Czech Academy of Sciences Publication Activity Database

Nagy, Ivan; Suzdaleva, Evgenia

2016-01-01

Vol. 26, No. 5 (2016), pp. 417-437 ISSN 1210-0552 R&D Projects: GA ČR GA15-03564S Institutional support: RVO:67985556 Keywords: on-line modeling * on-line logistic regression * recursive mixture estimation * data dependent pointer Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.394, year: 2016 http://library.utia.cas.cz/separaty/2016/ZS/suzdaleva-0464463.pdf

Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

Directory of Open Access Journals (Sweden)

Omholt Stig W

2011-06-01

Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback
Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

Science.gov (United States)

Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

2011-06-01

Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for
Regression: The Apple Does Not Fall Far From the Tree.

Science.gov (United States)

Vetter, Thomas R; Schober, Patrick

2018-05-15

Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
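The link between correlation, simple linear regression, and regression toward the mean can be made concrete with a small sketch. It is an illustration under simple assumptions (one predictor, ordinary least squares), not code from the tutorial, and the function name is hypothetical.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Illustrative data in which x and y have the same spread, so slope equals r.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
slope, intercept, r = simple_linear_regression(x, y)
predicted_at_extreme = intercept + slope * 5
```

Because |r| < 1 here, the prediction for the most extreme x lies closer to the mean of y than x = 5 lies to the mean of x, which is the regression-toward-the-mean effect the tutorial describes.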
Poisson Mixture Regression Models for Heart Disease Prediction.

Science.gov (United States)

Mufudza, Chipo; Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction

Science.gov (United States)

Erol, Hamza

2016-01-01

Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criterion value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
[Quantitative determination of glass content in monazite glass-ceramics by IR technique].

Science.gov (United States)

He, Yong; Zhang, Bao-min

2003-04-01

Monazite glass-ceramics consist of both monazite and metaphosphate glass phases. The absorption bands of the two phases do not overlap, and the absorption intensities of the bands at 1,275 and 616 cm⁻¹ vary with the glass content. The correlation coefficient between the logarithmic absorbance ratio of the two bands and the glass content was r = 0.9975, and the regression equation was y = 48.356 + 25.93x. The absorbance ratio of the bands at 952 and 616 cm⁻¹ also varied with the Ce2O3/La2O3 ratio in synthetic monazites, with r = 0.9917 and a regression equation y = 0.2211 exp (0.0221x). The high correlation coefficients show that the IR technique could find new application in the quantitative analysis of glass content in phosphate glass-ceramics.
The Bacillus subtilis Conjugative Plasmid pLS20 Encodes Two Ribbon-Helix-Helix Type Auxiliary Relaxosome Proteins That Are Essential for Conjugation

Directory of Open Access Journals (Sweden)

Andrés Miguel-Arribas

2017-11-01

Bacterial conjugation is the process by which a conjugative element (CE) is transferred horizontally from a donor to a recipient cell via a connecting pore. One of the first steps in the conjugation process is the formation of a nucleoprotein complex at the origin of transfer (oriT), where one of the components of the nucleoprotein complex, the relaxase, introduces a site- and strand-specific nick to initiate the transfer of a single DNA strand into the recipient cell. In most cases, the nucleoprotein complex involves, besides the relaxase, one or more additional proteins, named auxiliary proteins, which are encoded by the CE and/or the host. The conjugative plasmid pLS20 replicates in the Gram-positive Firmicute bacterium Bacillus subtilis. We have recently identified the relaxase gene and the oriT of pLS20, which are separated by a region of almost 1 kb. Here we show that this region contains two auxiliary genes that we name aux1LS20 and aux2LS20, and which we show are essential for conjugation. Both Aux1LS20 and Aux2LS20 are predicted to contain a Ribbon-Helix-Helix DNA binding motif near their N-terminus. Analyses of the purified proteins show that Aux1LS20 and Aux2LS20 form tetramers and hexamers in solution, respectively, and that they both bind preferentially to oriTLS20, although with different characteristics and specificities. In silico analyses revealed that genes encoding homologs of Aux1LS20 and/or Aux2LS20 are located upstream of almost 400 relaxase genes of the RelLS20 family (MOBL) of relaxases. Thus, Aux1LS20 and Aux2LS20 of pLS20 constitute the founding member of the first two families of auxiliary proteins described for CEs of Gram-positive origin.
The Bacillus subtilis Conjugative Plasmid pLS20 Encodes Two Ribbon-Helix-Helix Type Auxiliary Relaxosome Proteins That Are Essential for Conjugation.

Science.gov (United States)

Miguel-Arribas, Andrés; Hao, Jian-An; Luque-Ortega, Juan R; Ramachandran, Gayetri; Val-Calvo, Jorge; Gago-Córdoba, César; González-Álvarez, Daniel; Abia, David; Alfonso, Carlos; Wu, Ling J; Meijer, Wilfried J J

2017-01-01

Bacterial conjugation is the process by which a conjugative element (CE) is transferred horizontally from a donor to a recipient cell via a connecting pore. One of the first steps in the conjugation process is the formation of a nucleoprotein complex at the origin of transfer (oriT), where one of the components of the nucleoprotein complex, the relaxase, introduces a site- and strand-specific nick to initiate the transfer of a single DNA strand into the recipient cell. In most cases, the nucleoprotein complex involves, besides the relaxase, one or more additional proteins, named auxiliary proteins, which are encoded by the CE and/or the host. The conjugative plasmid pLS20 replicates in the Gram-positive Firmicute bacterium Bacillus subtilis. We have recently identified the relaxase gene and the oriT of pLS20, which are separated by a region of almost 1 kb. Here we show that this region contains two auxiliary genes that we name aux1LS20 and aux2LS20, and which we show are essential for conjugation. Both Aux1LS20 and Aux2LS20 are predicted to contain a Ribbon-Helix-Helix DNA binding motif near their N-terminus. Analyses of the purified proteins show that Aux1LS20 and Aux2LS20 form tetramers and hexamers in solution, respectively, and that they both bind preferentially to oriTLS20, although with different characteristics and specificities. In silico analyses revealed that genes encoding homologs of Aux1LS20 and/or Aux2LS20 are located upstream of almost 400 relaxase genes of the RelLS20 family (MOBL) of relaxases. Thus, Aux1LS20 and Aux2LS20 of pLS20 constitute the founding member of the first two families of auxiliary proteins described for CEs of Gram-positive origin.
Geographically Weighted Logistic Regression Applied to Credit Scoring Models

Directory of Open Access Journals (Sweden)

Pedro Henrique Melo Albuquerque

This study used real data from a Brazilian financial institution on transactions involving Consumer Direct Credit (CDC), granted to clients residing in the Distrito Federal (DF), to construct credit scoring models via Logistic Regression and Geographically Weighted Logistic Regression (GWLR) techniques. The aims were: to verify whether the factors that influence credit risk differ according to the borrower’s geographic location; to compare the set of models estimated via GWLR with the global model estimated via Logistic Regression, in terms of predictive power and financial losses for the institution; and to verify the viability of using the GWLR technique to develop credit scoring models. The metrics used to compare the models developed via the two techniques were the AICc informational criterion, the accuracy of the models, the percentage of false positives, the sum of the value of false positive debt, and the expected monetary value of portfolio default compared with the monetary value of defaults observed. The models estimated for each region in the DF were distinct in their variables and coefficients (parameters), leading to the conclusion that credit risk was influenced differently in each region in the study. The Logistic Regression and GWLR methodologies presented very close results in terms of predictive power and financial losses for the institution, and the study demonstrated the viability of using the GWLR technique to develop credit scoring models for the target population in the study.
Geographically weighted regression based methods for merging satellite and gauge precipitation

Science.gov (United States)

Chao, Lijun; Zhang, Ke; Li, Zhijia; Zhu, Yuelong; Wang, Jingfeng; Yu, Zhongbo

2018-03-01

Real-time precipitation data with high spatiotemporal resolutions are crucial for accurate hydrological forecasting. To improve the spatial resolution and quality of satellite precipitation, a three-step satellite and gauge precipitation merging method was formulated in this study: (1) bilinear interpolation is first applied to downscale coarser satellite precipitation to a finer resolution (PS); (2) the (mixed) geographically weighted regression methods coupled with a weighting function are then used to estimate biases of PS as functions of gauge observations (PO) and PS; and (3) biases of PS are finally corrected to produce a merged precipitation product. Based on the above framework, eight algorithms, a combination of two geographically weighted regression methods and four weighting functions, are developed to merge CMORPH (CPC MORPHing technique) precipitation with station observations on a daily scale in the Ziwuhe Basin of China. The geographical variables (elevation, slope, aspect, surface roughness, and distance to the coastline) and a meteorological variable (wind speed) were used for merging precipitation to avoid the artificial spatial autocorrelation resulting from traditional interpolation methods. The results show that the combination of the MGWR and BI-square function (MGWR-BI) has the best performance (R = 0.863 and RMSE = 7.273 mm/day) among the eight algorithms. The MGWR-BI algorithm was then applied to produce hourly merged precipitation product. Compared to the original CMORPH product (R = 0.208 and RMSE = 1.208 mm/hr), the quality of the merged data is significantly higher (R = 0.724 and RMSE = 0.706 mm/hr). The developed merging method not only improves the spatial resolution and quality of the satellite product but also is easy to implement, which is valuable for hydrological modeling and other applications.
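The core idea of geographically weighted regression, fitting a local model in which nearby observations count more, can be sketched as follows. The bi-square kernel below is the standard form; everything else (a single predictor, the function names, the fixed bandwidth) is a simplifying assumption and does not reproduce the paper's MGWR-BI algorithm or its bias-correction steps.

```python
def bisquare_weight(distance, bandwidth):
    """Bi-square kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if distance >= bandwidth:
        return 0.0
    return (1.0 - (distance / bandwidth) ** 2) ** 2


def local_weighted_fit(x, y, distances, bandwidth):
    """Weighted least-squares fit of y = a + b * x at one target location.

    `distances` holds the distance from each observation to the target
    location; closer observations receive larger weights.
    """
    w = [bisquare_weight(d, bandwidth) for d in distances]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no observations fall inside the bandwidth")
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    if sxx == 0.0:
        raise ValueError("predictor has no weighted variation")
    b = sxy / sxx
    a = my - b * mx
    return a, b
```

Repeating the fit at every grid cell, each with its own distance vector, yields spatially varying coefficients, which is what distinguishes this family of methods from a single global regression.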
The R Package threg to Implement Threshold Regression Models

Directory of Open Access Journals (Sweden)

Tao Xiao

2015-08-01

This new package includes four functions: threg, and the methods hr, predict and plot for threg objects returned by threg. The threg function is the model-fitting function which is used to calculate regression coefficient estimates, asymptotic standard errors and p values. The hr method for threg objects is the hazard-ratio calculation function which provides the estimates of hazard ratios at selected time points for specified scenarios (based on given categories or value settings of covariates). The predict method for threg objects is used for prediction. And the plot method for threg objects provides plots for curves of estimated hazard functions, survival functions and probability density functions of the first-hitting-time; function curves corresponding to different scenarios can be overlaid in the same plot for comparison to give additional research insights.
Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

Directory of Open Access Journals (Sweden)

Soyoung Park

2017-07-01

This study mapped and analyzed groundwater potential using two different models, logistic regression (LR) and multivariate adaptive regression splines (MARS), and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m³/d were extracted, and 859 locations (70%) were used for model training, whereas the other 365 locations (30%) were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs) were constructed using LR and MARS models and tested using a receiver operating characteristics curve. Based on this analysis, the area under the curve (AUC) for the success rate curve of GPMs created using the MARS and LR models was 0.867 and 0.838, and the AUC for the prediction rate curve was 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.
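The AUC values reported for the two models can be understood through the rank interpretation of the ROC curve: the AUC is the probability that a randomly chosen positive location receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly; the function name and the 0/1 label encoding are assumptions made for illustration, not part of the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 for positive and 0 for negative; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a validation set of a few hundred locations; a sort-based rank formula would be preferable for much larger sets.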
Differentiating regressed melanoma from regressed lichenoid keratosis.

Science.gov (United States)

Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

2017-04-01

Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Photo double-ionization of helium: a new approach combining R matrix and semiclassical techniques in an hyperspherical framework

International Nuclear Information System (INIS)

Malegat, L.; Kazansky, A.; Selles, P.

1999-01-01

We introduce a new method for computing photo double ionization (PDI) cross sections for two electron atoms. It is formulated in terms of the hyperspherical radius R and relies upon a combination of R matrix techniques in the inner region R ≤ R0 with a semiclassical approximation for the R motion in the outer region. We present a first application of this method to the PDI of He within a model of reduced dimensionality where r1 = r2. It demonstrates the validity of our numerical scheme and provides a first quantitative estimate of the energy domain of validity of the Wannier mechanism. (orig.)
Convex Optimization in R

Directory of Open Access Journals (Sweden)

Roger Koenker

2014-09-01

Convex optimization now plays an essential role in many facets of statistics. We briefly survey some recent developments and describe some implementations of these methods in R. Applications of linear and quadratic programming are introduced including quantile regression, the Huber M-estimator and various penalized regression methods. Applications to additively separable convex problems subject to linear equality and inequality constraints such as nonparametric density estimation and maximum likelihood estimation of general nonparametric mixture models are described, as are several cone programming problems. We focus throughout primarily on implementations in the R environment that rely on solution methods linked to R, such as MOSEK via the package Rmosek. Code is provided in R to illustrate several of these problems. Other applications are available in the R package REBayes, dealing with empirical Bayes estimation of nonparametric mixture models.
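Quantile regression is a convex problem because it minimises the piecewise-linear "check" (pinball) loss. The sketch below shows that loss and, for the intercept-only case, finds the minimiser by searching the observed values. It is an illustration only: it does not use a linear programming solver or the Rmosek package, and all function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(q, data, tau):
    """Sum of check losses when the constant q is used to predict every point."""
    return sum(pinball_loss(y - q, tau) for y in data)


def best_constant(data, tau):
    """Minimise the total check loss over the observed values.

    For a constant-only model a minimiser always exists among the data
    points, so a search over them is sufficient.
    """
    return min(sorted(set(data)), key=lambda q: total_loss(q, data, tau))
```

With tau = 0.5 the minimiser is a sample median, which is insensitive to the outlier in a data set such as [1, 2, 3, 4, 100]; raising tau moves the solution toward the upper tail.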
Wavelet regression model in forecasting crude oil price

Science.gov (United States)

Hamid, Mohd Helmie; Shabri, Ani

2017-05-01

This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series was used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
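The two error measures used to compare the forecasting models are straightforward to compute. The following sketch uses their standard definitions; the function names and the example numbers are illustrative and are not taken from the study.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE penalises large errors more heavily than MAE, so reporting both gives a rough sense of whether a model's errors are evenly spread or dominated by a few large misses.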
[Establishment of the Mathematical Model for PMI Estimation Using FTIR Spectroscopy and Data Mining Method].

Science.gov (United States)

Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H

2018-02-01

To analyse the relationship between Fourier transform infrared (FTIR) spectrum of rat's spleen tissue and postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining method. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 ℃. The FTIR spectrum data of rats' spleen tissues were taken and measured at different time points. After pretreatment, the data were analysed by data mining method. The absorption peak intensity of rat's spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum sample at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMI into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient (R²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h respectively. In prediction set, the R² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of the rat's spleen tissue can be effectively analyzed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining method, and the classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.
Fuzzy multiple linear regression: A computational approach

Science.gov (United States)

Juang, C. H.; Huang, X. H.; Fleming, J. W.

1992-01-01

This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with the implementation and computational scheme, and this is followed by examples to illustrate the new procedure.
Explaining and modeling the concentration and loading of Escherichia coli in a stream-A case study.

Science.gov (United States)

Wang, Chaozi; Schneider, Rebecca L; Parlange, Jean-Yves; Dahlke, Helen E; Walter, M Todd

2018-09-01

Escherichia coli (E. coli) level in streams is a public health indicator. Therefore, being able to explain why E. coli levels are sometimes high and sometimes low is important. Using citizen science data from Fall Creek in central NY, we found that using principal component analysis (PCA) and partial least squares (PLS) regression in a complementary way provided insights into the drivers of E. coli and a mechanism for predicting E. coli levels, respectively. We found that stormwater, temperature/season and shallow subsurface flow are the three dominant processes driving the fate and transport of E. coli. PLS regression modeling provided very good predictions under stormwater conditions (R² = 0.85 for log (E. coli concentration) and R² = 0.90 for log (E. coli loading)); predictions under baseflow conditions were less robust. However, in our case, both E. coli concentration and E. coli loading were significantly higher under stormwater conditions, so it is probably more important to predict high-flow E. coli hazards than low-flow conditions. Besides previously reported good indicators of in-stream E. coli level, nitrate-/nitrite-nitrogen and soluble reactive phosphorus were also found to be good indicators of in-stream E. coli levels. These findings suggest management practices to reduce E. coli concentrations and loads in streams and, eventually, reduce the risk of waterborne disease outbreak. Copyright © 2018. Published by Elsevier B.V.
Model-based Quantile Regression for Discrete Data

KAUST Repository

Padellini, Tullia

2018-04-10

Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework, quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Although this leads to a proper posterior for the regression coefficients, the resulting posterior variance is affected by an unidentifiable parameter, hence any inferential procedure besides point estimation is unreliable. We propose a model-based approach for quantile regression that considers quantiles of the generating distribution directly, and thus allows for a proper uncertainty quantification. We then create a link between quantile regression and generalised linear models by mapping the quantiles to the parameter of the response variable, and we exploit it to fit the model with R-INLA. We also extend it to the case of discrete responses, where there is no 1-to-1 relationship between quantiles and the distribution's parameter, by introducing continuous generalisations of the most common discrete variables (Poisson, Binomial and Negative Binomial) to be exploited in the fitting.

application of multilinear regression analysis in modeling of soil

African Journals Online (AJOL)

Windows User

Accordingly [1, 3] in their work, they applied linear regression ... (MLRA) is a statistical technique that uses several explanatory ... order to check this, they adopted bivariate correlation analysis .... groups, namely A-1 through A-7, based on their relative expected ..... Multivariate Regression in Gorgan Province North of Iran” ...
What Are the Odds of that? A Primer on Understanding Logistic Regression

Science.gov (United States)

Huang, Francis L.; Moon, Tonya R.

2013-01-01

The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Applied econometrics with R

CERN Document Server

Kleiber, Christian

2008-01-01

Offers an introduction to the R system for users with a background in economics. This book covers a variety of regression models, regression diagnostics and robustness issues, the nonlinear models of microeconomics, time series and time series econometrics.
Correlation of sensory bitterness in dairy protein hydrolysates: Comparison of prediction models built using sensory, chromatographic and electronic tongue data.

Science.gov (United States)

Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M

2014-08-01

Sensory evaluation can be problematic for ingredients with a bitter taste during the research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed by an electronic tongue and by their physicochemical characteristics. The data obtained from these methods were correlated with their bitterness intensity as scored by a trained sensory panel, and each model was also assessed for its predictive capabilities. The physicochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%), and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse phase (RP) HPLC. Partial least square regression (PLS) was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93), with the strongest being the combination of data obtained from SEC and RP HPLC. However, the PLS with the strongest predictive power was based on the e-tongue, which had the PLS regression with the lowest root mean predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and the combination of SEC and RP-HPLC have potential to be used for prediction of bitterness, thus reducing the reliance on sensory analysis in DPHs for future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
Superquantile Regression: Theory, Algorithms, and Applications

Science.gov (United States)

2014-12-01

Highway, Suite 1204, Arlington, Va 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 1...Navy submariners, reliability engineering, uncertainty quantification, and financial risk management . Superquantile, superquantile regression...Royset Carlos F. Borges Associate Professor of Operations Research Dissertation Supervisor Professor of Applied Mathematics Lyn R. Whitaker Javier
Prediction of valid acidity in intact apples with Fourier transform near infrared spectroscopy*

OpenAIRE

Liu, Yan-de; Ying, Yi-bin; Fu, Xia-ping

2005-01-01

To develop nondestructive acidity prediction for intact Fuji apples, the potential of Fourier transform near infrared (FT-NIR) method with fiber optics in interactance mode was investigated. Interactance in the 800 nm to 2619 nm region was measured for intact apples, harvested from early to late maturity stages. Spectral data were analyzed by two multivariate calibration techniques including partial least squares (PLS) and principal component regression (PCR) methods. A total of 120 Fuji appl...
Propiedades psicométricas de la Escala de lenguaje para preescolares (PLS-3 colombianos

Directory of Open Access Journals (Sweden)

Rita Flórez Romero

2013-01-01

Objective. This study sought to characterize the psychometric properties of the Preschool Language Scale-3 (PLS-3) instrument in a sample of 477 Colombian children aged four to seven years from the city of Bogotá D. C. Method. To this end, analyses of discrimination coefficients, difficulty, and the tetrachoric correlation matrix were carried out. Results. Appropriate levels of reliability were found, along with high sensitivity to the development of the children's language comprehension and production, as well as a low difficulty index for some items. Conclusion. These results are discussed in light of discrimination indices in tests of typical and atypical language development.
Computer software for linear and nonlinear regression in organic NMR; Programa de computador para regressao linear e nao linear em R.M.N. organica

Energy Technology Data Exchange (ETDEWEB)

Canto, Eduardo Leite do; Rittner, Roberto [Universidade Estadual de Campinas, SP (Brazil). Inst. de Quimica

1992-12-31

Calculations involving two-variable linear regressions require specific procedures generally not familiar to chemists. To meet the need for fast and efficient handling of NMR data, a self-explanatory, PC-portable software package has been developed, which allows the user to produce and use tables recorded on diskette, containing chemical shifts or any other substituent physical-chemical measurements and constants ($\sigma_T$, $\sigma^o_R$, $E_s$, ...) 9 refs., 1 fig.
Non-destructive analysis of sensory traits of dry-cured loins by MRI-computer vision techniques and data mining.

Science.gov (United States)

Caballero, Daniel; Antequera, Teresa; Caro, Andrés; Ávila, María Del Mar; G Rodríguez, Pablo; Perez-Palacios, Trinidad

2017-07-01

Magnetic resonance imaging (MRI) combined with computer vision techniques has been proposed as an alternative or complementary technique to determine the quality parameters of food in a non-destructive way. The aim of this work was to analyze the sensory attributes of dry-cured loins using this technique. For that, different MRI acquisition sequences (spin echo, gradient echo and turbo 3D), algorithms for MRI analysis (GLCM, NGLDM, GLRLM and GLCM-NGLDM-GLRLM) and predictive data mining techniques (multiple linear regression and isotonic regression) were tested. The correlation coefficient (R) and mean absolute error (MAE) were used to validate the prediction results. The combination of spin echo, GLCM and isotonic regression produced the most accurate results. In addition, the MRI data from dry-cured loins seem to be more suitable than the data from fresh loins. The application of predictive data mining techniques on computational texture features from the MRI data of loins enables the determination of the sensory traits of dry-cured loins in a non-destructive way. © 2016 Society of Chemical Industry.
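Isotonic regression, one of the two predictive techniques named in this abstract, fits a non-decreasing sequence to the responses in the least-squares sense. A common way to compute it is the pool-adjacent-violators algorithm, sketched below in Python. This is a generic illustration, assuming the responses are already ordered by the predictor; it is not the implementation used by the authors, and the function name is hypothetical.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted

print(isotonic_fit([1, 3, 2, 4]))
```

Each violation of monotonicity is resolved by replacing the offending neighbours with their mean, so the fit is piecewise constant.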
Introduction to the use of regression models in epidemiology.

Science.gov (United States)

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Investing in Global Markets: Big Data and Applications of Robust Regression

Directory of Open Access Journals (Sweden)

John Guerard

2016-02-01

In this analysis of the risk and return of stocks in global markets, we apply several robust regression techniques in producing stock selection models and several optimization techniques in portfolio construction in global stock universes. We find that (1) the robust regression applications are appropriate for modeling stock returns in global markets; and (2) mean-variance techniques continue to produce portfolios capable of generating excess returns above transaction costs and statistically significant asset selection. We estimate expected return models in global equity markets using a given stock selection model and generate statistically significant active returns from various portfolio construction techniques.
Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

OpenAIRE

Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

2011-01-01

Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) wh...
Validation of Fluorescence Spectroscopy to Detect Adulteration of Edible Oil in Extra Virgin Olive Oil (EVOO) by Applying Chemometrics.

Science.gov (United States)

Ali, Hina; Saleem, Muhammad; Anser, Muhammad Ramzan; Khan, Saranjam; Ullah, Rahat; Bilal, Muhammad

2018-01-01

Due to the high price and nutritional value of extra virgin olive oil (EVOO), it is vulnerable to adulteration internationally. Refined oil or other vegetable oils are commonly blended with EVOO, and to unmask such fraud a quick and reliable technique needs to be standardized and developed. Therefore, in this study, pure EVOO was adulterated with an edible oil (sunflower oil) and analyzed using fluorescence spectroscopy (excitation wavelength at 350 nm) in conjunction with principal component analysis (PCA) and partial least squares (PLS) regression. Fluorescence spectra contain fingerprints of chlorophyll and carotenoids that are characteristic of EVOO and differentiate it from sunflower oil. A broad intense hump corresponding to conjugated hydroperoxides is seen in sunflower oil in the range of 441-489 nm with the maximum at 469 nm, whereas pure EVOO has low-intensity doublet peaks in this region at 441 nm and 469 nm. Visible changes in spectra are observed in adulterated EVOO as the concentration of sunflower oil increases, with an increase in the doublet peak and a corresponding decrease in chlorophyll peak intensity. Principal component analysis showed a distinct clustering of adulterated samples of different concentrations. Subsequently, the PLS regression model was best fitted over the complete data set on the basis of coefficient of determination (R²), standard error of calibration (SEC), and standard error of prediction (SEP), with values of 0.99, 0.617, and 0.623, respectively. In addition to adulterant, test samples and imported commercial brands of EVOO were also used for prediction and validation of the models. Fluorescence spectroscopy combined with chemometrics showed its robustness to identify and quantify the specified adulterant in pure EVOO.
bayesQR: A Bayesian Approach to Quantile Regression

Directory of Open Access Journals (Sweden)

Dries F. Benoit

2017-01-01

After its introduction by Koenker and Bassett (1978), quantile regression has become an important and popular tool to investigate the conditional response distribution in regression. The R package bayesQR contains a number of routines to estimate quantile regression parameters using a Bayesian approach based on the asymmetric Laplace distribution. The package contains functions for the typical quantile regression with continuous dependent variable, but also supports quantile regression for binary dependent variables. For both types of dependent variables, an approach to variable selection using the adaptive lasso approach is provided. For the binary quantile regression model, the package also contains a routine that calculates the fitted probabilities for each vector of predictors. In addition, functions for summarizing the results, creating traceplots, posterior histograms and drawing quantile plots are included. This paper starts with a brief overview of the theoretical background of the models used in the bayesQR package. The main part of this paper discusses the computational problems that arise in the implementation of the procedure and illustrates the usefulness of the package through selected examples.
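The link between quantile regression and the asymmetric Laplace distribution mentioned in this abstract comes from the check (pinball) loss: the classical estimator minimises the summed check loss, and the same loss appears in the exponent of the asymmetric Laplace density. The Python sketch below illustrates only the loss for an intercept-only model. It does not use or reproduce the bayesQR package, and the function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile_by_check_loss(values, tau):
    """Return the sample value that minimises the summed check loss."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))

# Hypothetical data with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(sample_quantile_by_check_loss(data, 0.5))  # the median, unaffected by the outlier
print(sample_quantile_by_check_loss(data, 0.9))  # an upper quantile
```

Replacing the constant `q` with a linear predictor turns the same minimisation into a quantile regression fit.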
Emerging approach for analytical characterization and geographical classification of Moroccan and French honeys by means of a voltammetric electronic tongue.

Science.gov (United States)

El Alami El Hassani, Nadia; Tahri, Khalid; Llobet, Eduard; Bouchikhi, Benachir; Errachid, Abdelhamid; Zine, Nadia; El Bari, Nezha

2018-03-15

Moroccan and French honeys from different geographical areas were classified and characterized by applying a voltammetric electronic tongue (VE-tongue) coupled to analytical methods. The studied parameters include color intensity, free lactonic and total acidity, proteins, phenols, hydroxymethylfurfural content (HMF), sucrose, reducing and total sugars. The geographical classification of different honeys was developed through three pattern recognition techniques: principal component analysis (PCA), support vector machines (SVMs) and hierarchical cluster analysis (HCA). Honey characterization was achieved by partial least squares modeling (PLS). All the PLS models developed were able to accurately estimate the correct values of the parameters analyzed using as input the voltammetric experimental data (i.e. r>0.9). This confirms the potential ability of the VE-tongue for performing a rapid characterization of honeys via PLS in which an uncomplicated, cost-effective sample preparation process that does not require the use of additional chemicals is implemented. Copyright © 2017 Elsevier Ltd. All rights reserved.
cp-R, an interface the R programming language for clinical laboratory method comparisons.

Science.gov (United States)

Holmes, Daniel T

2015-02-01

Clinical scientists frequently need to compare two different bioanalytical methods as part of assay validation/monitoring. As a matter of necessity, regression methods for quantitative comparison in clinical chemistry, hematology and other clinical laboratory disciplines must allow for error in both the x and y variables. Traditionally the methods popularized by 1) Deming and 2) Passing and Bablok have been recommended. While commercial tools exist, no simple open source tool is available. The purpose of this work was to develop an entirely open-source GUI-driven program for bioanalytical method comparisons capable of performing these regression methods and able to produce highly customized graphical output. The GUI is written in python and PyQt4 with R scripts performing regression and graphical functions. The program can be run from source code or as a pre-compiled binary executable. The software performs three forms of regression and offers weighting where applicable. Confidence bands of the regression are calculated using bootstrapping for Deming and Passing Bablok methods. Users can customize regression plots according to the tools available in R and can produce output in any of: jpg, png, tiff, bmp at any desired resolution or ps and pdf vector formats. Bland Altman plots and some regression diagnostic plots are also generated. Correctness of regression parameter estimates was confirmed against existing R packages. The program allows for rapid and highly customizable graphical output capable of conforming to the publication requirements of any clinical chemistry journal. Quick method comparisons can also be performed and cut and pasted into spreadsheet or word processing applications. We present a simple and intuitive open source tool for quantitative method comparison in a clinical laboratory environment. Copyright © 2014 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
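Deming regression, one of the errors-in-both-variables methods named in this abstract, has a closed-form solution once the ratio of the two error variances is assumed. The Python sketch below shows the unweighted textbook estimator. It is not the cp-R code; the function name, the `delta` parameter (assumed ratio of y-error variance to x-error variance), and the data are hypothetical.

```python
import math

def deming(x, y, delta=1.0):
    """Unweighted Deming regression; delta = var(y errors) / var(x errors).

    Requires a non-zero sample covariance between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired measurements from two methods lying exactly on y = 2x + 1.
method_a = [1.0, 2.0, 3.0, 4.0]
method_b = [3.0, 5.0, 7.0, 9.0]
slope, intercept = deming(method_a, method_b)
print(slope, intercept)
```

With `delta=1.0` this reduces to orthogonal regression; confidence bands, as the abstract notes, would need a resampling step such as the bootstrap, which is not shown here.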
Detection of Differential Item Functioning with Nonlinear Regression: A Non-IRT Approach Accounting for Guessing

Czech Academy of Sciences Publication Activity Database

Drabinová, Adéla; Martinková, Patrícia

2017-01-01

Vol. 54, No. 4 (2017), pp. 498-517. ISSN 0022-0655. R&D Projects: GA ČR GJ15-15856Y. Institutional support: RVO:67985807. Keywords: differential item functioning; non-linear regression; logistic regression; item response theory. Subject RIV: AM - Education. OECD field: Statistics and probability. Impact factor: 0.979, year: 2016
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

Science.gov (United States)

Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.
Formulating state space models in R with focus on longitudinal regression models

DEFF Research Database (Denmark)

Dethlefsen, Claus; Lundbye-Christensen, Søren

2006-01-01

We provide a language for formulating a range of state space models with response densities within the exponential family. The described methodology is implemented in the R-package sspir. A state space model is specified similarly to a generalized linear model in R, and then the time-varying terms...
8th International Conference on Partial Least Squares and Related Methods

CERN Document Server

Vinzi, Vincenzo; Russolillo, Giorgio; Saporta, Gilbert; Trinchera, Laura

2016-01-01

This volume presents state of the art theories, new developments, and important applications of Partial Least Square (PLS) methods. The text begins with the invited communications of current leaders in the field who cover the history of PLS, an overview of methodological issues, and recent advances in regression and multi-block approaches. The rest of the volume comprises selected, reviewed contributions from the 8th International Conference on Partial Least Squares and Related Methods held in Paris, France, on 26-28 May, 2014. They are organized in four coherent sections: 1) new developments in genomics and brain imaging, 2) new and alternative methods for multi-table and path analysis, 3) advances in partial least square regression (PLSR), and 4) partial least square path modeling (PLS-PM) breakthroughs and applications. PLS methods are very versatile methods that are now used in areas as diverse as engineering, life science, sociology, psychology, brain imaging, genomics, and business among both academics ...

Improved ability of biological and previous caries multimarkers to predict caries disease as revealed by multivariate PLS modelling

Directory of Open Access Journals (Sweden)

Ericson Thorild

2009-11-01

Background: Dental caries is a chronic disease with plaque bacteria, diet and saliva modifying disease activity. Here we have used the PLS method to evaluate a multiplicity of such biological variables (n = 88) for ability to predict caries in a cross-sectional (baseline caries) and prospective (2-year caries development) setting. Methods: Multivariate PLS modelling was used to associate the many biological variables with caries recorded in thirty 14-year-old children by measuring the numbers of incipient and manifest caries lesions at all surfaces. Results: A wide but shallow gliding scale of one fifth caries promoting or protecting, and four fifths non-influential, variables occurred. The influential markers behaved in the order of plaque bacteria > diet > saliva, with previously known plaque bacteria/diet markers and a set of new protective diet markers. A differential variable patterning appeared for new versus progressing lesions. The influential biological multimarkers (n = 18) predicted baseline caries better (ROC area 0.96) than five markers (0.92) and a single lactobacilli marker (0.7), with sensitivity/specificity of 1.87, 1.78 and 1.13 at 1/3 of the subjects diagnosed sick, respectively. Moreover, biological multimarkers (n = 18) explained 2-year caries increment slightly better than reported before but predicted it poorly (ROC area 0.76). By contrast, multimarkers based on previous caries predicted alone (ROC area 0.88), or together with biological multimarkers (0.94), increment well with a sensitivity/specificity of 1.74 at 1/3 of the subjects diagnosed sick. Conclusion: Multimarkers behave better than single-to-five markers but future multimarker strategies will require systematic searches for improved saliva and plaque bacteria markers.
Bayesian regression of piecewise homogeneous Poisson processes

Directory of Open Access Journals (Sweden)

Diego Sevilla

2015-12-01

In this paper, a Bayesian method for piecewise regression is adapted to handle counting processes data distributed as Poisson. A numerical code in Mathematica is developed and tested analyzing simulated data. The resulting method is valuable for detecting breaking points in the count rate of time series for Poisson processes. Received: 2 November 2015, Accepted: 27 November 2015; Edited by: R. Dickman; Reviewed by: M. Hutter, Australian National University, Canberra, Australia.; DOI: http://dx.doi.org/10.4279/PIP.070018 Cite as: D J R Sevilla, Papers in Physics 7, 070018 (2015)
An Innovative Technique to Assess Spontaneous Baroreflex Sensitivity with Short Data Segments: Multiple Trigonometric Regressive Spectral Analysis.

Science.gov (United States)

Li, Kai; Rüdiger, Heinz; Haase, Rocco; Ziemssen, Tjalf

2018-01-01

Objective: As the multiple trigonometric regressive spectral (MTRS) analysis is extraordinary in its ability to analyze short local data segments down to 12 s, we wanted to evaluate the impact of the data segment settings by applying the technique of MTRS analysis for baroreflex sensitivity (BRS) estimation using a standardized data pool. Methods: Spectral and baroreflex analyses were performed on the EuroBaVar dataset (42 recordings, including lying and standing positions). For this analysis, the technique of MTRS was used. We used different global and local data segment lengths, and chose the global data segments from different positions. Three global data segments of 1 and 2 min and three local data segments of 12, 20, and 30 s were used in MTRS analysis for BRS. Results: All the BRS-values calculated on the three global data segments were highly correlated, both in the supine and standing positions; the different global data segments provided similar BRS estimations. When using different local data segments, all the BRS-values were also highly correlated. However, in the supine position, using short local data segments of 12 s overestimated BRS compared with those using 20 and 30 s. In the standing position, the BRS estimations using different local data segments were comparable. There was no proportional bias for the comparisons between different BRS estimations. Conclusion: We demonstrate that BRS estimation by the MTRS technique is stable when using different global data segments, and MTRS is extraordinary in its ability to evaluate BRS in even short local data segments (20 and 30 s). Because of the non-stationary character of most biosignals, the MTRS technique would be preferable for BRS analysis especially in conditions when only short stationary data segments are available or when dynamic changes of BRS should be monitored.
Integrating SQ4R Technique with Graphic Postorganizers in the Science Learning of Earth and Space

OpenAIRE

Djudin, Tomo; Amir, R

2018-01-01

This study examined the effect of integrating SQ4R reading technique with graphic post organizers on the students' Earth and Space Science learning achievement and development of metacognitive knowledge. The pretest-posttest non-equivalent control group design was employed in this quasi-experimental method. The sample which consists of 103 seventh grade of secondary school students of SMPN 1 Pontianak was drawn by using intact group random sampling technique. An achievement test and a questio...
Analyzing the Relative Linkages of Land Use and Hydrologic Variables with Urban Surface Water Quality using Multivariate Techniques

Science.gov (United States)

Ahmed, S.; Abdul-Aziz, O. I.

2015-12-01

We used a systematic data-analytics approach to analyze and quantify relative linkages of four stream water quality indicators (total nitrogen, TN; total phosphorus, TP; chlorophyll-a, Chla; and dissolved oxygen, DO) with six land use and four hydrologic variables, along with the potential external (upstream in-land and downstream coastal) controls in highly complex coastal urban watersheds of southeast Florida, U.S.A. Multivariate pattern recognition techniques of principal component and factor analyses, in concert with Pearson correlation analysis, were applied to map interrelations and identify latent patterns of the participatory variables. Relative linkages of the in-stream water quality variables with their associated drivers were then quantified by developing a dimensionless partial least squares (PLS) regression model based on standardized data. Model fitting efficiency (R² = 0.71-0.87) and accuracy (ratio of root-mean-square error to the standard deviation of the observations, RSR = 0.35-0.53) suggested good predictions of the water quality variables in both wet and dry seasons. Agricultural land and groundwater exhibited substantial controls on surface water quality. In-stream TN concentration appeared to be mostly contributed by the upstream water entering from the Everglades in both wet and dry seasons. In contrast, watershed land uses had stronger linkages with TP and Chla than did the watershed hydrologic and upstream (Everglades) components for both seasons. Both land use and hydrologic components showed strong linkages with DO in the wet season; however, the land use linkage appeared to be weaker in the dry season. The data-analytics method provided a comprehensive empirical framework to achieve crucial mechanistic insights into the urban stream water quality processes. Our study quantitatively identified dominant drivers of water quality, indicating key management targets to maintain healthy stream ecosystems in complex urban-natural environments near the coast.
Analyzing Integrated Cost-Schedule Risk for Complex Product Systems R&D Projects

Directory of Open Access Journals (Sweden)

Zhe Xu

2014-01-01

The vast majority of the research efforts in project risk management tend to assess cost risk and schedule risk independently. However, project cost and time are related in reality and the relationship between them should be analyzed directly. We propose an integrated cost and schedule risk assessment model for complex product systems R&D projects. Graphical evaluation review technique (GERT), Monte Carlo simulation, and probability distribution theory are utilized to establish the model. In addition, statistical analysis and regression analysis techniques are employed to analyze simulation outputs. Finally, a complex product systems R&D project is modeled as an example by the proposed approach and the simulation outputs are analyzed to illustrate the effectiveness of the risk assessment model. The results suggest that integrating cost and schedule risk assessment can provide more reliable risk estimation results.
A method for nonlinear exponential regression analysis

Science.gov (United States)

Junkin, B. G.

1971-01-01

A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
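The cycle described above (a linear fit for the initial estimates, then repeated corrections from a Taylor-series linearization until a criterion is met) can be sketched as a Gauss-Newton iteration. This is an illustration under stated assumptions, not the report's procedure: the two-parameter model y = a·exp(b·t), the function names, and the stopping rule are all chosen here for the example.

```python
import math


def linear_initial_estimate(t, y):
    """Initial (a, b) from an ordinary least-squares line through ln(y) vs t (needs y > 0)."""
    n = len(t)
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    l_mean = sum(log_y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (li - l_mean) for ti, li in zip(t, log_y))
    b = sxy / sxx
    a = math.exp(l_mean - b * t_mean)
    return a, b


def sum_squared_error(t, y, a, b):
    return sum((yi - a * math.exp(b * ti)) ** 2 for ti, yi in zip(t, y))


def fit_exponential(t, y, tol=1e-12, max_iter=50):
    """Refine (a, b) for the assumed model y = a*exp(b*t) by Gauss-Newton steps."""
    a, b = linear_initial_estimate(t, y)
    for _ in range(max_iter):
        s11 = s12 = s22 = g1 = g2 = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(b * ti)
            r = yi - a * e          # residual at the current estimate
            j1 = e                  # d(model)/da
            j2 = a * ti * e         # d(model)/db
            s11 += j1 * j1
            s12 += j1 * j2
            s22 += j2 * j2
            g1 += j1 * r
            g2 += j2 * r
        det = s11 * s22 - s12 * s12
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) d = J^T r for the correction d.
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a += da
        b += db
        if abs(da) <= tol * (1 + abs(a)) and abs(db) <= tol * (1 + abs(b)):
            break
    return a, b


times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
clean = [5.0 * math.exp(-0.7 * ti) for ti in times]
a_hat, b_hat = fit_exponential(times, clean)
```

The log-linear start matters because Gauss-Newton has no step control in this sketch; a poor starting point could make the iteration diverge.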
Real estate value prediction using multivariate regression models

Science.gov (United States)

Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

2017-11-01

The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly with many factors; hence it is a prime field for applying machine learning to optimize and predict prices with high accuracy. In this paper, we therefore present various important features to use when predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.
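A minimal sketch of the combination mentioned above, polynomial features with a ridge penalty, using only the standard library. The function names, the choice to leave the intercept unpenalized by centering, and the toy data are assumptions made for illustration; the paper's own features and settings are not given in this record.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_polynomial_fit(x, y, degree, lam):
    """Fit y ~ intercept + sum_k beta_k * x**k, penalizing lam * sum(beta_k**2).

    Features and target are centered so the intercept is not penalized.
    """
    n = len(x)
    feats = [[xi ** k for k in range(1, degree + 1)] for xi in x]
    means = [sum(row[j] for row in feats) / n for j in range(degree)]
    y_mean = sum(y) / n
    Z = [[row[j] - means[j] for j in range(degree)] for row in feats]
    yc = [yi - y_mean for yi in y]
    # Normal equations with the ridge term on the diagonal: (Z^T Z + lam I) beta = Z^T y
    A = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(degree)] for j in range(degree)]
    rhs = [sum(Z[i][j] * yc[i] for i in range(n)) for j in range(degree)]
    beta = solve_linear_system(A, rhs)
    intercept = y_mean - sum(bj * mj for bj, mj in zip(beta, means))
    return intercept, beta


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * v + 0.5 * v * v for v in xs]
plain = ridge_polynomial_fit(xs, ys, degree=2, lam=0.0)
shrunk = ridge_polynomial_fit(xs, ys, degree=2, lam=10.0)
```

With `lam=0.0` the fit reduces to ordinary least squares; increasing `lam` shrinks the coefficients toward zero, which is the mechanism that limits overfitting.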
Modified multiblock partial least squares path modeling algorithm with backpropagation neural networks approach

Science.gov (United States)

Yuniarto, Budi; Kurniawan, Robert

2017-03-01

PLS Path Modeling (PLS-PM) differs from covariance-based SEM in that PLS-PM uses an approach based on variance or components; therefore, PLS-PM is also known as component-based SEM. Multiblock Partial Least Squares (MBPLS) is a method in PLS regression which can be used in PLS Path Modeling, where it is known as Multiblock PLS Path Modeling (MBPLS-PM). This method uses an iterative procedure in its algorithm. This research aims to modify MBPLS-PM with a Back Propagation Neural Network approach. The result is that the MBPLS-PM algorithm can be modified using the Back Propagation Neural Network approach to replace the iterative process in the backward and forward steps to get the matrix t and the matrix u in the algorithm. When the MBPLS-PM algorithm is modified using the Back Propagation Neural Network approach, the model parameters obtained are not significantly different from the model parameters obtained by the original MBPLS-PM algorithm.
Multiple regression and beyond an introduction to multiple regression and structural equation modeling

CERN Document Server

Keith, Timothy Z

2014-01-01

Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another. Also includes path analysis, confirmatory factor analysis, and latent growth modeling. Figures and tables throughout provide examples and illustrate key concepts and techniques. For additional resources, please visit: http://tzkeith.com/.
Evaluation of three gentamicin serum assay techniques

International Nuclear Information System (INIS)

Matzke, G.R.; Gwizdala, C.; Wery, J.; Ferry, D.; Starnes, R.

1982-01-01

This investigation was designed to compare the enzyme-modified immunoassay (Syva--EMIT) with a radioimmunoassay (New England Nuclear--RIA) and the radiometric assay (Johnston--BACTEC) to determine the optimal assay for use in our aminoglycoside dosing service. The serum concentration determinations obtained via the three assay methods were analyzed by linear regression analysis. Significant positive correlations were noted between the three assay techniques (p less than 0.005) during both sample collection phases. The coefficients of determination for EMIT vs BACTEC and RIA vs BACTEC were 0.73 and 0.83 during phase 1, respectively, and 0.65 and 0.68 during phase 2, respectively. The slope of the regression lines also varied markedly between the two phases: 0.49 and 0.42 for EMIT and RIA vs BACTEC, respectively, during phase 1 compared with 1.12 and 0.77, respectively, during phase 2. The differences noted in these relationships during phases 1 and 2 may be related to the alteration of the pH of the control sera utilized in the BACTEC assay. In contrast, RIA vs EMIT regression analysis indicated the existence of a highly significant relationship (p less than 0.0005 and r2 . 0.90). The EMIT technique was the easiest and most accurate for determination of serum gentamicin concentrations, whereas the BACTEC method was judged unacceptable for clinical use.
Potential of multispectral imaging technology for rapid and non-destructive determination of the microbiological quality of beef filets during aerobic storage

DEFF Research Database (Denmark)

Panagou, Efstathios Z.; Papadopoulou, Olga; Carstensen, Jens Michael

2014-01-01

counts, namely Class 1 (TVC7.0log10CFU/g). Furthermore, PLS regression models were developed to provide quantitative estimations of microbial counts during meat storage. In both cases model validation was implemented with independent experiments at intermediate storage temperatures (2 and 10°C) using....... thermosphacta, and TVC, respectively. The results indicated that multispectral vision technology has significant potential as a rapid and non-destructive technique in assessing the microbiological quality of beef fillets....
Improvement of the thermo-mechanical position stability of the beam position monitor in the PLS-II

Science.gov (United States)

Ha, Taekyun; Hong, Mansu; Kwon, Hyuckchae; Han, Hongsik; Park, Chongdo

2016-09-01

In the storage ring of the Pohang Light Source-II (PLS-II), we reduced the mechanical displacement of the electron-beam position monitors (e-BPMs) that is caused by heating during e-beam storage. The BPM pickup itself must be kept stable to sub-micrometer precision in order for a stable photon beam to be provided to beamlines because the orbit feedback system is programmed to make the electron beam pass through the center of the BPM. Thermal deformation of the vacuum chambers on which the BPM pickups are mounted is inevitable when the electron beam current is changed by an unintended beam abort. We reduced this deformation by improving the vacuum chamber support and by enhancing the water cooling. We report a thermo-mechanical analysis and displacement measurements for the BPM pickups after improvements.
Dynamic olfactometry and GC–TOFMS to monitor the efficiency of an industrial biofilter

Energy Technology Data Exchange (ETDEWEB)

Gutiérrez, M.C.; Martín, M.A. [University of Cordoba, Department of Inorganic Chemical and Chemical Engineering, Campus Universitario de Rabanales, Carretera N-IV, km 396, Edificio Marie Curie, 14071 Córdoba (Spain); Pagans, E.; Vera, L. [Odournet SL, Parc de Recerca UAB, Edificio Eureka, Espacio P2M2, 08193, Bellaterra, Cerdanyola del Vallès, Barcelona (Spain); García-Olmo, J. [NIR/MIR Spectroscopy Unit, Central Service for Research Support (SCAI), University of Cordoba, Campus de Rabanales, 14071 Cordoba (Spain); Chica, A.F., E-mail: afchica@uco.es [University of Cordoba, Department of Inorganic Chemical and Chemical Engineering, Campus Universitario de Rabanales, Carretera N-IV, km 396, Edificio Marie Curie, 14071 Córdoba (Spain)

2015-04-15

Biofiltration is the most widely used technique for eliminating odours in waste treatment plants. Volatile organic compounds (VOCs) are among the odorous compounds emitted by waste management plants, and serve as variables to measure odour emissions depending on the type of aeration process used. In this work, we assess the performance of an industrial-scale biofilter where composting is the main source of VOCs and odour emissions. Dynamic olfactometry is the sensorial technique used to determine odour concentration, while gas chromatography–time of flight-mass spectrometry (GC–TOFMS) is used to perform the chemical characterization. This work examines a total of 82 compounds belonging to 15 odorous families of VOCs, particularly mercaptans, sulphur-containing compounds, alcohols and terpenes, among others. Principal component analysis (PCA) is used to assess the influence of each of these families of VOCs on the total variance of the measure with regard to both the input and output flow of the biofilter. Finally, partial least-squares (PLS) regression is used to estimate the odour concentration in each of the samples taken at the inlet and outlet of the biofilter, based on the chemical information provided by chromatographic analysis. The study shows that there is an adequate correlation (r = 0.9751) between real and estimated odour concentrations, both of which are expressed in European odour units per cubic metre (ou_E·m^−3). - Highlights: • Odour and VOC removal by industrial biofilter was evaluated. • Dynamic olfactometry and GC-TOF MS were the techniques used. • The compost aeration mode was considered in this study. • The influence of 15 VOC families on sample variance was demonstrated by PCA. • Odour concentration was predicted from selected chromatographic information by PLS.
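As an illustration of how PLS regression predicts a single response (such as odour concentration) from several correlated predictors, the following is a minimal single-response PLS (PLS1) sketch in the NIPALS style, written with the standard library only. The function names and toy data are hypothetical; the study's software, preprocessing, and number of components are not stated in this record.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1: extract weight vectors, loadings and y-loadings one component at a time."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # centered X, deflated in place
    f = [v - y_mean for v in y]                                  # centered y, deflated in place
    W, P, q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]   # direction of max covariance
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        q.append(q_a)
    return x_mean, y_mean, W, P, q


def pls1_predict(model, x):
    """Predict y for one sample by replaying the same projections and deflations."""
    x_mean, y_mean, W, P, q = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q_a in zip(W, P, q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q_a * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat


X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]]
y_train = [1.0 + 2.0 * a - 3.0 * b for a, b in X_train]
model = pls1_fit(X_train, y_train, n_components=2)
prediction = pls1_predict(model, [6.0, 4.0])
```

When the number of components equals the rank of the predictor matrix, PLS1 coincides with ordinary least squares; using fewer components is what gives PLS its stability with many collinear predictors, such as chromatographic peak areas.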
An empirical study on open position risk assessment using VAR and regression analysis: A case study of Iranian banking industry

Directory of Open Access Journals (Sweden)

Elmira Mahmoudzadeh

2012-10-01

During the past few years, there have been tremendous fluctuations in different currencies. For instance, the European common currency, the Euro, has fluctuated between 0.60 and 0.9 against the US dollar. Therefore, it is important to study the behavior of currency valuations using different techniques. In this paper, we present an empirical study to measure the impact of different items on the risk of foreign currency using value at risk (VaR) and regression methods. The proposed model investigates whether the risk of open positions in six foreign currencies, namely the US Dollar, Euro, British Pound, Swiss Franc, Norwegian Krone and United Arab Emirates Dirham, increases during the time horizon. The study uses historical daily prices of these currencies for fiscal year 2011 in one of the private banks located in Iran and measures the relative risk. The results of implementing the two methods, VaR and linear regression, indicate that the risk of open positions increases during the time horizon.
Hyperspectral Unmixing with Robust Collaborative Sparse Regression

Directory of Open Access Journals (Sweden)

Chang Li

2016-07-01

Recently, sparse unmixing (SU) of hyperspectral data has received particular attention for analyzing remote sensing images. However, most SU methods are based on the commonly admitted linear mixing model (LMM), which ignores the possible nonlinear effects (i.e., nonlinearity). In this paper, we propose a new method named robust collaborative sparse regression (RCSR) based on the robust LMM (rLMM) for hyperspectral unmixing. The rLMM takes the nonlinearity into consideration, and the nonlinearity is merely treated as an outlier, which has the underlying sparse property. The RCSR simultaneously takes the collaborative sparse property of the abundance and the sparsely distributed additive property of the outlier into consideration, which can be formed as a robust joint sparse regression problem. The inexact augmented Lagrangian method (IALM) is used to optimize the proposed RCSR. The qualitative and quantitative experiments on synthetic datasets and real hyperspectral images demonstrate that the proposed RCSR is efficient for solving the hyperspectral SU problem compared with the other four state-of-the-art algorithms.
The use of adaptive statistical iterative reconstruction (ASiR) technique in evaluation of patients with cervical spine trauma: impact on radiation dose reduction and image quality.

Science.gov (United States)

Patro, Satya N; Chakraborty, Santanu; Sheikh, Adnan

2016-01-01

The aim of this study was to evaluate the impact of adaptive statistical iterative reconstruction (ASiR) technique on the image quality and radiation dose reduction. The comparison was made with the traditional filtered back projection (FBP) technique. We retrospectively reviewed 78 patients, who underwent cervical spine CT for blunt cervical trauma between 1 June 2010 and 30 November 2010. 48 patients were imaged using traditional FBP technique and the remaining 30 patients were imaged using the ASiR technique. The patient demographics, radiation dose, objective image signal and noise were recorded; while subjective noise, sharpness, diagnostic acceptability and artefacts were graded by two radiologists blinded to the techniques. We found that the ASiR technique was able to reduce the volume CT dose index, dose-length product and effective dose by 36%, 36.5% and 36.5%, respectively, compared with the FBP technique. There was no significant difference in the image noise (p = 0.39), signal (p = 0.82) and signal-to-noise ratio (p = 0.56) between the groups. The subjective image quality was minimally better in the ASiR group but not statistically significant. There was excellent interobserver agreement on the subjective image quality and diagnostic acceptability for both groups. The use of ASiR technique allowed approximately 36% radiation dose reduction in the evaluation of cervical spine without degrading the image quality. The present study highlights that the ASiR technique is extremely helpful in reducing the patient radiation exposure while maintaining the image quality. It is highly recommended to utilize this novel technique in CT imaging of different body regions.
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

Science.gov (United States)

Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

2017-06-01

A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate the odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model that is influenced by the geographical location of the observation site. Parameter estimation in the model is needed to determine population values based on a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
Retro-regression--another important multivariate regression improvement.

Science.gov (United States)

Randić, M

2001-01-01

We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
Modified Regression Correlation Coefficient for Poisson Regression Model

Science.gov (United States)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study examines indicators of predictive power for the Generalized Linear Model (GLM), which are widely used but often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and in the presence of multicollinearity among the independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on bias and the root mean square error (RMSE).
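The traditional regression correlation coefficient described above, the correlation between Y and the fitted E(Y|X), can be sketched for a Poisson model as follows. This shows only the traditional coefficient, not the modification proposed in the paper, whose form is not given in this record. The single predictor, the function names, and the toy counts are assumptions made to keep the example short.

```python
import math


def fit_poisson(x, y, max_iter=100, tol=1e-12):
    """Maximum-likelihood fit of log E(Y|x) = b0 + b1*x by Newton's method."""
    b0 = math.log(sum(y) / len(y))   # start from the intercept-only model
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a 2x2 matrix
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def pearson(u, v):
    n = len(u)
    mu_u = sum(u) / n
    mu_v = sum(v) / n
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def regression_correlation(x, y):
    """Correlation between observed counts and fitted means E(Y|X)."""
    b0, b1 = fit_poisson(x, y)
    fitted = [math.exp(b0 + b1 * xi) for xi in x]
    return pearson(y, fitted)


x_obs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [1, 1, 3, 4, 8, 13]
r = regression_correlation(x_obs, y_obs)
```

Values of `r` close to 1 indicate that the fitted means track the observed counts closely.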

Sensor combination and chemometric variable selection for online monitoring of Streptomyces coelicolor fed-batch cultivations

DEFF Research Database (Denmark)

Ödman, Peter; Johansen, C.L.; Olsson, L.

2010-01-01

of biomass and substrate (casamino acids) concentrations, respectively. The effect of combination of fluorescence and gas analyzer data as well as of different variable selection methods was investigated. Improved prediction models were obtained by combination of data from the two sensors and by variable......Fed-batch cultivations of Streptomyces coelicolor, producing the antibiotic actinorhodin, were monitored online by multiwavelength fluorescence spectroscopy and off-gas analysis. Partial least squares (PLS), locally weighted regression, and multilinear PLS (N-PLS) models were built for prediction...
Wind speed prediction using statistical regression and neural network

Indian Academy of Sciences (India)

Prediction of wind speed in the atmospheric boundary layer is important for wind energy assessment, satellite launching and aviation, etc. There are a few techniques available for wind speed prediction, which require a minimum number of input parameters. Four different statistical techniques, viz., curve fitting, Auto Regressive ...
Semi-quantitative prediction of a multiple API solid dosage form with a combination of vibrational spectroscopy methods.

Science.gov (United States)

Hertrampf, A; Sousa, R M; Menezes, J C; Herdling, T

2016-05-30

Quality control (QC) in the pharmaceutical industry is a key activity in ensuring medicines have the required quality, safety and efficacy for their intended use. QC departments at pharmaceutical companies are responsible for all release testing of final products but also all incoming raw materials. Near-infrared spectroscopy (NIRS) and Raman spectroscopy are important techniques for fast and accurate identification and qualification of pharmaceutical samples. Tablets containing two different active pharmaceutical ingredients (API) [bisoprolol, hydrochlorothiazide] in different commercially available dosages were analysed using Raman- and NIR Spectroscopy. The goal was to define multivariate models based on each vibrational spectroscopy to discriminate between different dosages (identity) and predict their dosage (semi-quantitative). Furthermore the combination of spectroscopic techniques was investigated. Therefore, two different multiblock techniques based on PLS have been applied: multiblock PLS (MB-PLS) and sequential-orthogonalised PLS (SO-PLS). NIRS showed better results compared to Raman spectroscopy for both identification and quantitation. The multiblock techniques investigated showed that each spectroscopy contains information not present or captured with the other spectroscopic technique, thus demonstrating that there is a potential benefit in their combined use for both identification and quantitation purposes. Copyright © 2016 Elsevier B.V. All rights reserved.
Using Apparent Density of Paper from Hardwood Kraft Pulps to Predict Sheet Properties, based on Unsupervised Classification and Multivariable Regression Techniques

Directory of Open Access Journals (Sweden)

Ofélia Anjos

2015-07-01

Paper properties determine the product application potential and depend on the raw material, pulping conditions, and pulp refining. The aim of this study was to construct mathematical models that predict quantitative relations between the paper density and various mechanical and optical properties of the paper. A dataset of properties of paper handsheets produced with pulps of Acacia dealbata, Acacia melanoxylon, and Eucalyptus globulus beaten at 500, 2500, and 4500 revolutions was used. Unsupervised classification techniques were combined to assess the need to perform separated prediction models for each species, and multivariable regression techniques were used to establish such prediction models. It was possible to develop models with a high goodness of fit using paper density as the independent variable (or predictor) for all variables except tear index and zero-span tensile strength, both dry and wet.
Studying Vegetation Salinity: From the Field View to a Satellite-Based Perspective

Directory of Open Access Journals (Sweden)

Rachel Lugassi

2017-02-01

Salinization of irrigated lands in the semi-arid Jezreel Valley, Northern Israel results in soil-structure deterioration and crop damage. We formulated a generic rule for estimating salinity of different vegetation types by studying the relationship between Cl/Na and different spectral slopes in the visible–near infrared–shortwave infrared (VIS–NIR–SWIR) spectral range using both field measurements and satellite imagery (Sentinel-2). For the field study, the slope-based model was integrated with conventional partial least squares (PLS) analyses. Differences in 14 spectral ranges, indicating changes in salinity levels, were identified across the VIS–NIR–SWIR region (350–2500 nm). Next, two different models were run using PLS regression: (i) using spectral slope data across these ranges; and (ii) using preprocessed spectral reflectance. The best model for predicting Cl content was based on continuum removal reflectance (R2 = 0.84). Satisfactory correlations were obtained using the slope-based PLS model (R2 = 0.77 for Cl and R2 = 0.63 for Na). Thus, salinity contents in fresh plants could be estimated, despite masking of some spectral regions by water absorbance. Finally, we estimated the most sensitive spectral channels for monitoring vegetation salinity from a satellite perspective. We evaluated the recently available Sentinel-2 imagery’s ability to distinguish variability in vegetation salinity levels. The best estimate of a Sentinel-2-based vegetation salinity index was generated based on a ratio between calculated slopes: the 490–665 nm and 705–1610 nm. This index was denoted as the Sentinel-2-based vegetation salinity index (SVSI): (band 4 − band 2)/(band 5 + band 11).
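A small sketch of the index at the end of this abstract, reading the formula as (band 4 − band 2)/(band 5 + band 11); that grouping is an assumption, since the parentheses are incomplete in the record as extracted. The function name `svsi`, the argument order, and the sample reflectance values are hypothetical. The band centres in the comments (490, 665, 705 and 1610 nm) match the wavelength ranges quoted in the abstract.

```python
import math


def svsi(band2, band4, band5, band11):
    """Sentinel-2-based vegetation salinity index, read as (B4 - B2) / (B5 + B11).

    band2  : reflectance near 490 nm
    band4  : reflectance near 665 nm
    band5  : reflectance near 705 nm
    band11 : reflectance near 1610 nm
    Returns NaN when the denominator is zero (e.g. masked pixels).
    """
    denominator = band5 + band11
    if denominator == 0:
        return math.nan
    return (band4 - band2) / denominator


# Hypothetical surface-reflectance values for two pixels
pixels = [
    {"b2": 0.10, "b4": 0.30, "b5": 0.35, "b11": 0.25},
    {"b2": 0.05, "b4": 0.04, "b5": 0.12, "b11": 0.20},
]
index_values = [svsi(p["b2"], p["b4"], p["b5"], p["b11"]) for p in pixels]
```

For whole images the same arithmetic would be applied per pixel to the four band rasters after resampling them to a common grid, since Sentinel-2 band 11 has a coarser native resolution than bands 2 and 4.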
On generalized elliptical quantiles in the nonlinear quantile regression setup

Czech Academy of Sciences Publication Activity Database

Hlubinka, D.; Šiman, Miroslav

2015-01-01

Roč. 24, č. 2 (2015), s. 249-264 ISSN 1133-0686 R&D Projects: GA ČR GA14-07234S Institutional support: RVO:67985556 Keywords : multivariate quantile * elliptical quantile * quantile regression * multivariate statistical inference * portfolio optimization Subject RIV: BA - General Mathematics Impact factor: 1.207, year: 2015 http://library.utia.cas.cz/separaty/2014/SI/siman-0434510.pdf
Three Contributions to Robust Regression Diagnostics

Czech Academy of Sciences Publication Activity Database

Kalina, Jan

2015-01-01

Roč. 11, č. 2 (2015), s. 69-78 ISSN 1336-9180 Grant - others:GA ČR(CZ) GA13-01930S; Nadační fond na podporu vědy(CZ) Neuron Institutional support: RVO:67985807 Keywords : robust regression * robust econometrics * hypothesis testing Subject RIV: BA - General Mathematics http://www.degruyter.com/view/j/jamsi.2015.11.issue-2/jamsi-2015-0013/jamsi-2015-0013.xml?format=INT
A multiple regression analysis for accurate background subtraction in 99Tcm-DTPA renography

International Nuclear Information System (INIS)

Middleton, G.W.; Thomson, W.H.; Davies, I.H.; Morgan, A.

1989-01-01

A technique for accurate background subtraction in 99Tcm-DTPA renography is described. The technique is based on a multiple regression analysis of the renal curves and separate heart and soft tissue curves which together represent background activity. It is compared, in over 100 renograms, with a previously described linear regression technique. Results show that the method provides accurate background subtraction, even in very poorly functioning kidneys, thus enabling relative renal filtration and excretion to be accurately estimated. (author)
A modification of the successive projections algorithm for spectral variable selection in the presence of unknown interferents.

Science.gov (United States)

Soares, Sófacles Figueredo Carreiro; Galvão, Roberto Kawakami Harrop; Araújo, Mário César Ugulino; da Silva, Edvan Cirino; Pereira, Claudete Fernandes; de Andrade, Stéfani Iury Evangelista; Leite, Flaviano Carvalho

2011-03-09

This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analyzed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet-visible spectrometric determination of tartrazine, allure red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial-least-squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense. Copyright © 2011 Elsevier B.V. All rights reserved.
Linear regression and sensitivity analysis in nuclear reactor design

International Nuclear Information System (INIS)

Kumar, Akansha; Tsvetkov, Pavel V.; McClarren, Ryan G.

2015-01-01

Highlights: • Presented a benchmark for the applicability of linear regression to complex systems. • Applied linear regression to a nuclear reactor power system. • Performed neutronics, thermal–hydraulics, and energy conversion using Brayton’s cycle for the design of a GCFBR. • Performed detailed sensitivity analysis to a set of parameters in a nuclear reactor power system. • Modeled and developed reactor design using MCNP, regression using R, and thermal–hydraulics in Java. - Abstract: The paper presents a general strategy applicable to sensitivity analysis (SA) and uncertainty quantification analysis (UA) of parameters related to a nuclear reactor design. This work also validates the use of linear regression (LR) for predictive analysis in a nuclear reactor design. The analysis helps to determine the parameters on which an LR model can be fit for predictive analysis. For those parameters, a regression surface is created based on trial data and predictions are made using this surface. A general strategy of SA to determine and identify the influential parameters that affect the operation of the reactor is described. Identification of design parameters and validation of the linearity assumption for the application of LR to reactor design is performed based on a set of tests. The testing methods used to determine the behavior of the parameters can be used as a general strategy for UA and SA of nuclear reactor models and thermal–hydraulics calculations. A design of a gas cooled fast breeder reactor (GCFBR), with thermal–hydraulics and energy transfer, has been used for the demonstration of this method. MCNP6 is used to simulate the GCFBR design and perform the necessary criticality calculations. Java is used to build and run input samples and to extract data from the output files of MCNP6, and R is used to perform regression analysis, other multivariate variance analysis, and analysis of the collinearity of data.
Interactions of miR-323/miR-326/miR-329 and miR-130a/miR-155/miR-210 as prognostic indicators for clinical outcome of glioblastoma patients

Directory of Open Access Journals (Sweden)

Qiu Shuwei

2013-01-01

Background: Glioblastoma multiforme (GBM) is the most common and aggressive brain tumor with poor clinical outcome. Identification and development of new markers could be beneficial for the diagnosis and prognosis of GBM patients. Deregulation of microRNAs (miRNAs or miRs) is involved in GBM. Therefore, we attempted to identify and develop specific miRNAs as prognostic and predictive markers for GBM patient survival. Methods: Expression profiles of miRNAs and genes and the corresponding clinical information of 480 GBM samples from The Cancer Genome Atlas (TCGA) dataset were downloaded and miRNAs of interest were identified. Patients’ overall survival (OS) and progression-free survival (PFS) associated with the miRNAs of interest and miRNA-interactions were analyzed by Kaplan-Meier survival analysis. The impacts of miRNA expressions and miRNA-interactions on survival were evaluated by a Cox proportional hazard regression model. Biological processes and networks of putative and validated targets of miRNAs were analyzed by bioinformatics. Results: In this study, 6 miRNAs of interest were identified. Survival analysis showed that high levels of miR-326/miR-130a and low levels of miR-323/miR-329/miR-155/miR-210 were significantly associated with long OS of GBM patients, and also showed that high miR-326/miR-130a and low miR-155/miR-210 were related to extended PFS. Moreover, miRNA-323 and miRNA-329 were found to be increased in patients with no recurrence or long time to progression (TTP). More notably, our analysis revealed that miRNA-interactions were more specific and accurate in discriminating and predicting OS and PFS. These interactions stratified OS and PFS by miRNA level in more detail, and yielded a longer span of mean survival than any single miRNA. Moreover, miR-326, miR-130a, miR-155, miR-210 and 4 miRNA-interactions were confirmed for the first time as independent predictors for survival by the Cox regression model.
QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

Directory of Open Access Journals (Sweden)

Adi Syahputra

2014-03-01

Full Text Available A quantitative structure-activity relationship (QSAR) for 21 insecticides of phthalamides containing hydrazone (PCH) was studied using multiple linear regression (MLR), principal component regression (PCR) and artificial neural network (ANN) methods. Five descriptors were included in the model for the MLR and ANN analyses, and five latent variables obtained from principal component analysis (PCA) were used in the PCR analysis. Descriptors were calculated using the semi-empirical PM6 method. ANN analysis was found to be the superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84). Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than the previously synthesized compounds, e.g. 2-(decalinecarbamoyl)-5-chloro-N’-((5-methylthiophen-2-yl)methylene)benzohydrazide, 2-(decalinecarbamoyl)-5-chloro-N’-((thiophen-2-yl)methylene)benzohydrazide and 2-(decalinecarbamoyl)-N’-(4-fluorobenzylidene)-5-chlorobenzohydrazide, with predicted log LC50 of 1.640, 1.672, and 1.769, respectively.
The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

Science.gov (United States)

Bulcock, J. W.; And Others

Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…
Predicting the cross-reactivities of polycyclic aromatic hydrocarbons in ELISA by regression analysis and CoMFA methods

Energy Technology Data Exchange (ETDEWEB)

Zhang, Yan-Feng; Dai, Shu-Gui [College of Environmental Science and Engineering, Nankai University, Key Laboratory for Pollution Process and Environmental Criteria of Ministry of Education, Tianjin (China)]; Ma, Yi [College of Chemistry, Nankai University, Institute of Elemento-Organic Chemistry, Tianjin (China)]; Gao, Zhi-Xian [Institute of Hygiene and Environmental Medicine, Tianjin (China)]

2010-07-15

Immunoassays have been regarded as a possible alternative or supplement for measuring polycyclic aromatic hydrocarbons (PAHs) in the environment. Since there are too many potential cross-reactants for PAH immunoassays, it is difficult to determine all the cross-reactivities (CRs) by experimental tests. The relationship between CR and the physical-chemical properties of PAHs and related compounds was investigated using the CR data from a commercial enzyme-linked immunosorbent assay (ELISA) kit test. Two quantitative structure-activity relationship (QSAR) techniques, regression analysis and comparative molecular field analysis (CoMFA), were applied for predicting the CR of PAHs in this ELISA kit. Parabolic regression indicates that the CRs are significantly correlated with the logarithm of the partition coefficient for the octanol-water system (log K_ow) (r^2 = 0.643, n = 23, P < 0.0001), suggesting that hydrophobic interactions play an important role in the antigen-antibody binding and the cross-reactions in this ELISA test. The CoMFA model obtained shows that the CRs of the PAHs are correlated with the 3D structure of the molecules (r_cv^2 = 0.663, r^2 = 0.873, F_(4,32) = 55.086). The contributions of the steric and electrostatic fields to CR were 40.4 and 59.6%, respectively. Both of the QSAR models satisfactorily predict the CR in this PAH immunoassay kit, and help in understanding the mechanisms of antigen-antibody interaction. (orig.)
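The parabolic regression mentioned in the abstract above can be sketched as a least-squares fit of a second-degree polynomial followed by a coefficient of determination. The sketch below is only an assumption about the general technique, not the authors' procedure; the helper names (`solve_linear`, `fit_parabola`, `r_squared`) and the six data points are hypothetical and unrelated to the reported CR or log K_ow values.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_parabola(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    powers = [sum(x ** k for x in xs) for k in range(5)]
    A = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    return solve_linear(A, b)


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )
    return 1.0 - ss_res / ss_tot


# Hypothetical noise-free data generated from y = 1 + 2x - 0.5x^2.
demo_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
demo_y = [1.0, 2.5, 3.0, 2.5, 1.0, -1.5]
demo_coeffs = fit_parabola(demo_x, demo_y)
demo_r2 = r_squared(demo_x, demo_y, demo_coeffs)
```

On real, noisy measurements the fitted r^2 falls below 1, and a value such as the 0.643 quoted above would indicate that the parabola explains roughly two thirds of the variance in the response.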
Adaptive metric kernel regression

DEFF Research Database (Denmark)

Goutte, Cyril; Larsen, Jan

2000-01-01

Kernel smoothing is a widely used non-parametric pattern recognition technique. By nature, it suffers from the curse of dimensionality and is usually difficult to apply to high input dimensions. In this contribution, we propose an algorithm that adapts the input metric used in multivariate...... regression by minimising a cross-validation estimate of the generalisation error. This allows the importance of different dimensions to be adjusted automatically. The improvement in terms of modelling performance is illustrated on a variable selection task where the adaptive metric kernel clearly outperforms...
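The idea of adapting the input metric by minimising a cross-validation error can be sketched with a Nadaraya-Watson kernel regressor whose Gaussian kernel has one scale per input dimension. This is a simplified illustration under stated assumptions: it swaps in a plain grid search over candidate scales where the abstract describes a minimisation algorithm, and the function names, candidate scales, and toy data are all hypothetical.

```python
import math


def nw_predict(x, X, y, scales):
    """Nadaraya-Watson estimate at x with a Gaussian kernel and a diagonal metric."""
    weights = []
    for xi in X:
        d2 = sum((s * (a - b)) ** 2 for s, a, b in zip(scales, x, xi))
        weights.append(math.exp(-0.5 * d2))
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the global mean.
        return sum(y) / len(y)
    return sum(w * yi for w, yi in zip(weights, y)) / total


def loo_error(X, y, scales):
    """Leave-one-out mean squared error for a given diagonal metric."""
    err = 0.0
    for i in range(len(X)):
        X_rest = X[:i] + X[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        err += (y[i] - nw_predict(X[i], X_rest, y_rest, scales)) ** 2
    return err / len(X)


# Toy data on a 4 x 4 grid: the target depends only on the first input,
# so the second input is irrelevant by construction.
X = [[float(a), float(b)] for a in range(4) for b in range(4)]
y = [row[0] for row in X]

# Grid search stands in for the optimisation step (an assumption of this sketch).
candidates = [(s0, s1) for s0 in (1.0, 2.0, 4.0) for s1 in (0.0, 1.0, 2.0, 4.0)]
best = min(candidates, key=lambda s: loo_error(X, y, s))
```

On this toy problem the selected metric assigns zero scale to the irrelevant second input, which is the variable-selection behaviour the abstract alludes to. The preference for the largest first-dimension scale is an artefact of the noise-free grid and should not be read as a general result.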
Subset selection in regression

CERN Document Server

Miller, Alan

2002-01-01

Originally published in 1990, the first edition of Subset Selection in Regression filled a significant gap in the literature, and its critical and popular success has continued for more than a decade. Thoroughly revised to reflect progress in theory, methods, and computing power, the second edition promises to continue that tradition. The author has thoroughly updated each chapter, incorporated new material on recent developments, and included more examples and references. New in the Second Edition: a separate chapter on Bayesian methods; complete revision of the chapter on estimation; a major example from the field of near infrared spectroscopy; more emphasis on cross-validation; greater focus on bootstrapping; stochastic algorithms for finding good subsets from large numbers of predictors when an exhaustive search is not feasible; software available on the Internet for implementing many of the algorithms presented; more examples. Subset Selection in Regression, Second Edition remains dedicated to the techniques for fitting...
Multiple Linear Regression: A Realistic Reflector.

Science.gov (United States)

Nutt, A. T.; Batsell, R. R.

Examples of the use of Multiple Linear Regression (MLR) techniques are presented. This is done to show how MLR aids data processing and decision-making by providing the decision-maker with freedom in phrasing questions and by accurately reflecting the data on hand. A brief overview of the rationale underlying MLR is given, some basic definitions…
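Pengaruh Bauran Pemasaran Terhadap Keputusan Pembelian Produk Mobil Mazda 2R Pada PT Nusantara Batavia Motor Jakarta Pusat [The Effect of the Marketing Mix on Purchase Decisions for the Mazda 2R Car at PT Nusantara Batavia Motor, Central Jakarta]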

OpenAIRE

Ratnasari, Desy; Sunardi, HP

2015-01-01

The purpose of this study was to determine the effect of product, price, place and promotion on purchase decisions. The population is people who bought Mazda 2R cars at PT Nusantara Batavia Motor Jakarta. The sample consisted of 125 buyers of Mazda 2R cars, and the data were collected through questionnaires. The results of multiple linear regression analysis using SPSS show that all the variables (product, price, place and promotion) have a significant positive effect ...
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Directory of Open Access Journals (Sweden)

Minh Vu Trieu

2017-03-01

Full Text Available This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The fuzzy logic model attains the highest R-squared value (R2), 0.714, ahead of 0.667 for the runner-up, the multiple-variable nonlinear regression model.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

Science.gov (United States)

Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

2017-03-01

This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The fuzzy logic model attains the highest R-squared value (R2), 0.714, ahead of 0.667 for the runner-up, the multiple-variable nonlinear regression model.
