regression method pls: Topics by WorldWideScience.org

Sample records for regression method pls

Variable and subset selection in PLS regression

DEFF Research Database (Denmark)

Høskuldsson, Agnar

2001-01-01

The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion...... is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than...
COMPARISON OF PARTIAL LEAST SQUARES REGRESSION METHOD ALGORITHMS: NIPALS AND PLS-KERNEL AND AN APPLICATION

Directory of Open Access Journals (Sweden)

ELİF BULUT

2013-06-01

Partial Least Squares Regression (PLSR) is a multivariate statistical method that combines partial least squares and multiple linear regression analysis. Explanatory variables, X, that exhibit multicollinearity are reduced to components that explain a large share of the covariance between the explanatory and response variables. These components are few in number and do not suffer from multicollinearity. Multiple linear regression analysis is then applied to those components to model the response variable Y. There are various PLSR algorithms. In this study, the NIPALS and PLS-Kernel algorithms are studied and illustrated on a real data set.
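As a rough illustration of the component extraction this abstract describes, the sketch below fits a single-response PLS model (PLS1) with NIPALS-style deflation in plain Python. It is not the paper's code: the function names `pls1_nipals` and `pls1_predict` and the toy data are assumptions made for this example, and the PLS-Kernel variant is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pls1_nipals(X, y, n_components):
    """Fit PLS1 by NIPALS-style deflation. X is a list of rows, y a list."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - mu for v, mu in zip(row, x_mean)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [dot(col, f) for col in zip(*E)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            break  # no covariance left to explain
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]             # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]  # X loadings
        q = dot(f, t) / tt                         # y loading
        # Deflate X and y so the next component is orthogonal to this one.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict y for one new row x by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat

# Toy data with perfectly collinear predictors (x2 = 2 * x1), y = 2 * x1 + 1.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 5.0, 7.0, 9.0]
model = pls1_nipals(X, y, n_components=1)
print(pls1_predict(model, [5.0, 10.0]))  # approximately 11.0
```

Ordinary least squares has no unique solution on this toy data because the two predictors are exactly collinear, whereas a single PLS component captures the shared direction.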
Application of NIRS coupled with PLS regression as a rapid, non-destructive alternative method for quantification of KBA in Boswellia sacra

Science.gov (United States)

Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif

2017-09-01

In the present study, for the first time, NIR spectroscopy coupled with PLS regression was developed as a rapid, alternative method to quantify the amount of Keto-β-Boswellic Acid (KBA) in different plant parts of Boswellia sacra and in the resin exudates of the trunk. NIR spectroscopy was used to measure KBA standards and B. sacra samples in absorption mode in the wavelength range from 700 to 2500 nm. A PLS regression model was built from the obtained spectral data using 70% of the KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The resulting PLS regression model had an R-square value of 98% with a correlation of 0.99, and showed good prediction with an RMSEP of 3.2 and a correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of the resin has the highest concentration of KBA (0.6%), followed by the essential oil (0.1%). However, no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to obtain various sub-fractions at different polarities of organic solvents. The sub-fraction at 4% MeOH/CHCl3 (4.1% of KBA) was found to contain the highest percentage of KBA, followed by another sub-fraction at 2% MeOH/CHCl3 (2.2% of KBA). The present results also indicated that KBA is only present in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis, and it is therefore concluded that NIRS coupled with PLS regression is a rapid, alternative method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation.
Kinetic microplate bioassays for relative potency of antibiotics improved by partial Least Square (PLS) regression.

Science.gov (United States)

Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello

2016-05-01

Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays when compared to other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to overcome these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there was significant deviation from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from official agar diffusion methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
Analysis of designed experiments by stabilised PLS Regression and jack-knifing

DEFF Research Database (Denmark)

Martens, Harald; Høy, M.; Westad, F.

2001-01-01

Pragmatical, visually oriented methods for assessing and optimising bi-linear regression models are described, and applied to PLS Regression (PLSR) analysis of multi-response data from controlled experiments. The paper outlines some ways to stabilise the PLSR method to extend its range...... the reliability of the linear and bi-linear model parameter estimates. The paper illustrates how the obtained PLSR "significance" probabilities are similar to those from conventional factorial ANOVA, but the PLSR is shown to give important additional overview plots of the main relevant structures in the multi....... An Introduction, Wiley, Chichester, UK, 2001]....
The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression

Directory of Open Access Journals (Sweden)

Chunxiao Zhang

2012-01-01

The prediction of aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of the engine exhaust temperature (EGT) time series, which comes from actual air-borne ACARS data, is reconstructed by selecting suitable nearby points. Partial least squares (PLS) based on a cubic spline function or a kernel function transformation is adopted to obtain the chaotic predictive function of the EGT series. The experimental results indicate that the proposed PLS chaotic prediction algorithm based on the biweight kernel function transformation has a significant advantage in overcoming multicollinearity of the independent variables and in improving the stability of the regression model. Our predictive NMSE is 16.5 percent less than that of the traditional linear least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. At the same time, the forecast error is less than that of the nonlinear PLS algorithm with bootstrap test screening.
Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

KAUST Repository

Harrou, Fouzi

2015-12-07

Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.
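To make the monitoring index concrete, the following sketch computes a symmetrized Kullback-Leibler divergence between a reference residual sample and a current one. It rests on two assumptions not stated in the abstract: residuals are treated as univariate Gaussian, and "symmetrized" is taken to mean KL(p||q) + KL(q||p). The function names are hypothetical, and the residuals would come from a fitted PLS model that is not shown here.

```python
import statistics

def gaussian_sym_kld(mean_p, var_p, mean_q, var_q):
    """KL(p||q) + KL(q||p) for two univariate Gaussians (log terms cancel)."""
    d2 = (mean_p - mean_q) ** 2
    return (var_p + d2) / (2.0 * var_q) + (var_q + d2) / (2.0 * var_p) - 1.0

def residual_kld(reference_residuals, current_residuals):
    """Estimate mean and variance from each sample, then compare the two fits."""
    return gaussian_sym_kld(
        statistics.fmean(reference_residuals),
        statistics.variance(reference_residuals),
        statistics.fmean(current_residuals),
        statistics.variance(current_residuals),
    )

# Illustrative numbers only: a small mean shift in the current residuals.
reference = [-1.0, 0.0, 1.0]
current = [1.0, 2.0, 3.0]
print(residual_kld(reference, reference))  # 0.0 when the distributions match
print(residual_kld(reference, current))    # grows with the size of the shift
```

In a monitoring setting the index would be compared against a threshold estimated from fault-free data; how that threshold is chosen is left open here.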
Enhanced Anomaly Detection Via PLS Regression Models and Information Entropy Theory

KAUST Repository

Harrou, Fouzi; Sun, Ying

2015-01-01

Accurate and effective fault detection and diagnosis of modern engineering systems is crucial for ensuring reliability, safety and maintaining the desired product quality. In this work, we propose an innovative method for detecting small faults in the highly correlated multivariate data. The developed method utilizes partial least square (PLS) method as a modelling framework, and the symmetrized Kullback-Leibler divergence (KLD) as a monitoring index, where it is used to quantify the dissimilarity between probability distributions of current PLS-based residual and reference one obtained using fault-free data. The performance of the PLS-based KLD fault detection algorithm is illustrated and compared to the conventional PLS-based fault detection methods. Using synthetic data, we have demonstrated the greater sensitivity and effectiveness of the developed method over the conventional methods, especially when data are highly correlated and small faults are of interest.
Determination of fat content in chicken hamburgers using NIR spectroscopy and the Successive Projections Algorithm for interval selection in PLS regression (iSPA-PLS)

Science.gov (United States)

Krepper, Gabriela; Romeo, Florencia; Fernandes, David Douglas de Sousa; Diniz, Paulo Henrique Gonçalves Dias; de Araújo, Mário César Ugulino; Di Nezio, María Susana; Pistonesi, Marcelo Fabián; Centurión, María Eugenia

2018-01-01

Determining fat content in hamburgers is very important to minimize or control the negative effects of fat on human health, such as cardiovascular diseases and obesity, which are caused by the high consumption of saturated fatty acids and cholesterol. This study proposed an alternative analytical method based on Near Infrared Spectroscopy (NIR) and the Successive Projections Algorithm for interval selection in Partial Least Squares regression (iSPA-PLS) for fat content determination in commercial chicken hamburgers. For this, 70 hamburger samples with a fat content ranging from 14.27 to 32.12 mg kg⁻¹ were prepared based on the upper limit recommended by the Argentinean Food Codex, which is 20% (w w⁻¹). NIR spectra were then recorded and preprocessed by applying different approaches: baseline correction, SNV, MSC, and Savitzky-Golay smoothing. For comparison, full-spectrum PLS and Interval PLS were also used. The best performance for the prediction set was obtained for the first-derivative Savitzky-Golay smoothing with a second-order polynomial and a window size of 19 points, achieving a coefficient of correlation of 0.94, RMSEP of 1.59 mg kg⁻¹, REP of 7.69% and RPD of 3.02. The proposed methodology represents an excellent alternative to the conventional Soxhlet extraction method, since waste generation is avoided and neither chemical reagents nor solvents are used, which follows the primary principles of Green Chemistry. The new method was successfully applied to chicken hamburger analysis, and the results agreed with the reference values at a 95% confidence level, making it very attractive for routine analysis.
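The figures of merit quoted in this abstract (RMSEP, REP, RPD) can be computed as in the sketch below. The definitions used are common chemometric conventions and are an assumption here, since the abstract does not spell them out: REP is taken as RMSEP relative to the mean reference value, and RPD as the sample standard deviation of the reference values divided by RMSEP. The data are made up for illustration.

```python
import math
import statistics

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))

def rep_percent(y_ref, y_pred):
    """Relative error of prediction: RMSEP as a percentage of the mean reference."""
    return 100.0 * rmsep(y_ref, y_pred) / statistics.fmean(y_ref)

def rpd(y_ref, y_pred):
    """Ratio of performance to deviation: SD of the reference values over RMSEP."""
    return statistics.stdev(y_ref) / rmsep(y_ref, y_pred)

# Made-up reference and predicted values for a small prediction set.
y_ref = [10.0, 20.0, 30.0]
y_pred = [11.0, 19.0, 31.0]
print(rmsep(y_ref, y_pred), rep_percent(y_ref, y_pred), rpd(y_ref, y_pred))
```

Some authors use the bias-corrected SEP instead of RMSEP in the RPD denominator, so values from different papers are not always directly comparable.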
Application of sequential and orthogonalised-partial least squares (SO-PLS) regression to predict sensory properties of Cabernet Sauvignon wines from grape chemical composition.

Science.gov (United States)

Niimi, Jun; Tomic, Oliver; Næs, Tormod; Jeffery, David W; Bastian, Susan E P; Boss, Paul K

2018-08-01

The current study determined the applicability of sequential and orthogonalised-partial least squares (SO-PLS) regression to relate Cabernet Sauvignon grape chemical composition to the sensory perception of the corresponding wines. Grape samples (n = 25) were harvested at a similar maturity and vinified identically in 2013. Twelve measures using various (bio)chemical methods were made on grapes. Wines were evaluated using descriptive analysis with a trained panel (n = 10) for sensory profiling. Data were analysed globally using SO-PLS for the entire sensory profiles (SO-PLS2), as well as for single sensory attributes (SO-PLS1). SO-PLS1 models achieved higher validated explained variances than SO-PLS2. SO-PLS provided a structured approach to the selection of predictor chemical data sets that best contributed to the correlation of important sensory attributes. This new approach presents great potential for application in other explorative metabolomics studies of food and beverages to address factors such as quality and regional influences. Copyright © 2018 Elsevier Ltd. All rights reserved.
Senate Bill (PLS) No. 200, of 2015: analysis versus the Principle of the Prohibition of Social Regression

Directory of Open Access Journals (Sweden)

Glaucia Ribeiro Lima

2016-12-01

The Senate Bill (PLS) number 200, of 2015, proposes the enactment of a law for the conduct of clinical trials involving human subjects. This study aimed to perform a critical analysis of PLS 200/2015, based on the Principle of the Prohibition of Social Regression. Thus, descriptive, documentary and normative research was conducted, with a survey of the ethical and sanitary standards related to clinical research and of findings related to PLS 200/2015. PLS 200/2015 and the information regarding it were also consulted on the website of the Senate. The regulation of the matter by law was not found to be a problem in itself. The main conflicts were related to the creation of the Independent Ethics Committee (IEC), which does not link the ethics review to a State Agency; the use of placebo, in which flexibility is contrary to all efforts to ensure that participants have the best treatment options; and post-study access, whose restriction is contrary to the existing regulations that determine free and unlimited access. The analysis of the main settings specified in PLS 200/2015 did not identify social or scientific improvements. The Principle of the Prohibition of Social Regression can thus be used to safeguard the constitutional provisions already undertaken and accomplished, mainly the right to health, human dignity and the inviolability of the right to life.
Variable selection methods in PLS regression - a comparison study on metabolomics data

DEFF Research Database (Denmark)

Karaman, İbrahim; Hedemann, Mette Skou; Knudsen, Knud Erik Bach

. The aim of the metabolomics study was to investigate the metabolic profile in pigs fed various cereal fractions with special attention to the metabolism of lignans using LC-MS based metabolomic approach. References 1. Lê Cao KA, Rossouw D, Robert-Granié C, Besse P: A Sparse PLS for Variable Selection when...... integrated approach. Due to the high number of variables in data sets (both raw data and after peak picking) the selection of important variables in an explorative analysis is difficult, especially when different data sets of metabolomics data need to be related. Variable selection (or removal of irrelevant...... different strategies for variable selection on PLSR method were considered and compared with respect to selected subset of variables and the possibility for biological validation. Sparse PLSR [1] as well as PLSR with Jack-knifing [2] was applied to data in order to achieve variable selection prior...
A PLS-based extractive spectrophotometric method for simultaneous determination of carbamazepine and carbamazepine-10,11-epoxide in plasma and comparison with HPLC

Science.gov (United States)

Hemmateenejad, Bahram; Rezaei, Zahra; Khabnadideh, Soghra; Saffari, Maryam

2007-11-01

Carbamazepine (CBZ) undergoes enzyme biotransformation through epoxidation with the formation of its metabolite, carbamazepine-10,11-epoxide (CBZE). A simple chemometrics-assisted spectrophotometric method has been proposed for simultaneous determination of CBZ and CBZE in plasma. A liquid extraction procedure was used to separate the analytes from plasma, and the UV absorbance spectra of the resultant solutions were subjected to partial least squares (PLS) regression. The optimum number of PLS latent variables was selected according to the PRESS values of leave-one-out cross-validation. An HPLC method was also employed for comparison. The respective mean recoveries for analysis of CBZ and CBZE in synthetic mixtures were 102.57 (±0.25)% and 103.00 (±0.09)% for PLS and 99.40 (±0.15)% and 102.20 (±0.02)% for HPLC. The concentrations of CBZ and CBZE were also determined in five patients using the PLS and HPLC methods. The results showed that the data obtained by PLS were comparable with those obtained by the HPLC method.
The effect of PLS regression in PLS path model estimation when multicollinearity is present

DEFF Research Database (Denmark)

Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob

PLS path modelling has previously been found to be robust to multicollinearity both between latent variables and between manifest variables of a common latent variable (see e.g. Cassel et al. (1999), Kristensen, Eskildsen (2005), Westlund et al. (2008)). However, most of the studies investigate...... models with relatively few variables and very simple dependence structures compared to the models that are often estimated in practical settings. A recent study by Nielsen et al. (2009) found that when model structure is more complex, PLS path modelling is not as robust to multicollinearity between...... latent variables as previously assumed. A difference in the standard error of path coefficients of as much as 83% was found between moderate and severe levels of multicollinearity. Large differences were found not only for large path coefficients, but also for small path coefficients and in some cases...
Assessment of bitter taste of pharmaceuticals with multisensor system employing 3 way PLS regression

International Nuclear Information System (INIS)

Rudnitskaya, Alisa; Kirsanov, Dmitry; Blinova, Yulia; Legin, Evgeny; Seleznev, Boris; Clapham, David; Ives, Robert S.; Saunders, Kenneth A.; Legin, Andrey

2013-01-01

Highlights: ► Chemically diverse APIs are studied with potentiometric “electronic tongue”. ► Bitter taste of APIs can be predicted with 3wayPLS regression from ET data. ► High correlation of ET assessment with human panel and rat in vivo model. -- Abstract: The application of the potentiometric multisensor system (electronic tongue, ET) for quantification of the bitter taste of structurally diverse active pharmaceutical ingredients (APIs) is reported. The measurements were performed using a set of bitter substances that had been assessed by a professional human sensory panel and the in vivo rat brief access taste aversion (BATA) model to produce bitterness intensity scores for each substance at different concentrations. The set consisted of eight substances, both inorganic and organic: azelastine, caffeine, chlorhexidine, potassium nitrate, naratriptan, paracetamol, quinine, and sumatriptan. With the aim of enhancing the response of the sensors to the studied APIs, measurements were carried out at different pH levels ranging from 2 to 10, thus promoting ionization of the compounds. This experiment yielded a 3-way data array (samples × sensors × pH levels) from which 3wayPLS regression models were constructed with both human panel and rat model reference data. These models revealed that artificial assessment of bitter taste with ET in the chosen set of APIs is possible with average relative errors of 16% in terms of human panel bitterness score and 25% in terms of inhibition values from in vivo rat model data. Furthermore, these 3wayPLS models were applied for prediction of the bitterness in blind test samples of a further set of APIs. The results of the prediction were compared with the inhibition values obtained from the in vivo rat model.
An improved partial least-squares regression method for Raman spectroscopy

Science.gov (United States)

Momenpour Tehran Monfared, Ali; Anis, Hanan

2017-10-01

It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve BVSPLS with a novel selection mechanism. The proposed method sorts the weighted regression coefficients and then evaluates the importance of each variable in the sorted list using the root mean square error of prediction (RMSEP) criterion at each iteration step. Our Improved BVSPLS (IBVSPLS) method was applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either similar or better performance compared to the genetic algorithm.
Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

Science.gov (United States)

Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

2017-12-01

The multivariate image analysis descriptors used in quantitative structure-activity relationships are direct representations of chemical structures as they are simply numerical decodifications of pixels forming the 2D chemical images. These MDs have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that involve the projection of the data space onto orthogonal components e.g. Partial Least Squares (PLS) have been generally used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity has not been straightforward. This work describes the 2D-contour maps based on the PLS regression coefficients, as a means of assessing the relevance of single MIA predictors to the response variable, and thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.
The feasibility of using explicit method for linear correction of the particle size variation using NIR Spectroscopy combined with PLS2 regression method

Science.gov (United States)

Yulia, M.; Suhandy, D.

2018-03-01

NIR spectra obtained from a spectral data acquisition system contain both chemical information about the samples and physical information, such as particle size and bulk density. Several methods have been established for developing calibration models that can compensate for variations in sample physical information. One common approach is to include the physical information variation in the calibration model, either explicitly or implicitly. The objective of this study was to evaluate the feasibility of using the explicit method to compensate for the influence of different particle sizes of coffee powder on NIR calibration model performance. A total of 220 coffee powder samples with two different types of coffee (civet and non-civet) and two different particle sizes (212 and 500 µm) were prepared. Spectral data were acquired using a NIR spectrometer equipped with an integrating sphere for diffuse reflectance measurement. A discrimination method based on PLS-DA was applied, and the influence of different particle sizes on the performance of PLS-DA was investigated. In the explicit method, the particle size is added directly as a predicted variable, resulting in an X block containing only the NIR spectra and a Y block containing the particle size and the type of coffee. The explicit inclusion of the particle size in the calibration model is expected to improve the accuracy of coffee type determination. The results show that, using the explicit method, the quality of the developed calibration model for coffee type determination is slightly superior, with coefficient of determination (R²) = 0.99 and root mean square error of cross-validation (RMSECV) = 0.041. The performance of the PLS2 calibration model for coffee type determination with particle size compensation was quite good, and the model was able to predict the type of coffee at two different particle sizes with relatively high R² pred values. The prediction also resulted in low bias and RMSEP values.
Linear feature selection in texture analysis - A PLS based method

DEFF Research Database (Denmark)

Marques, Joselene; Igel, Christian; Lillholm, Martin

2013-01-01

We present a texture analysis methodology that combined uncommitted machine-learning techniques and partial least square (PLS) in a fully automatic framework. Our approach introduces a robust PLS-based dimensionality reduction (DR) step to specifically address outliers and high-dimensional feature...... and considering all CV groups, the methods selected 36 % of the original features available. The diagnosis evaluation reached a generalization area-under-the-ROC curve of 0.92, which was higher than established cartilage-based markers known to relate to OA diagnosis....

Evaluation of in-line Raman data for end-point determination of a coating process: Comparison of Science-Based Calibration, PLS-regression and univariate data analysis.

Science.gov (United States)

Barimani, Shirin; Kleinebudde, Peter

2017-10-01

A multivariate analysis method, Science-Based Calibration (SBC), was used for the first time for endpoint determination of a tablet coating process using Raman data. Two types of tablet cores, placebo and caffeine cores, received a coating suspension comprising a polyvinyl alcohol-polyethylene glycol graft-copolymer and titanium dioxide to a maximum coating thickness of 80 µm. Raman spectroscopy was used as an in-line PAT tool. The spectra were acquired every minute and correlated to the amount of applied aqueous coating suspension. SBC was compared to another well-known multivariate analysis method, Partial Least Squares-regression (PLS), and a simpler approach, Univariate Data Analysis (UVDA). All developed calibration models had coefficient of determination values (R²) higher than 0.99. The coating endpoints could be predicted with root mean square errors (RMSEP) less than 3.1% of the applied coating suspensions. Compared to PLS and UVDA, SBC proved to be an alternative multivariate calibration method with high predictive power. Copyright © 2017 Elsevier B.V. All rights reserved.
A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression.

Science.gov (United States)

Delwiche, Stephen R; Reeves, James B

2010-01-01

In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly smoothing operations or derivatives. While such operations are often useful in reducing the number of latent variables of the actual decomposition and lowering residual error, they also run the risk of misleading the practitioner into accepting calibration equations that are poorly adapted to samples outside of the calibration. The current study developed a graphical method to examine this effect on partial least squares (PLS) regression calibrations of near-infrared (NIR) reflection spectra of ground wheat meal with two analytes, protein content and sodium dodecyl sulfate sedimentation (SDS) volume (an indicator of the quantity of the gluten proteins that contribute to strong doughs). These two properties were chosen because of their differing abilities to be modeled by NIR spectroscopy: excellent for protein content, fair for SDS sedimentation volume. To further demonstrate the potential pitfalls of preprocessing, an artificial component, a randomly generated value, was included in PLS regression trials. Savitzky-Golay (digital filter) smoothing, first-derivative, and second-derivative preprocess functions (5 to 25 centrally symmetric convolution points, derived from quadratic polynomials) were applied to PLS calibrations of 1 to 15 factors. The results demonstrated the danger of an overreliance on preprocessing when (1) the number of samples used in a multivariate calibration is low (<50), (2) the spectral response of the analyte is weak, and (3) the goodness of the calibration is based on the coefficient of determination (R²) rather than a term based on residual error. The graphical method has application to the evaluation of other preprocess functions and various
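For readers unfamiliar with the Savitzky-Golay pretreatments mentioned in this abstract, the sketch below builds the convolution weights for a quadratic polynomial over a centrally symmetric window, using the standard closed-form expressions for smoothing and for the first derivative. The function names are hypothetical, unit sample spacing is assumed, the second derivative is omitted, and edge points are simply dropped rather than padded.

```python
def savgol_quadratic_coeffs(window, deriv=0):
    """Savitzky-Golay weights for a quadratic fit over an odd, symmetric window."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    m = window // 2
    idx = range(-m, m + 1)
    if deriv == 0:
        # Smoothing: value of the fitted quadratic at the window centre.
        denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
        return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in idx]
    if deriv == 1:
        # First derivative: slope of the fitted quadratic at the window centre.
        denom = m * (m + 1) * (2 * m + 1) / 3.0  # equals the sum of i*i over the window
        return [i / denom for i in idx]
    raise ValueError("only deriv=0 and deriv=1 are covered in this sketch")

def apply_filter(y, coeffs):
    """Apply the weights at every interior point (output is shorter than input)."""
    m = len(coeffs) // 2
    return [
        sum(c * y[k + j] for j, c in zip(range(-m, m + 1), coeffs))
        for k in range(m, len(y) - m)
    ]

# A quadratic signal is reproduced exactly by a quadratic Savitzky-Golay filter.
signal = [float(x * x) for x in range(10)]
smoothed = apply_filter(signal, savgol_quadratic_coeffs(5))
slope = apply_filter(signal, savgol_quadratic_coeffs(5, deriv=1))
print(smoothed[2], slope[2])  # values at x = 4: close to 16 and 8
```

Real spectra are noisy, so a wider window removes more noise but also flattens narrow bands, which is the kind of trade-off the abstract warns about.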
PLS2 regression as a tool for selection of optimal analytical modality

DEFF Research Database (Denmark)

Madsen, Michael; Esbensen, Kim

Intelligent use of modern process analysers allows process technicians and engineers to look deep into the dynamic behaviour of production systems. This opens up for a plurality of new possibilities with respect to process optimisation. Oftentimes, several instruments representing different...... technologies and price classes are able to decipher relevant process information simultaneously. The question then is: how to choose between available technologies without compromising the quality and usability of the data. We apply PLS2 modelling to quantify the relative merits of competing, or complementing......, analytical modalities. We here present results from a feasibility study, where Fourier Transform Near InfraRed (FT-NIR), Fourier Transform Mid InfraRed (FT-MIR), and Raman laser spectroscopy were applied on the same set of samples obtained from a pilot-scale beer brewing process. Quantitative PLS1 models...
PLS-based memory control scheme for enhanced process monitoring

KAUST Repository

Harrou, Fouzi

2017-01-20

Fault detection is important for safe operation of various modern engineering systems. Partial least squares (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail to detect incipient faults. In this paper, we develop a new PLS-based monitoring chart that combines PLS with a multivariate memory control chart, the multivariate exponentially weighted moving average (MEWMA) monitoring chart. The MEWMA chart is sensitive to incipient faults in the process mean, which significantly improves the performance of PLS methods and widens their applicability in practice. Using simulated distillation column data, we demonstrate that the proposed PLS-based MEWMA control chart is more effective in detecting incipient faults in the mean of the multivariate process variables and outperforms the conventional PLS-based monitoring charts.
Statistical process control of cocrystallization processes: A comparison between OPLS and PLS.

Science.gov (United States)

Silva, Ana F T; Sarraguça, Mafalda Cruz; Ribeiro, Paulo R; Santos, Adenilson O; De Beer, Thomas; Lopes, João Almeida

2017-03-30

Orthogonal partial least squares regression (OPLS) is being increasingly adopted as an alternative to partial least squares (PLS) regression due to the better generalization that can be achieved. Particularly in multivariate batch statistical process control (BSPC), the use of OPLS for estimating nominal trajectories is advantageous. In OPLS, the nominal process trajectories are expected to be captured in a single predictive principal component while uncorrelated variations are filtered out to orthogonal principal components. In theory, OPLS will yield a better estimation of the Hotelling's T² statistic and corresponding control limits, thus lowering the number of false positives and false negatives when assessing the process disturbances. Although OPLS advantages have been demonstrated in the context of regression, its use in BSPC has seldom been reported. This study proposes an OPLS-based approach for BSPC of a cocrystallization process between hydrochlorothiazide and p-aminobenzoic acid monitored on-line with near infrared spectroscopy and compares the fault detection performance with the same approach based on PLS. A series of cocrystallization batches with imposed disturbances were used to test the ability to detect abnormal situations by OPLS and PLS-based BSPC methods. Results demonstrated that OPLS was generally superior in terms of sensitivity and specificity in most situations. In some abnormal batches, it was found that the imposed disturbances were only detected with OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.
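The Hotelling's T² statistic mentioned in this record can be sketched from a matrix of latent-variable scores as follows. This is an illustrative calculation only, not the authors' procedure: it assumes the score columns are mutually uncorrelated (so each squared score is scaled by its own variance), the function name `hotelling_t2` and the score values are hypothetical, and no control limit is derived.

```python
def hotelling_t2(scores):
    """T^2 per observation from uncorrelated score columns.

    scores: list of rows, one row of latent-variable scores per observation.
    Each squared deviation is divided by that column's sample variance.
    """
    n = len(scores)
    a = len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(a)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in scores) / (n - 1)
        for j in range(a)
    ]
    return [
        sum((row[j] - means[j]) ** 2 / variances[j] for j in range(a))
        for row in scores
    ]


# Hypothetical scores for five observations on two components.
scores = [[1.2, -0.3], [-0.8, 0.5], [0.1, -0.1], [-0.5, -0.4], [3.0, 0.3]]
t2 = hotelling_t2(scores)
most_extreme = t2.index(max(t2))  # observation farthest from the score center
```

With fewer predictive components, as OPLS aims for, fewer variance estimates enter the sum, which is one informal way to read the abstract's claim about better T² estimation.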
On-line monitoring the extract process of Fu-fang Shuanghua oral solution using near infrared spectroscopy and different PLS algorithms

Science.gov (United States)

Kang, Qian; Ru, Qingguo; Liu, Yan; Xu, Lingyan; Liu, Jia; Wang, Yifei; Zhang, Yewen; Li, Hui; Zhang, Qing; Wu, Qing

2016-01-01

An on-line near infrared (NIR) spectroscopy monitoring method with an appropriate multivariate calibration method was developed for the extraction process of Fu-fang Shuanghua oral solution (FSOS). On-line NIR spectra were collected through two fiber optic probes, which were designed to transmit NIR radiation by a 2 mm flange. Partial least squares (PLS), interval PLS (iPLS) and synergy interval PLS (siPLS) algorithms were compared for building the calibration regression models. During the extraction process, NIR spectroscopy was employed to determine the chlorogenic acid (CA) content, total phenolic acids content (TPC), total flavonoids content (TFC) and soluble solid content (SSC). High performance liquid chromatography (HPLC), ultraviolet spectrophotometry (UV) and loss on drying were employed as reference methods. Experimental results showed that the siPLS model performed best compared with PLS and iPLS. The calibration models for CA, TPC, TFC and SSC had high determination coefficients (R²) (0.9948, 0.9992, 0.9950 and 0.9832) and low root mean square errors of cross validation (RMSECV) (0.0113, 0.0341, 0.1787 and 1.2158), which indicate a good correlation between reference values and NIR predicted values. The overall results show that the on-line detection method could be feasible in real applications and would be of great value for monitoring the mixed decoction process of FSOS and other Chinese patent medicines.
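The interval PLS idea in this record (split the spectrum into intervals, calibrate on each, keep the interval with the lowest RMSECV) can be sketched as below. This is a deliberately simplified stand-in, not the study's algorithm: it uses a one-factor PLS1 model and leave-one-out cross validation, the function names are hypothetical, and the data are synthetic, with only the middle interval carrying information about y.

```python
import math


def pls1_one_factor(X, y):
    """Fit a one-factor PLS1 model on mean-centered data; return a predictor."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]  # X'y
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return lambda x: y_mean + q * sum((x[j] - x_mean[j]) * w[j] for j in range(p))


def rmsecv(X, y):
    """Leave-one-out root mean square error of cross validation."""
    sq = 0.0
    for i in range(len(y)):
        model = pls1_one_factor(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        sq += (model(X[i]) - y[i]) ** 2
    return math.sqrt(sq / len(y))


def ipls(X, y, n_intervals):
    """Return (index of best interval, RMSECV of every interval)."""
    width = len(X[0]) // n_intervals
    errors = [
        rmsecv([row[k * width:(k + 1) * width] for row in X], y)
        for k in range(n_intervals)
    ]
    return errors.index(min(errors)), errors


# Synthetic data: 8 samples, 12 variables in 3 intervals of 4 variables each.
y = [float(i + 1) for i in range(8)]
X = [
    [math.sin(1.7 * i + j) for j in range(4)]                                  # unrelated
    + [y[i] * (1 + 0.1 * j) + 0.01 * math.sin(3.1 * i + j) for j in range(4)]  # signal
    + [math.cos(2.3 * i * (j + 1)) for j in range(4)]                          # unrelated
    for i in range(8)
]
best, errors = ipls(X, y, n_intervals=3)
```

A synergy interval variant would, under the same assumptions, evaluate combinations of two or more intervals instead of single intervals and keep the combination with the lowest RMSECV.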
New strategy for determination of anthocyanins, polyphenols and antioxidant capacity of Brassica oleracea liquid extract using infrared spectroscopies and multivariate regression

Science.gov (United States)

de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.

2018-04-01

A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 regression provided more predictive models than PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results, with correlation coefficients higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm, and the models based on NIR spectra were considered more predictive for all properties. These models thus provide a simple, rapid and accurate method for the determination of red cabbage extract antioxidant properties and are suitable for use in the food industry.
Comparing the analytical performances of Micro-NIR and FT-NIR spectrometers in the evaluation of acerola fruit quality, using PLS and SVM regression algorithms.

Science.gov (United States)

Malegori, Cristina; Nascimento Marques, Emanuel José; de Freitas, Sergio Tonetto; Pimentel, Maria Fernanda; Pasquini, Celio; Casiraghi, Ernestina

2017-04-01

The main goal of this study was to investigate the analytical performances of a state-of-the-art device, one of the smallest dispersion NIR spectrometers on the market (MicroNIR 1700), making a critical comparison with a benchtop FT-NIR spectrometer in the evaluation of the prediction accuracy. In particular, the aim of this study was to estimate in a non-destructive manner, titratable acidity and ascorbic acid content in acerola fruit during ripening, in a view of direct applicability in field of this new miniaturised handheld device. Acerola (Malpighia emarginata DC.) is a super-fruit characterised by a considerable amount of ascorbic acid, ranging from 1.0% to 4.5%. However, during ripening, acerola colour changes and the fruit may lose as much as half of its ascorbic acid content. Because the variability of chemical parameters followed a non-strictly linear profile, two different regression algorithms were compared: PLS and SVM. Regression models obtained with Micro-NIR spectra give better results using SVM algorithm, for both ascorbic acid and titratable acidity estimation. FT-NIR data give comparable results using both SVM and PLS algorithms, with lower errors for SVM regression. The prediction ability of the two instruments was statistically compared using the Passing-Bablok regression algorithm; the outcomes are critically discussed together with the regression models, showing the suitability of the portable Micro-NIR for in field monitoring of chemical parameters of interest in acerola fruits. Copyright © 2016 Elsevier B.V. All rights reserved.
AO–MW–PLS method applied to rapid quantification of teicoplanin with near-infrared spectroscopy

Directory of Open Access Journals (Sweden)

Jiemei Chen

2017-01-01

Teicoplanin (TCP) is an important lipoglycopeptide antibiotic produced by fermenting Actinoplanes teichomyceticus. The change in TCP concentration is important to measure in the fermentation process. In this study, a reagent-free and rapid quantification method for TCP in TCP–Tris–HCl mixture samples was developed using near-infrared (NIR) spectroscopy, with a focus on the fermentation process for TCP. The absorbance optimization (AO) partial least squares (PLS) method was proposed and integrated with moving window (MW) PLS, giving the AO–MW–PLS method, to select appropriate wavebands. A model set that includes various wavebands that were equivalent to the optimal AO–MW–PLS waveband was proposed based on statistical considerations. The public region of all equivalent wavebands was just one of the equivalent wavebands. The obtained public regions were 1540–1868 nm for TCP and 1114–1310 nm for Tris. The root-mean-square error and correlation coefficient for leave-one-out cross validation were 0.046 mg mL−1 and 0.9998 for TCP, and 0.235 mg mL−1 and 0.9986 for Tris, respectively. All the models achieved highly accurate prediction effects, and the selected wavebands provided valuable references for designing specialized spectrometers. This study provided a valuable reference for further application of the proposed methods to TCP fermentation broth and to other spectroscopic analysis fields.
Comparison of FTIR-ATR and Raman spectroscopy in determination of VLDL triglycerides in blood serum with PLS regression

Science.gov (United States)

Oleszko, Adam; Hartwich, Jadwiga; Wójtowicz, Anna; Gąsior-Głogowska, Marlena; Huras, Hubert; Komorowska, Małgorzata

2017-08-01

Hypertriglyceridemia, associated with a plasma triglyceride (TG) level above 1.7 mmol/L, is one of the cardiovascular risk factors. Very low density lipoproteins (VLDL) are the main TG carriers. Despite being time consuming and demanding well-qualified staff and expensive instrumentation, the ultracentrifugation technique still remains the gold standard for VLDL isolation. Therefore a faster and simpler method of VLDL-TG determination is needed. Vibrational spectroscopy, including FT-IR and Raman, is a widely used technique in lipid and protein research. The aim of this study was to assess Raman and FT-IR spectroscopy for the determination of VLDL-TG directly in serum with the isolation step omitted. TG concentrations in serum and in ultracentrifuged VLDL fractions from 32 patients were measured with a reference colorimetric method. FT-IR and Raman spectra of VLDL and serum samples were acquired. Partial least squares (PLS) regression was used for calibration and leave-one-out cross validation. Our results confirmed the possibility of reagent-free determination of VLDL-TG directly in serum with both Raman and FT-IR spectroscopy. Quantitative VLDL testing by FT-IR and/or Raman spectroscopy applied directly to maternal serum seems to be a promising screening test to identify women with increased risk of adverse pregnancy outcomes and a patient-friendly method of choice based on ease of performance, accuracy and efficiency.
Simultaneous chemometric determination of pyridoxine hydrochloride and isoniazid in tablets by multivariate regression methods.

Science.gov (United States)

Dinç, Erdal; Ustündağ, Ozgür; Baleanu, Dumitru

2010-08-01

The sole use of isoniazid during treatment of tuberculosis gives rise to pyridoxine deficiency. Therefore, a combination of pyridoxine hydrochloride and isoniazid is used in pharmaceutical dosage form in tuberculosis treatment to reduce this side effect. In this study, two chemometric methods, partial least squares (PLS) and principal component regression (PCR), were applied to the simultaneous determination of pyridoxine (PYR) and isoniazid (ISO) in their tablets. A concentration training set comprising 20 different binary mixtures of PYR and ISO was randomly prepared in 0.1 M HCl. Both multivariate calibration models were constructed using the relationships between the concentration data set (concentration data matrix) and the absorbance data matrix in the spectral region 200-330 nm. The accuracy and the precision of the proposed chemometric methods were validated by analyzing synthetic mixtures containing the investigated drugs. The recovery results obtained by applying PCR and PLS calibrations to the artificial mixtures were between 100.0 and 100.7%. Satisfactory results were obtained by applying the PLS and PCR methods to both artificial and commercial samples. The results obtained in this manuscript strongly encourage us to use these methods for the quality control and routine analysis of marketed tablets containing PYR and ISO.  Copyright © 2010 John Wiley & Sons, Ltd.
[Influence of Spectral Pre-Processing on PLS Quantitative Model of Detecting Cu in Navel Orange by LIBS].

Science.gov (United States)

Li, Wen-bing; Yao, Lin-tao; Liu, Mu-hua; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; He, Xiu-wen; Yang, Ping; Hu, Hui-qin; Nie, Jiang-hui

2015-05-01

Cu in navel orange was detected rapidly by laser-induced breakdown spectroscopy (LIBS) combined with partial least squares (PLS) for quantitative analysis, and the effect of different spectral data pretreatment methods on the detection accuracy of the model was then explored. Spectral data for the 52 Gannan navel orange samples were pretreated by different data smoothing, mean centering and standard normal variate transforms. The 319~338 nm wavelength section containing characteristic spectral lines of Cu was then selected to build PLS models, and the main evaluation indexes of the models, such as the regression coefficient (r), root mean square error of cross validation (RMSECV) and root mean square error of prediction (RMSEP), were compared and analyzed. The three indicators of the PLS model after 13-point smoothing and mean centering reached 0.9928, 3.43 and 3.4, respectively, and the average relative error of the prediction model was only 5.55%; in short, this model gave the best calibration and prediction quality. The results show that, by selecting an appropriate data pre-processing method, the prediction accuracy of PLS quantitative models of fruits and vegetables detected by LIBS can be improved effectively, providing a new method for fast and accurate detection of fruits and vegetables by LIBS.
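Two of the pretreatments named in this record, the standard normal variate transform and mean centering, are simple enough to sketch directly. The code below is illustrative only: the function names `snv` and `mean_center` and the example intensities are hypothetical, and the record does not say how the authors implemented either step.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale ONE spectrum (a row)
    by its own mean and sample standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]


def mean_center(spectra):
    """Mean centering: subtract the mean of each variable (a column)
    computed over all spectra; also return the column means."""
    n = len(spectra)
    p = len(spectra[0])
    col_means = [sum(row[j] for row in spectra) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in spectra]
    return centered, col_means


# Hypothetical intensities: the second spectrum is the first with a
# multiplicative gain of 2 and an additive offset of 5.
a = [1.0, 3.0, 2.0, 6.0, 4.0]
b = [2.0 * v + 5.0 for v in a]
snv_a, snv_b = snv(a), snv(b)          # identical after SNV
centered, col_means = mean_center([a, b])
```

SNV works row by row and removes per-spectrum gain and offset, while mean centering works column by column; the column means from calibration would need to be kept and reused on new spectra before prediction.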
PLS-based memory control scheme for enhanced process monitoring

KAUST Repository

Harrou, Fouzi; Sun, Ying

2017-01-01

Fault detection is important for safe operation of various modern engineering systems. Partial least square (PLS) has been widely used in monitoring highly correlated process variables. Conventional PLS-based methods, nevertheless, often fail
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

Science.gov (United States)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels
Boosted regression trees, multivariate adaptive regression splines and their two-step combinations with multiple linear regression or partial least squares to predict blood-brain barrier passage: a case study.

Science.gov (United States)

Deconinck, E; Zhang, M H; Petitet, F; Dubus, E; Ijjaali, I; Coomans, D; Vander Heyden, Y

2008-02-18

The use of some unconventional non-linear modeling techniques, i.e. classification and regression trees and multivariate adaptive regression splines-based methods, was explored to model the blood-brain barrier (BBB) passage of drugs and drug-like molecules. The data set contains BBB passage values for 299 structurally and pharmacologically diverse drugs, originating from a structured knowledge-based database. Models were built using boosted regression trees (BRT) and multivariate adaptive regression splines (MARS), as well as their respective combinations with stepwise multiple linear regression (MLR) and partial least squares (PLS) regression in two-step approaches. The best models were obtained using combinations of MARS with either stepwise MLR or PLS. It could be concluded that combining a linear with a non-linear modeling technique results in some improved properties compared to the individual linear and non-linear models and that, when the use of such a combination is appropriate, combinations using MARS as the non-linear technique should be preferred over those with BRT, due to some serious drawbacks of the BRT approaches.
Determination of Trace Amounts of Gold in Environmental Samples by Adsorptive Stripping Voltammetry of Its Complex with Rhodamine Using OSC-PLS

Directory of Open Access Journals (Sweden)

A. Akrami

2012-11-01

The multivariate calibration method was applied for the determination of trace amounts of gold based on a hanging mercury drop electrode (HMDE) in the presence of rhodanine, followed by reduction of adsorbed gold by a voltammetric scan using differential pulse modulation. The optimum experimental conditions are: rhodanine concentration of 0.20 mg mL-1, pH 5.0, accumulation potential of -600 mV versus Ag/AgCl, accumulation time of 100 s, scan rate of 30 mV s-1 and pulse height of 100 mV. The calibration matrix for partial least squares (PLS) regression was designed with 9 samples. Orthogonal signal correction (OSC) is a preprocessing technique used for removing the information unrelated to the target variables based on constrained principal component analysis. OSC is a suitable preprocessing method for PLS calibration of electrochemical data without loss of prediction capacity. The RMSEP values for gold determination with PLS and OSC-PLS were 8.51 and 1.94, respectively. This procedure allows the reliable determination of gold in synthetic and real samples.
Fault detection in processes represented by PLS models using an EWMA control scheme

KAUST Repository

Harrou, Fouzi

2016-10-20

Fault detection is important for effective and safe process operation. Partial least squares (PLS) has been used successfully in fault detection for multivariate processes with highly correlated variables. However, the conventional PLS-based detection metrics, such as the Hotelling's T² and the Q statistics, are not well suited to detecting small faults because they only use information about the process in the most recent observation. The exponentially weighted moving average (EWMA), however, has been shown to be more sensitive to small shifts in the mean of process variables. In this paper, a PLS-based EWMA fault detection method is proposed for monitoring processes represented by PLS models. The performance of the proposed method is compared with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS method.
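The contrast drawn in this record, between a statistic that uses only the most recent observation and an EWMA that accumulates past observations, can be sketched on a single monitored quantity. The example below is an assumption-laden illustration, not the paper's method: the input values stand in for residuals of a PLS model, the in-control mean and standard deviation are taken as known, and the smoothing constant `lam = 0.2` and limit width of 3 are common textbook choices rather than values from the paper.

```python
import math


def ewma_alarms(values, mu0, sigma0, lam=0.2, width=3.0):
    """Return (EWMA series, indices where the EWMA leaves its control limits).

    z_t = lam * x_t + (1 - lam) * z_{t-1}, starting from z_0 = mu0.
    The limits use the exact time-varying EWMA variance.
    """
    z, series, alarms = mu0, [], []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        half = width * sigma0 * math.sqrt(
            lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        )
        series.append(z)
        if abs(z - mu0) > half:
            alarms.append(t - 1)
    return series, alarms


def shewhart_alarms(values, mu0, sigma0, width=3.0):
    """Memoryless chart: flag only observations beyond mu0 +/- width * sigma0."""
    return [i for i, x in enumerate(values) if abs(x - mu0) > width * sigma0]


# Hypothetical residuals: 20 in-control points, then a small mean shift of 1.5.
base = [0.5 if i % 2 == 0 else -0.5 for i in range(40)]
residuals = [b if i < 20 else b + 1.5 for i, b in enumerate(base)]

series, ewma_hits = ewma_alarms(residuals, mu0=0.0, sigma0=1.0)
shewhart_hits = shewhart_alarms(residuals, mu0=0.0, sigma0=1.0)
```

In this constructed case no single residual exceeds three standard deviations, so the memoryless chart stays silent, while the EWMA drifts past its narrower limit a few samples after the shift begins.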
Efectivity of Additive Spline for Partial Least Square Method in Regression Model Estimation

Directory of Open Access Journals (Sweden)

Ahmad Bilfarsah

2005-04-01

The Additive Spline Partial Least Squares (ASPLS) method is a generalization of the Partial Least Squares (PLS) method. The ASPLS method can accommodate nonlinearity and multicollinearity among predictor variables. In principle, the ASPLS approach is characterized by two ideas: the first is to use parametric transformations of the predictors by spline functions; the second is to make the ASPLS components mutually uncorrelated, to preserve the properties of the linear PLS components. The performance of ASPLS compared with other PLS methods is illustrated with a fisheries economics application, specifically tuna fish production.
Nuclear magnetic resonance metabonomic profiling using tO2PLS

Energy Technology Data Exchange (ETDEWEB)

Kirwan, Gemma M., E-mail: gemma.kirwan@gmail.com [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hancock, Timothy [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Hassell, Kathryn [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Niere, Julie O. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia); Nugegoda, Dayanthi [Biotechnology and Environmental Biology, School of Applied Sciences, RMIT University, PO Box 71, Bundoora, Vic 3083 (Australia); Goto, Susumu [Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho Uji, Kyoto (Japan); Adams, Michael J. [Department of Chemistry, School of Applied Sciences, RMIT University, City Campus, Vic 3001 (Australia)

2013-06-05

Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a ¹H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg⁻¹ 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets.
Nuclear magnetic resonance metabonomic profiling using tO2PLS

International Nuclear Information System (INIS)

Kirwan, Gemma M.; Hancock, Timothy; Hassell, Kathryn; Niere, Julie O.; Nugegoda, Dayanthi; Goto, Susumu; Adams, Michael J.

2013-01-01

Graphical abstract: -- Highlights: •Transposition of O2PLS input matrix (tO2PLS) to analyze metabonomics data. •tO2PLS specific components describe features that separate and define sample groups. •Application of tO2PLS to a ¹H NMR metabonomics study of black bream fish. -- Abstract: Blood plasma collected from adult fish (black bream, Sparidae) exposed to a dose of 5 mg kg⁻¹ 17β-estradiol underwent metabonomic profiling using nuclear magnetic resonance (NMR). An extension of the orthogonal 2 projection to latent structure (O2PLS) analysis, tO2PLS, was proposed and utilized to classify changes between the control and experimental metabolic profiles. As a bidirectional modeling tool, O2PLS examines the (variable) commonality between two different data blocks, and extracts the joint correlations as well as the unique variations present within each data block. tO2PLS is a proposed matrix transposition of O2PLS to allow for commonality between experiments (spectral profiles) to be observed, rather than between sample variables. tO2PLS analysis highlighted two potential biomarkers, trimethylamine-N-oxide (TMAO) and choline, that distinguish between control and 17β-estradiol exposed fish. This study presents an alternative way of examining spectroscopic (metabolite) data, providing a method for the visual assessment of similarities and differences between control and experimental spectral features in large data sets

Data Mining of Chemogenomics Data Using Bi-Modal PLS Methods and Chemical Interpretation for Molecular Design.

Science.gov (United States)

Hasegawa, Kiyoshi; Funatsu, Kimito

2014-12-01

Chemogenomics is a new strategy in drug discovery for interrogating all molecules capable of interacting with all biological targets. Because of the almost infinite number of drug-like organic molecules, bench-based experimental chemogenomics methods are not generally feasible. Several in silico chemogenomics models have therefore been developed for high-throughput screening of large numbers of drug candidate compounds and target proteins. In previous studies, we described two novel bi-modal PLS approaches. These methods provide a significant advantage in that they enable direct connections to be made between biological activities and ligand and protein descriptors. In this special issue, we review these two PLS-based approaches using two different chemogenomics datasets for illustration. We then compare the predictive and interpretive performance of the two methods using the same congeneric data set. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

International Nuclear Information System (INIS)

Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

2015-01-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144
semPLS: Structural Equation Modeling Using Partial Least Squares

Directory of Open Access Journals (Sweden)

Armin Monecke

2012-05-01

Structural equation models (SEM) are very popular in many disciplines. The partial least squares (PLS) approach to SEM offers an alternative to covariance-based SEM, which is especially suited for situations when data is not normally distributed. PLS path modelling is referred to as a soft-modeling technique with minimum demands regarding measurement scales, sample sizes and residual distributions. The semPLS package provides the capability to estimate PLS path models within the R programming environment. Different setups for the estimation of factor scores can be used. Furthermore, it contains modular methods for computation of bootstrap confidence intervals, model parameters and several quality indices. Various plot functions help to evaluate the model. The well known mobile phone dataset from marketing research is used to demonstrate the features of the package.
Deflation in multiblock PLS

NARCIS (Netherlands)

Westerhuis, J. A.; Smilde, A. K.

2001-01-01

This paper describes some of the deflation problems in multiblock PLS. Deflation of X using block scores leads to inferior prediction of Y. Deflation of X using super scores gives the same predictions as standard PLS with all variables in one large X-block, but the information of the separate blocks
Control Point Generated PLS - lines

Data.gov (United States)

Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...
Control Point Generated PLS - polygons

Data.gov (United States)

Minnesota Department of Natural Resources — The Control Point Generated PLS layer contains line and polygon features to the 1/4 of 1/4 PLS section (approximately 40 acres) and government lot level. The layer...
Prediction of the distillation temperatures of crude oils using ¹H NMR and support vector regression with estimated confidence intervals.

Science.gov (United States)

Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J

2015-09-01

This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using ¹H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.
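The RMSEP figures quoted in this record can be read against the usual definition, the square root of the mean squared difference between predicted and reference values on an external test set. The following is a minimal sketch of that calculation only; the function name `rmsep` and the sample temperatures are hypothetical and are not taken from the paper.

```python
import math


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over an external test set."""
    if len(y_ref) != len(y_pred) or not y_ref:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for r, p in zip(y_ref, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reference and predicted temperatures in °C (illustration only).
reference = [150.0, 162.0, 171.0, 185.0]
predicted = [153.0, 158.0, 171.0, 190.0]
print(round(rmsep(reference, predicted), 2))
```

Because the errors are squared before averaging, a single badly predicted sample raises RMSEP more than several small misses, which is one reason outlier detection matters for a comparison like the one reported here.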
A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

Energy Technology Data Exchange (ETDEWEB)

Boucher, Thomas F., E-mail: boucher@cs.umass.edu [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Ozanne, Marie V. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Carmosino, Marco L. [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Dyar, M. Darby [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Mahadevan, Sridhar [School of Computer Science, University of Massachusetts Amherst, 140 Governor's Drive, Amherst, MA 01003 (United States)]; Breves, Elly A.; Lepore, Kate H. [Department of Astronomy, Mount Holyoke College, South Hadley, MA 01075 (United States)]; Clegg, Samuel M. [Los Alamos National Laboratory, P.O. Box 1663, MS J565, Los Alamos, NM 87545 (United States)]

2015-05-01

The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO₂, Fe₂O₃, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na₂O, K₂O, TiO₂, and P₂O₅, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high
Offset Free Tracking Predictive Control Based on Dynamic PLS Framework

Directory of Open Access Journals (Sweden)

Jin Xin

2017-10-01

This paper develops an offset-free tracking model predictive control scheme based on a dynamic partial least squares (PLS) framework. First, a state space model is used as the inner model of PLS to describe the dynamic system, and a subspace identification method is used to identify the inner model. Based on the obtained model, multiple independent model predictive control (MPC) controllers are designed. Owing to the decoupling character of PLS, these controllers run separately, which is suitable for a distributed control framework. In addition, the increment of the inner model output is considered in the cost function of MPC, which introduces integral action into the controller. Hence, offset-free tracking performance is guaranteed. The results of a simulation with an industrial background demonstrate the effectiveness of the proposed method.
Interval ridge regression (iRR) as a fast and robust method for quantitative prediction and variable selection applied to edible oil adulteration.

Science.gov (United States)

Jović, Ozren; Smrečki, Neven; Popović, Zora

2016-04-01

A novel quantitative prediction and variable selection method called interval ridge regression (iRR) is studied in this work. The method is performed on six data sets of FTIR, two data sets of UV-vis and one data set of DSC. The obtained results show that models built with ridge regression on optimal variables selected with iRR significantly outperform models built with ridge regression on all variables in both calibration (6 out of 9 cases) and validation (2 out of 9 cases). In this study, iRR is also compared with interval partial least squares regression (iPLS). iRR outperformed iPLS in validation (insignificantly in 6 out of 9 cases and significantly in one out of 9 cases for poil, a well known health beneficial nutrient, is studied in this work by mixing it with cheap and widely used oils such as soybean (So) oil, rapeseed (R) oil and sunflower (Su) oil. Binary mixture sets of hempseed oil with these three oils (HSo, HR and HSu) and a ternary mixture set of H oil, R oil and Su oil (HRSu) were considered. The obtained accuracy indicates that using iRR on FTIR and UV-vis data, each particular oil can be very successfully quantified (in all 8 cases RMSEPoil (R² > 0.99). Copyright © 2015 Elsevier B.V. All rights reserved.
Fault detection in processes represented by PLS models using an EWMA control scheme

KAUST Repository

Harrou, Fouzi; Nounou, Mohamed N.; Nounou, Hazem N.

2016-01-01

with that of the traditional PLS-based fault detection method through a simulated example involving various fault scenarios that could be encountered in real processes. The simulation results clearly show the effectiveness of the proposed method over the conventional PLS
Sparse kernel orthonormalized PLS for feature extraction in large datasets

DEFF Research Database (Denmark)

Arenas-García, Jerónimo; Petersen, Kaare Brandt; Hansen, Lars Kai

2006-01-01

In this paper we present a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constraints in the solution to improve scalability. The algorithm...... is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, clearly outperforms the ordinary kernel PLS, and therefore is an appealing method...
Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression

OpenAIRE

Plata, Maria R.; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

2013-01-01

A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference anal...
Helium Leak Test for the PLS Storage Ring Chamber

International Nuclear Information System (INIS)

Choi, M. H.; Kim, H. J.; Choi, W. C.

1993-01-01

The storage ring vacuum system for the Pohang Light Source (PLS) has been designed to maintain a vacuum pressure of 10⁻¹⁰ Torr, which requires UHV welding with a helium leak rate of less than 1×10⁻¹⁰ Torr·L/sec. In order to develop a new welding technique (the PLS welding technique), a prototype vacuum chamber was welded using the Tungsten Inert Gas welding method, and all the welded joints were tested with a non-destructive method, so-called helium leak detection, to investigate the vacuum tightness of the weld joints. The test was performed with a detection limit of 1×10⁻¹⁰ Torr·L/sec for helium, and no detectable leaks were found in any of the welded joints. Thus the performance of the welding technique is proven to meet the helium leak rate criteria required for the PLS Storage Ring. Both the principle and the procedure of helium leak detection are also discussed.
Genetic algorithm-based wavelength selection in multicomponent spectrophotometric determination by PLS: Application on sulfamethoxazole and trimethoprim mixture in bovine milk

Directory of Open Access Journals (Sweden)

Givianrad Hadi Mohammad

2013-01-01

The simultaneous determination of sulfamethoxazole (SMX) and trimethoprim (TMP) mixtures in bovine milk by a spectrophotometric method is a difficult problem in analytical chemistry, due to spectral interferences. By means of multivariate calibration methods, such as partial least squares (PLS) regression, it is possible to obtain a model adjusted to the concentration values of the mixtures used in the calibration range. The genetic algorithm (GA) is a suitable method for selecting wavelengths for PLS calibration of mixtures with almost identical spectra without loss of prediction capacity using the spectrophotometric method. In this study, the calibration model was based on absorption spectra in the 200-400 nm range for 25 different mixtures of SMX and TMP. Calibration matrices were formed from samples containing 0.25-20 and 0.3-21 μg mL⁻¹ of SMX and TMP, respectively, at pH=10. The root mean squared error of deviation (RMSED) for SMX and TMP with PLS and genetic algorithm partial least squares (GAPLS) were 0.242, 0.066 μg mL⁻¹ and 0.074, 0.027 μg mL⁻¹, respectively. This procedure allowed the simultaneous determination of SMX and TMP in synthetic and real samples, and good reliability of the determination was proved.
Partial Least Squares Strukturgleichungsmodellierung (PLS-SEM)

DEFF Research Database (Denmark)

Hair, Joseph F.; Hult, G. Tomas M.; Ringle, Christian M.

(PLS-SEM) has established itself in business and social science research as a suitable method for estimating causal models. Thanks to the user-friendliness of the method and the available software, it is now also established in practice. This book provides an...... application-oriented introduction to PLS-SEM. The focus is on the fundamentals of the method and their practical implementation with the help of the SmartPLS software. The concept of the book relies on simple explanations of statistical approaches and the clear presentation of numerous application examples based on...... a single consistent case study. Many graphics, tables and illustrations make PLS-SEM easier to understand. In addition, downloadable data sets, exercises and further articles are offered to the reader for more in-depth study. The book is therefore excellently suited to students, researchers and...
Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

Science.gov (United States)

Yin, Shen; Wang, Guang; Yang, Xu

2014-07-01

In practical industrial applications, key performance indicator (KPI)-related prediction and diagnosis are quite important for product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed, which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and ease of implementation in large-scale industrial processes. As PLS is based entirely on the measured process data, the characteristics of the process data are critical for the success of PLS. Outliers and missing values are two common characteristics of the measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS that deals with outliers and missing values simultaneously. The effectiveness of the proposed method is finally demonstrated by the application results of KPI-related prediction and diagnosis on an industrial benchmark, the Tennessee Eastman process.
Quantitative analysis of glycated albumin in serum based on ATR-FTIR spectrum combined with SiPLS and SVM.

Science.gov (United States)

Li, Yuanpeng; Li, Fucui; Yang, Xinhao; Guo, Liu; Huang, Furong; Chen, Zhenqiang; Chen, Xingdan; Zheng, Shifu

2018-08-05

A rapid quantitative analysis model for determining the glycated albumin (GA) content based on attenuated total reflectance (ATR)-Fourier transform infrared spectroscopy (FTIR) combined with linear SiPLS and nonlinear SVM has been developed. Firstly, the real GA content in human serum was determined by the GA enzymatic method; meanwhile, the ATR-FTIR spectra of serum samples from a health examination population were obtained. The spectral data of the whole mid-infrared region (4000-600 cm⁻¹) and GA's characteristic region (1800-800 cm⁻¹) were used as the research object of quantitative analysis. Secondly, several preprocessing steps, including first derivative, second derivative, variable standardization and spectral normalization, were performed. Lastly, quantitative analysis regression models were established using SiPLS and SVM respectively. The SiPLS modeling results are as follows: root mean square error of cross validation (RMSECV_T) = 0.523 g/L, calibration coefficient (R_C) = 0.937, root mean square error of prediction (RMSEP_T) = 0.787 g/L, and prediction coefficient (R_P) = 0.938. The SVM modeling results are as follows: RMSECV_T = 0.0048 g/L, R_C = 0.998, RMSEP_T = 0.442 g/L, and R_P = 0.916. The results indicated that the model performance was improved significantly after preprocessing and optimization of characteristic regions, while the modeling performance of nonlinear SVM was considerably better than that of linear SiPLS. Hence, the quantitative analysis model for GA in human serum based on ATR-FTIR combined with SiPLS and SVM is effective. It does not need sample preprocessing and is characterized by simple operation and high time efficiency, providing a rapid and accurate method for GA content determination. Copyright © 2018 Elsevier B.V. All rights reserved.
Simultaneous determination of ash content and protein in wheat flour using infrared reflection techniques and partial least-squares regression (PLS)

Directory of Open Access Journals (Sweden)

Marco Flôres Ferrão

2004-09-01

Near-infrared reflection spectroscopy (NIRRS) and diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) were used with partial least squares (PLS) multivariate regression for the simultaneous determination of protein and ash content in wheat flour samples of Triticum aestivum L. Duplicate infrared spectra of 100 samples were collected using diffuse reflectance accessories. The protein (8.85-13.23%) and ash (0.330-1.287%) contents used as reference values were determined by the Kjeldahl method and the gravimetric method, respectively. The spectral data were used in log(1/R) format, as well as their first- and second-order derivatives, and were preprocessed by mean centering (MC), variance scaling (VS), or both. Fifty-five samples were used for calibration and 45 for validation of the models, adopting as the model-building criterion the minimum values of the standard error of calibration (SEC) and the standard error of validation (SEV). These values were below 0.33% for protein and 0.07% for ash. The methods developed have the advantages of not harming the environment and of allowing direct, simultaneous, rapid and non-destructive determination of protein and ash content in wheat flour samples.
Involvement of PlsX and the acyl-phosphate dependent sn-glycerol-3-phosphate acyltransferase PlsY in the initial stage of glycerolipid synthesis in Bacillus subtilis.

Science.gov (United States)

Hara, Yoshinori; Seki, Masahide; Matsuoka, Satoshi; Hara, Hiroshi; Yamashita, Atsushi; Matsumoto, Kouji

2008-12-01

The gene responsible for the first acylation of sn-glycerol-3-phosphate (G3P) in Bacillus subtilis has not yet been determined with certainty. The product of this first acylation, lysophosphatidic acid (LPA), is subsequently acylated again to form phosphatidic acid (PA), the primary precursor to membrane glycerolipids. A novel G3P acyltransferase (GPAT), the gene product of plsY, which uses acyl-phosphate formed by the plsX gene product, has recently been found to synthesize LPA in Streptococcus pneumoniae. We found that in B. subtilis growth arrests after repression of either a plsY homologue or a plsX homologue were overcome by expression of E. coli plsB, which encodes an acyl-acyl carrier protein (acyl-ACP)-dependent GPAT, although in the case of plsX repression a high level of plsB expression was required. B. subtilis has, therefore, a capability to use the acyl-ACP dependent GPAT of PlsB. Simultaneous expression of plsY and plsX suppressed the glycerol requirement of a strict glycerol auxotrophic derivative of the E. coli plsB26 mutant, although either one alone did not. Membrane fractions from B. subtilis cells catalyzed palmitoylphosphate-dependent acylation of [14C]-labeled G3P to synthesize [14C]-labeled LPA, whereas those from ΔplsY cells did not. The results indicate unequivocally that PlsY is an acyl-phosphate dependent GPAT. Expression of plsX corrected the glycerol auxotrophy of a ΔygiH (the deleted allele of an E. coli homologue of plsY) derivative of BB26-36 (plsB26 plsX50), suggesting an essential role of plsX other than substrate supply for acyl-phosphate dependent LPA synthesis. Two-hybrid examinations suggested that PlsY is associated with PlsX and that each may exist in multimeric form.

Status report on control system development for PLS

International Nuclear Information System (INIS)

Won, S.C.; Chang, S.S.; Huang, J.; Lee, J.W.; Lee, J.; Kim, J.H.

1992-01-01

Emphasizing reliability and flexibility, a hierarchical architecture with distributed computers has been designed into the Pohang Light Source (PLS) computer control system. The PLS control system has four layers of computer systems connected via multiple data communication networks. This paper presents an overview of the PLS control system. (author)
Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Directory of Open Access Journals (Sweden)

Chi-Cheng Huang

2013-01-01

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools have been limited to applications with binary responses. Our aim was to apply partial least squares (PLS) regression to breast cancer intrinsic taxonomy, in which five distinct molecular subtypes have been identified. The PAM50 signature genes were used as predictive variables in PLS analysis, and the latent gene component scores were used in binary logistic regression for each molecular subtype. The 139 prototypical arrays for PAM50 development were used as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n=535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS-regression was excellent (weighted Kappa: 0.988) within the training samples, but deteriorated substantially in independent samples, which could be attributed to the much larger number of samples left unclassified by PLS-regression. If these unclassified samples were removed, the agreement between PAM50 SSP and PLS-regression improved enormously (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were analyzed). Our study ascertained the feasibility of PLS-regression in multi-class prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
Combining pharmacophore fingerprints and PLS-discriminant analysis for virtual screening and SAR elucidation

DEFF Research Database (Denmark)

Askjær, Sune; Langgård, Morten

2008-01-01

The criterion of success for the initial stages of a ligand-based drug-discovery project is dual. First, a set of suitable lead compounds has to be identified. Second, a level of a preliminary structure-activity relationship (SAR) of the identified ligands has to be established in order to guide ...... by the protein-binding site known from X-ray complexes. The result of this analysis assists in explaining the efficiency of 2D pharmacophore fingerprints as descriptors in virtual screening....... the lead optimization toward a final drug candidate. This paper presents a combined approach to solving these two problems of ligand-based virtual screening and elucidation of SAR based on interplay between pharmacophore fingerprints and interpretation of PLS-discriminant analysis (PLS-DA) models....... The virtual screening capability of the PLS-DA method is compared to group fusion maximum similarity searching in a test using four graph-based pharmacophore fingerprints over a range of 10 diverse targets. The PLS-DA method was generally found to do better than the Smax method. The GpiDAPH3 and PCH...
Handbook of Partial Least Squares Concepts, Methods and Applications

CERN Document Server

Vinzi, Vincenzo Esposito; Henseler, Jörg

2010-01-01

This handbook provides a comprehensive overview of Partial Least Squares (PLS) methods with specific reference to their use in marketing and with a discussion of the directions of current research and perspectives. It covers the broad area of PLS methods, from regression to structural equation modeling applications, software and interpretation of results. The handbook serves both as an introduction for those without prior knowledge of PLS and as a comprehensive reference for researchers and practitioners interested in the most recent advances in PLS methodology.
Prediction of Caffeine Content in Java Preanger Coffee Beans by NIR Spectroscopy Using PLS and MLR Method

Science.gov (United States)

Budiastra, I. W.; Sutrisno; Widyotomo, S.; Ayu, P. C.

2018-05-01

Caffeine is one of the important components in coffee that contribute to the flavor of coffee beverages. Caffeine concentration in coffee beans is usually determined by a chemical method, which is time consuming and destructive. A nondestructive method using NIR spectroscopy was successfully applied to determine the caffeine concentration of Arabica gayo coffee beans. In this study, NIR spectroscopy was assessed for determining the caffeine concentration of java preanger coffee beans. A hundred samples, each consisting of 96 g of coffee beans, were prepared for reflectance and chemical measurement. Reflectance of the samples was measured by an FT-NIR spectrometer in the wavelength range of 1000-2500 nm (10000-4000 cm⁻¹), followed by determination of caffeine content using the LCMS method. Calibration of NIR spectra against the caffeine content was carried out using PLS and MLR methods. Several spectral data processing steps were applied to increase the accuracy of prediction. The results of the study showed that caffeine content could be determined by a PLS model using 7 factors and spectral data processing combining the first derivative and MSC of absorbance spectra (r = 0.946; CV = 1.54 %; RPD = 2.28). A lower accuracy was obtained by an MLR model consisting of three caffeine and four other absorption wavelengths (r = 0.683; CV = 3.31%; RPD = 1.18).
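The figures of merit reported in this record (r, CV and RPD) can be illustrated with a short calculation. The sketch below assumes the definitions commonly used in NIR calibration work: CV as the prediction error divided by the mean reference value, and RPD as the standard deviation of the reference values divided by the prediction error. The abstract does not state which definitions the authors used, so these formulas, the function name `calibration_metrics` and the sample values are all assumptions.

```python
import math
import statistics


def calibration_metrics(y_ref, y_pred):
    """Return (r, CV in %, RPD) under the assumed definitions noted above."""
    n = len(y_ref)
    # Prediction error taken here as the root mean square error (assumption).
    error = math.sqrt(sum((p - r) ** 2 for r, p in zip(y_ref, y_pred)) / n)
    mean_ref = statistics.fmean(y_ref)
    mean_pred = statistics.fmean(y_pred)
    # Pearson correlation between reference and predicted values.
    cov = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(y_ref, y_pred))
    var_ref = sum((r - mean_ref) ** 2 for r in y_ref)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    r_value = cov / math.sqrt(var_ref * var_pred)
    cv_percent = 100.0 * error / mean_ref          # assumed: error / mean
    rpd = statistics.stdev(y_ref) / error          # assumed: SD / error
    return r_value, cv_percent, rpd


# Hypothetical caffeine contents (illustration only, not data from the study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
r_value, cv_percent, rpd = calibration_metrics(reference, predicted)
print(f"r = {r_value:.3f}, CV = {cv_percent:.2f} %, RPD = {rpd:.2f}")
```

Under these definitions a higher RPD means the prediction error is small relative to the natural spread of the samples, which matches the ordering reported above, where the PLS model has both the higher RPD and the lower CV.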
Simultaneous measurement of two enzyme activities using infrared spectroscopy: A comparative evaluation of PARAFAC, TUCKER and N-PLS modeling.

Science.gov (United States)

Baum, Andreas; Hansen, Per Waaben; Meyer, Anne S; Mikkelsen, Jørn Dalgaard

2013-08-06

Enzymes are used in many processes to release fermentable sugars for green production of biofuel, or the refinery of biomass for extraction of functional food ingredients such as pectin or prebiotic oligosaccharides. The complex biomasses may, however, require a multitude of specific enzymes which are active on specific substrates generating a multitude of products. In this paper we use the plant polymer, pectin, to present a method to quantify enzyme activity of two pectolytic enzymes by monitoring their superimposed spectral evolutions simultaneously. The data is analyzed by three chemometric multiway methods, namely PARAFAC, TUCKER3 and N-PLS, to establish simultaneous enzyme activity assays for pectin lyase and pectin methyl esterase. Correlation coefficients R²pred for prediction test sets are 0.48, 0.96 and 0.96 for pectin lyase and 0.70, 0.89 and 0.89 for pectin methyl esterase, respectively. The retrieved models are compared and prediction test sets show that especially TUCKER3 performs well, even in comparison to the supervised regression method N-PLS. Copyright © 2013 Elsevier B.V. All rights reserved.
PLS beam position measurement and feedback system

International Nuclear Information System (INIS)

Huang, J.Y.; Lee, J.; Park, M.K.; Kim, J.H.; Won, S.C.

1992-01-01

A real-time orbit correction system is proposed for the stabilization of beam orbit and photon beam positions in the Pohang Light Source. The PLS beam position monitoring system is designed to be VMEbus compatible to fit the real-time digital orbit feedback system. A VMEbus-based subsystem control computer, a Mil-1553B communication network and 12 BPM/PS machine interface units constitute the digital part of the feedback system. With the super-stable PLS correction magnet power supply, power line frequency noise is almost filtered out and the dominant spectra of beam orbit fluctuations are expected to appear below 15 Hz. Using a DSP board in the SCC for the computation and an appropriate compensation circuit for the phase delay caused by the vacuum chamber, the PLS real-time orbit correction system is realizable without changing the basic structure of the PLS computer control system. (author)
Hybrid ANN–PLS approach to scroll compressor thermodynamic performance prediction

International Nuclear Information System (INIS)

Tian, Z.; Gu, B.; Yang, L.; Lu, Y.

2015-01-01

In this paper, scroll compressor thermodynamic performance prediction was carried out by applying a hybrid ANN–PLS model. Firstly, an experimental platform with a second-refrigeration calorimeter was set up and steady-state scroll compressor data sets were collected from experiments. Then a total of 148 data sets were used to train the ANN–PLS model and verify its validity for predicting scroll compressor parameters such as volumetric efficiency, refrigerant mass flow rate, discharge temperature and power consumption. The ANN–PLS model was determined with 5 hidden neurons and 7 latent variables through the training process. Ultimately, the ANN–PLS model showed better performance than the ANN model and the PLS model working separately. ANN–PLS predictions agree well with the experimental values, with mean relative errors (MREs) in the range of 0.34–1.96%, correlation coefficients (R²) in the range of 0.9703–0.9999 and very low root mean square errors (RMSEs). - Highlights: • Hybrid ANN–PLS is utilized to predict the thermodynamic performance of a scroll compressor. • The ANN–PLS model is determined with 5 hidden neurons and 7 latent variables. • The ANN–PLS model demonstrates better performance than ANN and PLS working separately. • The values of MRE and R² are in the range of 0.34–1.96% and 0.9703–0.9999, respectively
Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

Science.gov (United States)

Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

2014-08-01

Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis (MRA) of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of the traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analyte BPs in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4% and 10.8%, respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively, obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can potentially be used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.
8th International Conference on Partial Least Squares and Related Methods

CERN Document Server

Vinzi, Vincenzo; Russolillo, Giorgio; Saporta, Gilbert; Trinchera, Laura

2016-01-01

This volume presents state of the art theories, new developments, and important applications of Partial Least Square (PLS) methods. The text begins with the invited communications of current leaders in the field who cover the history of PLS, an overview of methodological issues, and recent advances in regression and multi-block approaches. The rest of the volume comprises selected, reviewed contributions from the 8th International Conference on Partial Least Squares and Related Methods held in Paris, France, on 26-28 May, 2014. They are organized in four coherent sections: 1) new developments in genomics and brain imaging, 2) new and alternative methods for multi-table and path analysis, 3) advances in partial least square regression (PLSR), and 4) partial least square path modeling (PLS-PM) breakthroughs and applications. PLS methods are very versatile methods that are now used in areas as diverse as engineering, life science, sociology, psychology, brain imaging, genomics, and business among both academics ...
Prediction of gas chromatography/electron capture detector retention times of chlorinated pesticides, herbicides, and organohalides by multivariate chemometrics methods

International Nuclear Information System (INIS)

Ghasemi, Jahanbakhsh; Asadpour, Saeid; Abdolmaleki, Azizeh

2007-01-01

A quantitative structure-retention relationship (QSRR) study has been carried out on the gas chromatograph/electron capture detector (GC/ECD) retention times (tR) of 38 diverse chlorinated pesticides, herbicides, and organohalides using molecular structural descriptors. Modeling of the retention times of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR) and partial least squares (PLS) regression. Stepwise regression in SPSS was used to select the variables that resulted in the best-fitted models. Appropriate models with low standard errors and high correlation coefficients were obtained. Three types of molecular descriptors (electronic, steric and thermodynamic) were used to develop a quantitative relationship between the retention times and structural properties. MLR and PLS analyses were carried out to derive the best QSRR models. After variable selection, the MLR and PLS methods were used with leave-one-out cross validation to build the regression models. The predictive quality of the QSRR models was tested on an external prediction set of 12 compounds randomly chosen from the 38 compounds. PLS regression was applied with the aim of modeling the structure-retention relationships more accurately. However, the results surprisingly showed more or less the same quality for MLR and PLS modeling according to the squared regression coefficients R², which were 0.951 and 0.948 for MLR and PLS, respectively
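Leave-one-out cross validation, as mentioned in the abstract above, can be illustrated with a deliberately simplified sketch: a straight-line fit on a single hypothetical descriptor rather than the multi-descriptor MLR or PLS models of the study. All names and data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_press(xs, ys):
    """Sum of squared leave-one-out prediction errors (PRESS)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return press

def q_squared(xs, ys):
    """Cross-validated R^2: 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - loo_press(xs, ys) / total

# Hypothetical descriptor values and retention times, for illustration only.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
retention = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = q_squared(descriptor, retention)
```

Each sample is predicted by a model that never saw it, so the cross-validated statistic is typically lower than the fitted R² of the same model.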
[MEG]PLS: A pipeline for MEG data analysis and partial least squares statistics.

Science.gov (United States)

Cheung, Michael J; Kovačević, Natasa; Fatima, Zainab; Mišić, Bratislav; McIntosh, Anthony R

2016-01-01

The emphasis of modern neurobiological theories has recently shifted from the independent function of brain areas to their interactions in the context of whole-brain networks. As a result, neuroimaging methods and analyses have also increasingly focused on network discovery. Magnetoencephalography (MEG) is a neuroimaging modality that captures neural activity with a high degree of temporal specificity, providing detailed, time varying maps of neural activity. Partial least squares (PLS) analysis is a multivariate framework that can be used to isolate distributed spatiotemporal patterns of neural activity that differentiate groups or cognitive tasks, to relate neural activity to behavior, and to capture large-scale network interactions. Here we introduce [MEG]PLS, a MATLAB-based platform that streamlines MEG data preprocessing, source reconstruction and PLS analysis in a single unified framework. [MEG]PLS facilitates MRI preprocessing, including segmentation and coregistration, MEG preprocessing, including filtering, epoching, and artifact correction, MEG sensor analysis, in both time and frequency domains, MEG source analysis, including multiple head models and beamforming algorithms, and combines these with a suite of PLS analyses. The pipeline is open-source and modular, utilizing functions from FieldTrip (Donders, NL), AFNI (NIMH, USA), SPM8 (UCL, UK) and PLScmd (Baycrest, CAN), which are extensively supported and continually developed by their respective communities. [MEG]PLS is flexible, providing both a graphical user interface and command-line options, depending on the needs of the user. A visualization suite allows multiple types of data and analyses to be displayed and includes 4-D montage functionality. [MEG]PLS is freely available under the GNU public license (http://meg-pls.weebly.com). Copyright © 2015 Elsevier Inc. All rights reserved.
Measurement of process variables in solid-state fermentation of wheat straw using FT-NIR spectroscopy and synergy interval PLS algorithm

Science.gov (United States)

Jiang, Hui; Liu, Guohai; Mei, Congli; Yu, Shuang; Xiao, Xiahong; Ding, Yuhan

2012-11-01

The feasibility of rapid determination of the process variables (i.e. pH and moisture content) in solid-state fermentation (SSF) of wheat straw using Fourier transform near infrared (FT-NIR) spectroscopy was studied. Synergy interval partial least squares (siPLS) algorithm was implemented to calibrate regression model. The number of PLS factors and the number of subintervals were optimized simultaneously by cross-validation. The performance of the prediction model was evaluated according to the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the correlation coefficient (R). The measurement results of the optimal model were obtained as follows: RMSECV = 0.0776, Rc = 0.9777, RMSEP = 0.0963, and Rp = 0.9686 for pH model; RMSECV = 1.3544% w/w, Rc = 0.8871, RMSEP = 1.4946% w/w, and Rp = 0.8684 for moisture content model. Finally, compared with classic PLS and iPLS models, the siPLS model revealed its superior performance. The overall results demonstrate that FT-NIR spectroscopy combined with siPLS algorithm can be used to measure process variables in solid-state fermentation of wheat straw, and NIR spectroscopy technique has a potential to be utilized in SSF industry.
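The interval search at the heart of siPLS can be sketched as follows: the spectrum is split into equal-width subintervals and every combination of a fixed number of subintervals is scored. This is only an outline under stated assumptions. The `score` callback stands in for the cross-validated error (RMSECV) of a PLS model fitted on the selected channels, and `toy_score` is a hypothetical placeholder so the example runs without a PLS implementation.

```python
from itertools import combinations

def split_into_intervals(n_channels, n_intervals):
    """Split channel indices 0..n_channels-1 into near-equal subintervals."""
    base, extra = divmod(n_channels, n_intervals)
    intervals, start = [], 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)
        intervals.append(list(range(start, start + size)))
        start += size
    return intervals

def best_interval_combination(n_channels, n_intervals, n_combined, score):
    """Return (lowest score, subinterval indices) over all combinations."""
    intervals = split_into_intervals(n_channels, n_intervals)
    best = None
    for combo in combinations(range(n_intervals), n_combined):
        channels = [c for i in combo for c in intervals[i]]
        value = score(channels)
        if best is None or value < best[0]:
            best = (value, combo)
    return best

# Hypothetical stand-in for RMSECV: counts informative channels left out.
INFORMATIVE = {4, 5, 10, 11}

def toy_score(channels):
    return len(INFORMATIVE - set(channels))

best = best_interval_combination(12, 6, 2, toy_score)
```

In a real calibration the number of PLS factors would be tuned inside `score` as well, which matches the abstract's statement that factors and subintervals were optimized simultaneously by cross-validation.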
Impact of multicollinearity on small sample hydrologic regression models

Science.gov (United States)

Kroll, Charles N.; Song, Peter

2013-06-01

Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R²). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
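Variance inflation factor screening can be illustrated for the simplest case of two explanatory variables, where the VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The sketch below assumes that two-predictor case; the data are hypothetical, and the threshold of 10 is a common rule of thumb rather than a value taken from the study.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical, nearly collinear basin characteristics.
drainage_area = [1.0, 2.0, 3.0, 4.0, 5.0]
channel_length = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = vif_two_predictors(drainage_area, channel_length)
flagged = vif > 10.0  # rule-of-thumb threshold (an assumption)
```

With more than two predictors, each VIF comes from regressing one predictor on all the others, and perfectly collinear inputs would make the denominator zero.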
OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

Science.gov (United States)

Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

2012-01-01

The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Combining Partial Least Squares and the Gradient-Boosting Method for Soil Property Retrieval Using Visible Near-Infrared Shortwave Infrared Spectra

Directory of Open Access Journals (Sweden)

Lanfa Liu

2017-12-01

Full Text Available Soil spectroscopy has experienced a tremendous increase in use for soil property characterisation, and can be applied not only in the laboratory but also from space (imaging spectroscopy). Partial least squares (PLS) regression is one of the most common approaches for the calibration of soil properties using soil spectra. Besides functioning as a calibration method, PLS can also be used as a dimension reduction tool, which has scarcely been studied in soil spectroscopy. PLS components retained from high-dimensional spectral data can further be explored with the gradient-boosted decision tree (GBDT) method. Three soil sample categories were extracted from the Land Use/Land Cover Area Frame Survey (LUCAS) soil library according to the type of land cover (woodland, grassland, and cropland). First, PLS regression and GBDT were separately applied to build the spectroscopic models for soil organic carbon (OC), total nitrogen content (N), and clay for each soil category. Then, PLS-derived components were used as input variables for the GBDT model. The results demonstrate that the combined PLS-GBDT approach has better performance than PLS or GBDT alone. The relatively important variables for soil property estimation revealed by the proposed method demonstrated that the PLS method is a useful dimension reduction tool for soil spectra to retain target-related information.
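To illustrate what a PLS-derived component is, the sketch below computes the first component of single-response PLS (PLS1): the weight vector is the normalized covariance of each centered spectral channel with the centered response, and the scores are the projections of the samples onto it. This is a minimal textbook-style outline, not the implementation used in the study; further components would require deflating X, which is omitted here, and the data are hypothetical.

```python
import math

def center_columns(rows):
    """Subtract each column mean from a list-of-rows matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS1 component."""
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    # Weight of each channel is proportional to its covariance with y.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(Xc[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: one number per sample, usable as an input feature downstream.
    scores = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    return w, scores

# Hypothetical two-channel "spectra"; the second channel carries no signal.
spectra = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
organic_carbon = [1.0, 2.0, 3.0]
weights, scores = first_pls_component(spectra, organic_carbon)
```

The scores from a handful of such components could then be passed to any downstream learner, which is the role the abstract assigns to GBDT.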
Robust Ultraviolet-Visible (UV-Vis) Partial Least-Squares (PLS) Models for Tannin Quantification in Red Wine.

Science.gov (United States)

Aleixandre-Tudo, José Luis; Nieuwoudt, Helené; Aleixandre, José Luis; Du Toit, Wessel J

2015-02-04

The validation of ultraviolet-visible (UV-vis) spectroscopy combined with partial least-squares (PLS) regression to quantify red wine tannins is reported. The methylcellulose precipitable (MCP) tannin assay and the bovine serum albumin (BSA) tannin assay were used as reference methods. To take the high variability of wine tannins into account when the calibration models were built, a diverse data set was collected from samples of South African red wines that consisted of 18 different cultivars, from regions spanning the wine grape-growing areas of South Africa with their various sites, climates, and soils, ranging in vintage from 2000 to 2012. A total of 240 wine samples were analyzed, and these were divided into a calibration set (n = 120) and a validation set (n = 120) to evaluate the predictive ability of the models. To test the robustness of the PLS calibration models, the predictive ability of the classifying variables cultivar, vintage year, and experimental versus commercial wines was also tested. In general, the statistics obtained when BSA was used as a reference method were slightly better than those obtained with MCP. Despite this, the MCP tannin assay should also be considered as a valid reference method for developing PLS calibrations. The best calibration statistics for the prediction of new samples were coefficient of correlation (R²val) = 0.89, root mean standard error of prediction (RMSEP) = 0.16, and residual predictive deviation (RPD) = 3.49 for MCP and R²val = 0.93, RMSEP = 0.08, and RPD = 4.07 for BSA, when only the UV region (260-310 nm) was selected, which also led to a faster analysis time. In addition, a difference in the results obtained when the predictive ability of the classifying variables vintage, cultivar, or commercial versus experimental wines was studied suggests that tannin composition is highly affected by many factors. This study also discusses the correlations in tannin values between the methylcellulose and protein
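The residual predictive deviation quoted above is commonly computed as the standard deviation of the reference values divided by the RMSEP. The sketch below assumes that definition and the sample (n - 1) standard deviation; the abstract does not say which variant was used, and the tannin values are hypothetical.

```python
import math
import statistics

def rmsep(reference, predicted):
    """Root mean square error of prediction."""
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rpd(reference, predicted):
    """Residual predictive deviation: SD of reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)

# Hypothetical tannin concentrations (g/L) for a validation set.
reference_tannin = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_tannin = [1.5, 1.5, 3.5, 3.5, 5.5]
value = rpd(reference_tannin, predicted_tannin)
```

Because the ratio scales the prediction error by the spread of the reference data, it allows calibrations built on differently distributed sample sets to be compared.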
Parafac and PLS Applied to Determination of Captopril in Pharmaceutical Preparation and Biological Fluids by Ultraviolet Spectrophotometry

International Nuclear Information System (INIS)

Niazi, A.; Ghasemi, N.

2007-01-01

A new ultraviolet spectrophotometric method has been developed for the direct qualitative determination of captopril in pharmaceutical preparation and biological fluids such as human plasma and urine samples. The method was accomplished based on parallel factor analysis (PARAFAC) and partial least squares (PLS). The study was carried out in the pH range from 2.0 to 12.8 and with a concentration from 0.70 to 61.50 μg ml−1 of captopril. Multivariate calibration models PLS at various pH and PARAFAC were elaborated from ultraviolet spectra deconvolution and captopril determination. The best models for this system were obtained with PARAFAC and PLS at pH = 2.04 (PLS-PH2). The applications of the method for the determination of real samples were evaluated by analysis of captopril in pharmaceutical preparations and biological (human plasma and urine) fluids with satisfactory results. The accuracy of the method, evaluated through the root mean square error of prediction (RMSEP), was 0.58 for captopril with PARAFAC and 0.67 for captopril with PLS-PH2 model. Acidity constant of captopril at 25 °C and ionic strength of 0.1 M have also been determined spectrophotometrically. The obtained pKa values of captopril are 3.90 ± 0.05 and 10.03 ± 0.08 for pKa1 and pKa2, respectively
A heuristic approach using multiple criteria for environmentally benign 3PLs selection

Science.gov (United States)

Kongar, Elif

2005-11-01

Maintaining competitiveness in an environment where price and quality differences between competing products are disappearing depends on the company's ability to reduce costs and supply time. Timely responses to rapidly changing market conditions require an efficient Supply Chain Management (SCM). Outsourcing logistics to third-party logistics service providers (3PLs) is one commonly used way of increasing the efficiency of logistics operations, while creating a more "core competency focused" business environment. However, this alone may not be sufficient. Due to recent environmental regulations and growing public awareness regarding environmental issues, 3PLs need to be not only efficient but also environmentally benign to maintain companies' competitiveness. Even though an efficient and environmentally benign combination of 3PLs can theoretically be obtained using exhaustive search algorithms, heuristics approaches to the selection process may be superior in terms of the computational complexity. In this paper, a hybrid approach that combines a multiple criteria Genetic Algorithm (GA) with Linear Physical Weighting Algorithm (LPPW) to be used in efficient and environmentally benign 3PLs is proposed. A numerical example is also provided to illustrate the method and the analyses.
The comparison of partial least squares and principal component regression in simultaneous spectrophotometric determination of ascorbic acid, dopamine and uric acid in real samples

Directory of Open Access Journals (Sweden)

Habiboallah Khajehsharifi

2017-05-01

Full Text Available Partial least squares (PLS1) and principal component regression (PCR) are two multivariate calibration methods that allow simultaneous determination of several analytes in spite of their overlapping spectra. In this research, a spectrophotometric method using PLS1 is proposed for the simultaneous determination of ascorbic acid (AA), dopamine (DA) and uric acid (UA). The linear concentration ranges for AA, DA and UA were 1.76–47.55, 0.57–22.76 and 1.68–28.58 μg mL−1, respectively. PLS1 and PCR were applied to a calibration set based on absorption spectra in the 250–320 nm range for 36 different mixtures of AA, DA and UA; in all cases, the PLS1 calibration method showed better quantitative prediction ability than the PCR method. Cross validation was used to select the optimum number of principal components (NPC). The NPC for AA, DA and UA was found to be 4 by PLS1 and 5, 12 and 8 by PCR. The prediction error sums of squares (PRESS) for AA, DA and UA were 1.2461, 1.1144 and 2.3104 for PLS1 and 11.0563, 1.3819 and 4.0956 for PCR, respectively. Satisfactory results were achieved for the simultaneous determination of AA, DA and UA in some real samples such as human urine, serum and pharmaceutical formulations.

Sulfur Speciation of Crude Oils by Partial Least Squares Regression Modeling of Their Infrared Spectra

NARCIS (Netherlands)

de Peinder, P.; Visser, T.; Wagemans, R.W.P.; Blomberg, J.; Chaabani, H.; Soulimani, F.; Weckhuysen, B.M.

2013-01-01

Research has been carried out to determine the feasibility of partial least-squares regression (PLS) modeling of infrared (IR) spectra of crude oils as a tool for fast sulfur speciation. The study is a continuation of a previously developed method to predict long and short residue properties of
PLS and multicollinearity under conditions common in satisfaction studies

DEFF Research Database (Denmark)

Nielsen, Rikke; Kristensen, Kai; Eskildsen, Jacob Kjær

A number of studies have investigated the performance of the PLS path modelling algorithm in the presence of common empirical problems, such as model misspecification, skewness of manifest variables, missing values, and multicollinearity, and they have shown PLS to be quite robust (see e.g. Cassel...... et al., 1999; Kristensen, Eskildsen, 2005). However, most of the studies, including our own, have focused on somewhat simple models with very simple correlation structures. This paper extends the existing knowledge by investigating the effect of varying degrees of multicollinearity on the PLS model...
Development of Multivariate Regression Models for the Quantification of Benzoyl Metronidazole in the Presence of Its Degradation Products by Near Infrared Spectroscopy

Directory of Open Access Journals (Sweden)

Willian Ricardo da Rosa de Almeida

2015-12-01

Full Text Available Benzoyl metronidazole (BMZ) is a drug with antiparasitic and antibacterial activity available in the form of pediatric suspensions. The main degradation products of BMZ are metronidazole and benzoic acid, and there are no reports in the literature on the determination of BMZ in the presence of its degradation products using near infrared spectroscopy. Therefore, in this study a method was developed for determining the content of the BMZ pharmaceutical ingredient in the presence of its main degradation products by near infrared spectroscopy combined with multivariate calibration. Regression methods with variable selection, namely interval partial least squares (iPLS) and synergy interval partial least squares (siPLS), were applied in order to select spectral regions that produce models with smaller errors. The best model using the iPLS algorithm was obtained when the spectrum was divided into 12 subintervals and subinterval 11 was selected (RSEP = 1.37%). For the siPLS algorithm, the best model was obtained when the spectrum was divided into 16 intervals and subintervals 9, 13 and 18 were combined (RSEP = 1.30%). The proposed method can be considered selective; it allows determining BMZ in the presence of its degradation products. DOI: http://dx.doi.org/10.17807/orbital.v7i4.741
Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

Science.gov (United States)

Balabin, Roman M; Lomakina, Ekaterina I

2011-04-21

In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

Science.gov (United States)

Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

2001-04-01

Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Square (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. So, from the 36 hypothetical combinations of the theoretical skin types classification, we tended to a strengthened six classes proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.
Group-wise partial least square regression

NARCIS (Netherlands)

Camacho, José; Saccenti, Edoardo

2018-01-01

This paper introduces the group-wise partial least squares (GPLS) regression. GPLS is a new sparse PLS technique where the sparsity structure is defined in terms of groups of correlated variables, similarly to what is done in the related group-wise principal component analysis. These groups are
A consensus successive projections algorithm--multiple linear regression method for analyzing near infrared spectra.

Science.gov (United States)

Liu, Ke; Chen, Xiaojing; Li, Limin; Chen, Huiling; Ruan, Xiukai; Liu, Wenbin

2015-02-09

The successive projections algorithm (SPA) is widely used to select variables for multiple linear regression (MLR) modeling. However, SPA used only once may not obtain all the useful information of the full spectra, because the number of selected variables cannot exceed the number of calibration samples in the SPA algorithm. Therefore, the SPA-MLR method risks the loss of useful information. To make full use of the useful information in the spectra, a new method named "consensus SPA-MLR" (C-SPA-MLR) is proposed herein. This method is the combination of the consensus strategy and the SPA-MLR method. In the C-SPA-MLR method, SPA-MLR is used to construct member models with different subsets of variables, which are selected from the remaining variables iteratively. A consensus prediction is obtained by combining the predictions of the member models. The proposed method is evaluated by analyzing the near infrared (NIR) spectra of corn and diesel. The C-SPA-MLR method showed better prediction performance than the SPA-MLR and full-spectra PLS methods. Moreover, these results could serve as a reference for combining the consensus strategy with other variable selection methods when analyzing NIR spectra and data from other spectroscopic techniques. Copyright © 2014 Elsevier B.V. All rights reserved.
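The consensus step can be sketched as follows. The abstract says only that member predictions are combined; the unweighted average used here is an assumption, and the member models, variable indices and coefficients are hypothetical placeholders rather than SPA-selected values.

```python
def make_linear_model(indices, coefficients, intercept):
    """Build a member model that uses only the given spectral variables."""
    def predict(spectrum):
        return intercept + sum(c * spectrum[i] for i, c in zip(indices, coefficients))
    return predict

def consensus_predict(member_models, spectrum):
    """Combine member predictions by a simple unweighted average (assumed)."""
    predictions = [model(spectrum) for model in member_models]
    return sum(predictions) / len(predictions)

# Two hypothetical member models built on disjoint variable subsets.
members = [
    make_linear_model([0, 2], [1.0, 0.5], 0.0),
    make_linear_model([1, 3], [2.0, -1.0], 1.0),
]
spectrum = [2.0, 1.0, 4.0, 3.0]
prediction = consensus_predict(members, spectrum)
```

Because each member draws on variables the others did not use, the combined prediction can exploit more of the spectrum than any single SPA-MLR model.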
Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.

Science.gov (United States)

Pralle, R S; Weigel, K W; White, H M

2018-05-01

Prediction of postpartum hyperketonemia (HYK) using Fourier transform infrared (FTIR) spectrometry analysis could be a practical diagnostic option for farms because these data are now available from routine milk analysis during Dairy Herd Improvement testing. The objectives of this study were to (1) develop and evaluate blood β-hydroxybutyrate (BHB) prediction models using multivariate linear regression (MLR), partial least squares regression (PLS), and artificial neural network (ANN) methods and (2) evaluate whether milk FTIR spectrum (mFTIR)-based models are improved with the inclusion of test-day variables (mTest; milk composition and producer-reported data). Paired blood and milk samples were collected from multiparous cows 5 to 18 d postpartum at 3 Wisconsin farms (3,629 observations from 1,013 cows). Blood BHB concentration was determined by a Precision Xtra meter (Abbott Diabetes Care, Alameda, CA), and milk samples were analyzed by a privately owned laboratory (AgSource, Menomonie, WI) for components and FTIR spectrum absorbance. Producer-recorded variables were extracted from farm management software. A blood BHB ≥1.2 mmol/L was considered HYK. The data set was divided into a training set (n = 3,020) and an external testing set (n = 609). Model fitting was implemented with JMP 12 (SAS Institute, Cary, NC). A 5-fold cross-validation was performed on the training data set for the MLR, PLS, and ANN prediction methods, with square root of blood BHB as the dependent variable. Each method was fitted using 3 combinations of variables: mFTIR, mTest, or mTest + mFTIR variables. Models were evaluated based on coefficient of determination, root mean squared error, and area under the receiver operating characteristic curve. Four models (PLS-mTest + mFTIR, ANN-mFTIR, ANN-mTest, and ANN-mTest + mFTIR) were chosen for further evaluation in the testing set after fitting to the full training set. In the cross-validation analysis, model fit was greatest for ANN, followed
Comparison of partial least squares and lasso regression techniques as applied to laser-induced breakdown spectroscopy of geological samples

International Nuclear Information System (INIS)

Dyar, M.D.; Carmosino, M.L.; Breves, E.A.; Ozanne, M.V.; Clegg, S.M.; Wiens, R.C.

2012-01-01

A remote laser-induced breakdown spectrometer (LIBS) designed to simulate the ChemCam instrument on the Mars Science Laboratory Rover Curiosity was used to probe 100 geologic samples at a 9-m standoff distance. ChemCam consists of an integrated remote LIBS instrument that will probe samples up to 7 m from the mast of the rover and a remote micro-imager (RMI) that will record context images. The elemental compositions of 100 igneous and highly-metamorphosed rocks are determined with LIBS using three variations of multivariate analysis, with a goal of improving the analytical accuracy. Two forms of partial least squares (PLS) regression are employed with finely-tuned parameters: PLS-1 regresses a single response variable (elemental concentration) against the observation variables (spectra, or intensity at each of 6144 spectrometer channels), while PLS-2 simultaneously regresses multiple response variables (concentrations of the ten major elements in rocks) against the observation predictor variables, taking advantage of natural correlations between elements. Those results are contrasted with those from the multivariate regression technique of the least absolute shrinkage and selection operator (lasso), which is a penalized shrunken regression method that selects the specific channels for each element that explain the most variance in the concentration of that element. To make this comparison, we use results of cross-validation and of held-out testing, and employ unscaled and uncentered spectral intensity data because all of the input variables are already in the same units. Results demonstrate that the lasso, PLS-1, and PLS-2 all yield comparable results in terms of accuracy for this dataset. However, the interpretability of these methods differs greatly in terms of fundamental understanding of LIBS emissions. PLS techniques generate principal components, linear combinations of intensities at any number of spectrometer channels, which explain as much variance in the
On-line monitoring of extraction process of Flos Lonicerae Japonicae using near infrared spectroscopy combined with synergy interval PLS and genetic algorithm

Science.gov (United States)

Yang, Yue; Wang, Lei; Wu, Yongjiang; Liu, Xuesong; Bi, Yuan; Xiao, Wei; Chen, Yong

2017-07-01

There is a growing need for the effective on-line process monitoring during the manufacture of traditional Chinese medicine to ensure quality consistency. In this study, the potential of near infrared (NIR) spectroscopy technique to monitor the extraction process of Flos Lonicerae Japonicae was investigated. A new algorithm of synergy interval PLS with genetic algorithm (Si-GA-PLS) was proposed for modeling. Four different PLS models, namely Full-PLS, Si-PLS, GA-PLS, and Si-GA-PLS, were established, and their performances in predicting two quality parameters (viz. total acid and soluble solid contents) were compared. In conclusion, Si-GA-PLS model got the best results due to the combination of superiority of Si-PLS and GA. For Si-GA-PLS, the determination coefficient (Rp2) and root-mean-square error for the prediction set (RMSEP) were 0.9561 and 147.6544 μg/ml for total acid, 0.9062 and 0.1078% for soluble solid contents, correspondingly. The overall results demonstrated that the NIR spectroscopy technique combined with Si-GA-PLS calibration is a reliable and non-destructive alternative method for on-line monitoring of the extraction process of TCM on the production scale.
Statistical Downscaling Output GCM Modeling with Continuum Regression and Pre-Processing PCA Approach

Directory of Open Access Journals (Sweden)

Sutikno Sutikno

2010-08-01

One of the climate models used to predict climatic conditions is the Global Circulation Model (GCM). A GCM is a computer-based model consisting of numerical, deterministic equations that follow the laws of physics. GCMs are a main tool for predicting climate and weather, and they also serve as a primary information source for reviewing the effects of climate change. The Statistical Downscaling (SD) technique is used to bridge the large-scale GCM with a small scale (the study area). GCM data are spatial and temporal, so spatial correlation between grid points within a single domain is likely. The resulting multicollinearity requires pre-processing of the predictor data X. Continuum Regression (CR) with Principal Component Analysis (PCA) pre-processing is an alternative for SD modelling. CR was developed by Stone and Brooks (1990). The method is a generalization of Ordinary Least Squares (OLS), Principal Component Regression (PCR) and Partial Least Squares (PLS), and is used to overcome multicollinearity problems. Results for the stations at Ambon, Pontianak, Losarang, Indramayu and Yuntinyuat show that, in the 8x8 and 12x12 domains, the CR method gives better RMSEP and predictive R² values than PCR and PLS.
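For orientation, the sense in which continuum regression generalizes OLS, PLS and PCR can be written as a single criterion. The notation below (unit weight vector $c$, $s = X^{\top}y$, $S = X^{\top}X$, tuning parameter $\alpha$) follows the usual textbook presentation of the Stone and Brooks method and is not taken from this abstract, so it should be read as a sketch:

```latex
T_{\alpha}(c) \;=\; \left(c^{\top} s\right)^{2}\,
\left(c^{\top} S\, c\right)^{\frac{\alpha}{1-\alpha}-1},
\qquad \lVert c \rVert = 1,\quad 0 \le \alpha < 1 .
```

Maximizing $T_{\alpha}$ with $\alpha = 0$ gives $(c^{\top}s)^{2}/(c^{\top}Sc)$, the squared-correlation criterion of OLS; $\alpha = 1/2$ leaves only the squared covariance $(c^{\top}s)^{2}$, which is the PLS criterion; and as $\alpha \to 1$ the variance term $c^{\top}Sc$ dominates, which recovers PCR.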
An extension of PPLS-DA for classification and comparison to ordinary PLS-DA.

Directory of Open Access Journals (Sweden)

Anna Telaar

Classification studies are widely applied, e.g. in biomedical research to classify objects/patients into predefined groups. The goal is to find a classification function/rule which assigns each object/patient to a unique group with the greatest possible accuracy (classification error). Especially in gene expression experiments, often a lot of variables (genes) are measured for only a few objects/patients. A suitable approach is the well-known method PLS-DA, which searches for a transformation to a lower dimensional space. The resulting new components are linear combinations of the original variables. An advancement of PLS-DA leads to PPLS-DA, introducing a so-called 'power parameter', which is maximized towards the correlation between the components and the group-membership. We introduce an extension of PPLS-DA for optimizing this power parameter towards the final aim, namely towards a minimal classification error. We compare this new extension with the original PPLS-DA and also with the ordinary PLS-DA using simulated and experimental datasets. For the investigated data sets with weak linear dependency between features/variables, no improvement is shown for PPLS-DA and for the extensions compared to PLS-DA. A very weak linear dependency, a low proportion of differentially expressed genes for simulated data, does not lead to an improvement of PPLS-DA over PLS-DA, but our extension shows a lower prediction error. On the contrary, for the data set with strong between-feature collinearity and a low proportion of differentially expressed genes and a large total number of genes, the prediction error of PPLS-DA and the extensions is clearly lower than for PLS-DA. Moreover, we compare these prediction results with results of support vector machines with linear kernel and linear discriminant analysis.
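A minimal sketch of the ordinary PLS-DA idea mentioned in this abstract, under simplifying assumptions that are not from the source: two classes coded +1/-1, a single latent component, and a decision threshold at zero. The names `plsda_fit` and `plsda_predict` are illustrative, and the power parameter of PPLS-DA is not modeled.

```python
from math import sqrt


def plsda_fit(X, labels, positive):
    """One-component PLS-DA for two classes; `positive` is coded +1, the rest -1."""
    n, p = len(X), len(X[0])
    y = [1.0 if lab == positive else -1.0 for lab in labels]
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # The component is a linear combination of the original variables,
    # weighted by their covariance with the class code.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))  # assumes the class means differ
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(a * b for a, b in zip(yc, t)) / sum(a * a for a in t)
    return x_mean, y_mean, w, q


def plsda_predict(model, x, positive, negative):
    """Project x onto the component and assign the class by the sign of the fit."""
    x_mean, y_mean, w, q = model
    t = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    y_hat = y_mean + q * t
    return positive if y_hat > 0 else negative
```

With many genes and few samples, the appeal of this construction is that only one weight vector is estimated per component rather than a full covariance matrix.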
Screening method for rapid classification of psychoactive substances in illicit tablets using mid infrared spectroscopy and PLS-DA.

Science.gov (United States)

Pereira, Leandro S A; Lisboa, Fernanda L C; Coelho Neto, José; Valladão, Frederico N; Sena, Marcelo M

2018-05-09

Several new psychoactive substances (NPS) have reached the illegal drug market in recent years, and ecstasy-like tablets are one of the forms affected by this change. Cathinones and tryptamines have increasingly been found in ecstasy-like seized samples as well as other amphetamine type stimulants. A presumptive method for identifying different drugs in seized ecstasy tablets (n=92) using ATR-FTIR (attenuated total reflectance - Fourier transform infrared spectroscopy) and PLS-DA (partial least squares discriminant analysis) was developed. A hierarchical strategy of sequential modeling was performed with PLS-DA. The main model discriminated four classes: 5-MeO-MIPT, methylenedioxyamphetamines (MDMA and MDA), methamphetamine, and cathinones. Two submodels were built to identify drugs present in MDs and cathinones classes. Models were validated through the estimate of figures of merit. The average reliability rate (RLR) of the main model was 96.8% and accordance (ACC) was 100%. For the submodels, RLR and ACC were 100%. The reliability of the models was corroborated through their spectral interpretation. Thus, spectral assignments were performed by associating informative vectors of each specific modeled class to the respective drugs. The developed method is simple, fast, and can be applied to the forensic laboratory routine, leading to objective results reports useful for forensic scientists and law enforcement. Copyright © 2018 Elsevier B.V. All rights reserved.
Mixture quantification using PLS in plastic scintillation measurements

Energy Technology Data Exchange (ETDEWEB)

Bagan, H.; Tarancon, A.; Rauret, G. [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain); Garcia, J.F., E-mail: jfgarcia@ub.ed [Departament de Quimica Analitica, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona (Spain)

2011-06-15

This article reports the capability of plastic scintillation (PS) combined with multivariate calibration (partial least squares; PLS) to detect and quantify alpha and beta emitters in mixtures. While several attempts have been made with this purpose in mind using liquid scintillation (LS), no attempt has been made using PS, which has the great advantage of not producing mixed waste after the measurements are performed. Following this objective, ternary mixtures of alpha and beta emitters (²⁴¹Am, ¹³⁷Cs and ⁹⁰Sr/⁹⁰Y) have been quantified. Procedure optimisation has evaluated the use of the net spectra or the sample spectra, the inclusion of different spectra obtained at different values of the Pulse Shape Analysis parameter, and the application of the PLS1 or PLS2 algorithms. The conclusions show that the use of PS+PLS2 applied to the sample spectra, without the use of any pulse shape discrimination, allows quantification of the activities with relative errors of less than 10% in most cases. This procedure not only allows quantification of mixtures but also reduces measurement time (no blanks are required), and its application does not require detectors that include the pulse shape analysis parameter.
Evaluation of platelet thromboxane radioimmunoassay method to measure platelet life-span: Comparison with ¹¹¹indium-platelet method

International Nuclear Information System (INIS)

Vallabhajosula, S.; Machac, J.; Badimon, L.; Lipszyc, H.; Goldsmith, S.J.; Fuster, V.

1985-01-01

Platelet activation during radiolabeling in vitro with Cr-51 and In-111 may affect the platelet life-span (PLS) in vivo. A new RIA method to measure PLS is being evaluated. Aspirin inhibits platelet thromboxane (TxA₂) by acetylating cyclooxygenase. The time required for the TxA₂ levels to return towards control values depends on the rate of new platelets entering circulation and is a measure of PLS. A single dose of aspirin (150 mg) was given to 5 normal human subjects. Blood samples were collected for 2 days before aspirin and daily for 10 days. TxA₂ production in response to endogenous thrombin was studied by allowing a 1 ml blood sample to clot at 37 °C for 90 min. Serum TxB₂ (stable breakdown product of TxA₂) levels were determined by the RIA technique. The plot of TxB₂ levels (% control) against time showed a gradual increase. The PLS calculated by linear regression analysis assuming a 2-day lag period before cyclooxygenase recovery is 9.7 ± 2.37. In the same 5 subjects, platelets from a 50 ml blood sample were labeled with ¹¹¹In-tropolone in 2 ml autologous plasma. Starting at 1 hr after injection of labeled platelets, 10 blood samples were obtained over an 8-day period. The PLS calculated based on a linear regression analysis is 10.2 ± 1.4. The PLS values measured from the rate of platelet disappearance from circulation and from the rate of platelet regeneration into circulation are quite comparable in normal subjects. TxA₂ regeneration RIA may provide a method to measure PLS without administering radioactivity to the patient.
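The abstract does not spell out the regression details. As a sketch of the linear survival model often applied to labeled-platelet disappearance curves, the life-span can be taken as the time at which the least-squares line through the activity data reaches zero. The function name and the choice of this particular model are assumptions, not the authors' stated procedure.

```python
def linear_life_span(times, activity):
    """Fit activity = intercept + slope * time by least squares and
    return the time at which the fitted line crosses zero."""
    n = len(times)
    t_mean = sum(times) / n
    a_mean = sum(activity) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (a - a_mean) for t, a in zip(times, activity))
    slope = sxy / sxx
    intercept = a_mean - slope * t_mean
    if slope >= 0:
        raise ValueError("activity must decrease over time for a finite life-span")
    return -intercept / slope
```

The result is in the same time units as `times`. For the thromboxane method the same fit would be applied to the rising TxB₂ recovery curve after the assumed lag period, which is not shown here.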
Pink line syndrome (PLS) in the scleractinian coral Porites lutea

Digital Repository Service at National Institute of Oceanography (India)

Ravindran, J.; Raghukumar, C.

Reef sites. Pink line syndrome (PLS) in the scleractinian coral Porites lutea. Accepted: 10 May 2002 / Published online: 5 July 2002. © Springer-Verlag 2002. We describe here an unreported diseased state of Porites lutea (Milne-Edwards and Haime...) on the Kavaratti reef of the Lakshadweep group of islands (11° N; 71° E). Pink line syndrome (PLS) causes partial mortality of the coral P. lutea around Kavaratti Island (Fig. 1), and about 10% of colonies were found to be affected by PLS. The dead patches were colonized by a...
Exact estimation of biodiesel cetane number (CN) from its fatty acid methyl esters (FAMEs) profile using partial least square (PLS) adapted by artificial neural network (ANN)

International Nuclear Information System (INIS)

Hosseinpour, Soleiman; Aghbashlo, Mortaza; Tabatabaei, Meisam; Khalife, Esmail

2016-01-01

Highlights: • Estimating the biodiesel CN from its FAMEs profile using ANN-based PLS approach. • Comparing the capability of ANN-adapted PLS approach with the standard PLS model. • Exact prediction of biodiesel CN from its FAMEs profile using ANN-based PLS method. • Developing an easy-to-use software using ANN-PLS model for computing the biodiesel CN. - Abstract: Cetane number (CN) is among the most important properties of biodiesel because it quantifies combustion speed or, in other words, ignition quality. Experimental measurement of biodiesel CN is rather laborious and expensive. However, the high proportionality of the biodiesel fatty acid methyl esters (FAMEs) profile with its CN makes it very appealing to develop straightforward and inexpensive computerized tools for biodiesel CN estimation. Unfortunately, correlating the chemical structure of biodiesel to its CN using conventional statistical and mathematical approaches is very difficult. To solve this issue, partial least square (PLS) adapted by artificial neural network (ANN) was introduced and examined herein as an innovative approach for the exact estimation of biodiesel CN from its FAMEs profile. In the proposed approach, the ANN paradigm was used for modeling the inner relation between the input and the output PLS score vectors. In addition, the capability of the developed method in predicting the biodiesel CN was compared with the basal PLS method. The accuracy of the developed approaches for computing the biodiesel CN was assessed using three statistical criteria, i.e., coefficient of determination (R²), mean-squared error (MSE), and percentage error (PE). The ANN-adapted PLS method predicted the biodiesel CN with an R² value higher than 0.99, demonstrating the fidelity of the developed model over the classical PLS method with a markedly lower R² value of about 0.85. In order to facilitate the use of the proposed model, an easy-to-use computer program was also developed on the basis of ANN-adapted PLS
Improving the robustness of a partial least squares (PLS) model based on pure component selectivity analysis and range optimization: Case study for the analysis of an etching solution containing hydrogen peroxide

Energy Technology Data Exchange (ETDEWEB)

Lee, Youngbok [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of); Chung, Hoeil [Department of Chemistry, College of Natural Sciences, Hanyang University Haengdang-Dong, Seoul 133-791 (Korea, Republic of)]. E-mail: hoeil@hanyang.ac.kr; Arnold, Mark A. [Optical Science and Technology Center and Department of Chemistry, University of Iowa, Iowa City, IA 52242 (United States)

2006-07-14

Pure component selectivity analysis (PCSA) was successfully utilized to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component to other components. The samples used in this study were composed of NH₄OH, H₂O₂ and H₂O, a popular etchant solution in the electronic industry. Corresponding near-infrared (NIR) spectra (9000-7500 cm⁻¹) were used to build PLS models. The selective determination of H₂O₂ without influences from NH₄OH and H₂O was a key issue since its molecular structure is similar to that of H₂O and NH₄OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH₄OH and H₂O₂ were found with the use of moving window PLS (MW-PLS) and corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH₄OH was free from interferences from the other components due to the presence of its unique NH absorption bands. Since the spectral variation from H₂O₂ was broadly overlapping and much less distinct than that from NH₄OH, the selectivity and prediction performance for the H₂O₂ calibration were sensitively varied depending on the spectral ranges and number of factors used. PCSA, based on the comparison between regression vectors from PLS and the net analyte signal (NAS), was an effective method to prevent over-fitting of the H₂O₂ calibration. A robust H₂O₂ calibration model with minimal interferences from other components was developed. PCSA should be included as a standard method in PLS calibrations where prediction error only is the usual measure of performance.
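The moving-window search for a good spectral range mentioned in this abstract can be sketched as follows. This is a simplified stand-in rather than the authors' MW-PLS: each window is scored with a single-latent-variable PLS fit and the calibration RMSE, whereas a real application would tune the number of factors and use cross-validation. All function names are illustrative.

```python
from math import sqrt


def _one_component_rmse(X, y):
    """Calibration RMSE of a one-latent-variable PLS fit of y on the columns of X."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[r[j] - x_mean[j] for j in range(p)] for r in X]
    yc = [v - y_mean for v in y]
    # Unnormalized weights are enough here: their scale cancels in q * t.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    tt = sum(v * v for v in t)
    if tt == 0.0:  # window carries no covariance with y
        return sqrt(sum(v * v for v in yc) / n)
    q = sum(a * b for a, b in zip(yc, t)) / tt
    return sqrt(sum((yc[i] - q * t[i]) ** 2 for i in range(n)) / n)


def moving_window_scan(spectra, y, width):
    """Return (start_channel, rmse) for every window of `width` adjacent channels."""
    n_channels = len(spectra[0])
    results = []
    for start in range(n_channels - width + 1):
        window = [row[start:start + width] for row in spectra]
        results.append((start, _one_component_rmse(window, y)))
    return results


def best_window(spectra, y, width):
    """Window with the lowest calibration RMSE."""
    return min(moving_window_scan(spectra, y, width), key=lambda r: r[1])
```

A low error alone does not show that a window is selective for the analyte, which is the gap the abstract says PCSA is meant to fill.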
Collective effects of the PLS 2 GeV storage ring

International Nuclear Information System (INIS)

Yoon, M.; Choi, J.; Lee, T.

1993-01-01

Collective effects of the PLS storage ring are discussed. Evaluation of the PLS storage ring coupling impedances is presented. RF cavity impedances are emphasized. Single-bunch threshold current is studied and longitudinal coupled-bunch instabilities caused by RF narrow-band resonances are analyzed.
Variable selection in near-infrared spectroscopy: Benchmarking of feature selection methods on biodiesel data

International Nuclear Information System (INIS)

Balabin, Roman M.; Smirnov, Sergey V.

2011-01-01

During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm⁻¹) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of other spectroscopic

Hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) and its application to predicting key process variables.

Science.gov (United States)

He, Yan-Lin; Xu, Yuan; Geng, Zhi-Qiang; Zhu, Qun-Xiong

2016-03-01

In this paper, a hybrid robust model based on an improved functional link neural network integrating with partial least square (IFLNN-PLS) is proposed. Firstly, an improved functional link neural network with small norm of expanded weights and high input-output correlation (SNEWHIOC-FLNN) was proposed for enhancing the generalization performance of FLNN. Unlike the traditional FLNN, the expanded variables of the original inputs are not directly used as the inputs in the proposed SNEWHIOC-FLNN model. The original inputs are attached to some small norm of expanded weights. As a result, the correlation coefficient between some of the expanded variables and the outputs is enhanced. The larger the correlation coefficient is, the more relevant the expanded variables tend to be. In the end, the expanded variables with larger correlation coefficients are selected as the inputs to improve the performance of the traditional FLNN. In order to test the proposed SNEWHIOC-FLNN model, three UCI (University of California, Irvine) regression datasets named Housing, Concrete Compressive Strength (CCS), and Yacht Hydro Dynamics (YHD) are selected. Then a hybrid model based on the improved FLNN integrating with partial least square (IFLNN-PLS) was built. In the IFLNN-PLS model, the connection weights are calculated using the partial least square method rather than the error back propagation algorithm. Lastly, IFLNN-PLS was developed as an intelligent measurement model for accurately predicting the key variables in the Purified Terephthalic Acid (PTA) process and the High Density Polyethylene (HDPE) process. Simulation results illustrated that the IFLNN-PLS could significantly improve the prediction performance. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Determining the Relationship between U.S. County-Level Adult Obesity Rate and Multiple Risk Factors by PLS Regression and SVM Modeling Approaches

Directory of Open Access Journals (Sweden)

Chau-Kuang Chen

2015-02-01

Data from the Center for Disease Control (CDC) has shown that the obesity rate doubled among adults within the past two decades. This upsurge was the result of changes in human behavior and environment. Partial least squares (PLS) regression and support vector machine (SVM) models were used to determine the relationship between U.S. county-level adult obesity rate and multiple risk factors. The outcome variable was the adult obesity rate. The 23 risk factors were categorized into four domains of the social ecological model: biological/behavioral factors, socioeconomic status, food environment, and physical environment. Of the 23 risk factors related to adult obesity, the top eight significant risk factors with high normalized importance were identified, including physical inactivity, natural amenity, percent of households receiving SNAP benefits, and percent of all restaurants being fast food. The study results were consistent with those in the literature. The study showed that adult obesity rate was influenced by biological/behavioral factors, socioeconomic status, food environment, and physical environment embedded in the social ecological theory. Analyzing multiple risk factors of obesity in communities may lead to the proposal of more comprehensive and integrated policies and intervention programs to solve this population-based problem.
Alternative Methods of Regression

CERN Document Server

Birkes, David

2011-01-01

Of related interest: Nonlinear Regression Analysis and its Applications, Douglas M. Bates and Donald G. Watts. "...an extraordinary presentation of concepts and methods concerning the use and analysis of nonlinear regression models...highly recommend[ed]...for anyone needing to use and/or understand issues concerning the analysis of nonlinear regression models." --Technometrics. This book provides a balance between theory and practice supported by extensive displays of instructive geometrical constructs. Numerous in-depth case studies illustrate the use of nonlinear regression analysis--with all data s
Prediction Methods in Science and Technology

DEFF Research Database (Denmark)

Høskuldsson, Agnar

Presents the H-principle, the Heisenberg modelling principle. General properties of the Heisenberg modelling procedure are developed. The theory is applied to principal component analysis and linear regression analysis. It is shown that the H-principle leads to PLS regression in case the task...... is linear regression analysis. The book contains different methods to find the dimensions of linear models, to carry out sensitivity analysis in latent structure models, variable selection methods and presentation of results from analysis....
Development of a partial least squares-artificial neural network (PLS-ANN) hybrid model for the prediction of consumer liking scores of ready-to-drink green tea beverages.

Science.gov (United States)

Yu, Peigen; Low, Mei Yin; Zhou, Weibiao

2018-01-01

In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R²) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R² (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of cassava starch films by physicochemical properties and water vapor permeability quantification by FTIR and PLS.

Science.gov (United States)

Henrique, C M; Teófilo, R F; Sabino, L; Ferreira, M M C; Cereda, M P

2007-05-01

Cassava starches are widely used in the production of biodegradable films, but their resistance to humidity migration is very low. In this work, commercial cassava starch films were studied and classified according to their physicochemical properties. A nondestructive method for water vapor permeability determination, which combines infrared spectroscopy with multivariate calibration, is also presented. The following commercial cassava starches were studied: pregelatinized (amidomax 3550), carboxymethylated starch (CMA) of low and high viscosities, and esterified starches. To make the films, 2 different starch concentrations were evaluated, consisting of water suspensions with 3% and 5% starch. The filmogenic solutions were dried and characterized for their thickness, grammage, water vapor permeability, water activity, tensile strength (deformation force), water solubility, and puncture strength (deformation). The minimum thicknesses were 0.5 to 0.6 mm in pregelatinized starch films. The results were treated by means of the following chemometric methods: principal component analysis (PCA) and partial least squares (PLS) regression. PCA on the physicochemical properties of the films showed that the differences in concentration of the dried material (3% and 5% starch) and also in the type of starch modification were mainly related to the following properties: permeability, solubility, and thickness. IR spectra collected in the region of 4000 to 600 cm⁻¹ were used to build a PLS model with good predictive power for water vapor permeability determination, with mean relative errors of 10.0% for cross-validation and 7.8% for the prediction set.
Determination of carbohydrates present in Saccharomyces cerevisiae using mid-infrared spectroscopy and partial least squares regression.

Science.gov (United States)

Plata, Maria R; Koch, Cosima; Wechselberger, Patrick; Herwig, Christoph; Lendl, Bernhard

2013-10-01

A fast and simple method to control variations in carbohydrate composition of Saccharomyces cerevisiae, baker's yeast, during fermentation was developed using mid-infrared (mid-IR) spectroscopy. The method allows for precise and accurate determinations with minimal or no sample preparation and reagent consumption based on mid-IR spectra and partial least squares (PLS) regression. The PLS models were developed employing the results from reference analysis of the yeast cells. The reference analyses quantify the amount of trehalose, glucose, glycogen, and mannan in S. cerevisiae. The selection and optimization of pretreatment steps of samples such as the disruption of the yeast cells and the hydrolysis of mannan and glycogen to obtain monosaccharides were carried out. Trehalose, glucose, and mannose were determined using high-performance liquid chromatography coupled with a refractive index detector and total carbohydrates were measured using the phenol-sulfuric method. Linear concentration range, accuracy, precision, LOD and LOQ were examined to check the reliability of the chromatographic method for each analyte.
Evaluation of the efficiency of continuous wavelet transform as processing and preprocessing algorithm for resolution of overlapped signals in univariate and multivariate regression analyses; an application to ternary and quaternary mixtures

Science.gov (United States)

Hegazy, Maha A.; Lotfy, Hayam M.; Mowaka, Shereen; Mohamed, Ekram Hany

2016-07-01

Wavelets have been adapted for a vast number of signal-processing applications due to the amount of information that can be extracted from a signal. In this work, a comparative study was conducted on the efficiency of continuous wavelet transform (CWT) as a signal processing tool in univariate regression and as a pre-processing tool in multivariate analysis using partial least squares (CWT-PLS). These were applied to complex spectral signals of ternary and quaternary mixtures. The CWT-PLS method succeeded in the simultaneous determination of a quaternary mixture of drotaverine (DRO), caffeine (CAF), paracetamol (PAR) and p-aminophenol (PAP, the major impurity of paracetamol). The univariate CWT, in contrast, failed to determine the quaternary mixture components simultaneously; it was able to determine only PAR and PAP, and the ternary mixtures of DRO, CAF, and PAR and of CAF, PAR, and PAP. During the calculations of CWT, different wavelet families were tested. The univariate CWT method was validated according to the ICH guidelines. For the development of the CWT-PLS model, a calibration set was prepared by means of an orthogonal experimental design, and the absorption spectra were recorded and processed by CWT. The CWT-PLS model was constructed by regression between the wavelet coefficients and concentration matrices, and validation was performed by both cross-validation and external validation sets. Both methods were successfully applied for determination of the studied drugs in pharmaceutical formulations.
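To make the pre-processing step concrete, here is a small sketch of a single-scale continuous wavelet transform of a sampled spectrum. The Mexican-hat wavelet, the direct summation, and the function names are assumptions chosen for brevity; the abstract says only that several wavelet families were tested.

```python
from math import exp, sqrt


def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet up to a constant factor; it integrates to zero."""
    return (1.0 - t * t) * exp(-0.5 * t * t)


def cwt_single_scale(signal, scale):
    """CWT coefficients of `signal` at one scale, one coefficient per sample position."""
    n = len(signal)
    coeffs = []
    for b in range(n):  # b is the translation (position along the spectrum)
        acc = 0.0
        for k in range(n):
            acc += signal[k] * mexican_hat((k - b) / scale)
        coeffs.append(acc / sqrt(scale))
    return coeffs
```

Because the wavelet has zero mean, a constant baseline contributes almost nothing away from the ends of the spectrum, while a band of matching width gives a large coefficient at its center. In a CWT-PLS workflow the coefficient vectors would replace the raw absorbances as the predictor matrix.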
The Plasmin-Sensitive Protein Pls in Methicillin-Resistant Staphylococcus aureus (MRSA) Is a Glycoprotein.

Directory of Open Access Journals (Sweden)

Isabelle Bleiziffer

2017-01-01

Most bacterial glycoproteins identified to date are virulence factors of pathogenic bacteria, i.e. adhesins and invasins. However, the impact of protein glycosylation on the major human pathogen Staphylococcus aureus remains incompletely understood. To study protein glycosylation in staphylococci, we analyzed lysostaphin lysates of methicillin-resistant Staphylococcus aureus (MRSA) strains by SDS-PAGE and subsequent periodic acid-Schiff's staining. We detected four (>300, ∼250, ∼165, and ∼120 kDa) and two (>300 and ∼175 kDa) glycosylated surface proteins with strain COL and strain 1061, respectively. The ∼250, ∼165, and ∼175 kDa proteins were identified as plasmin-sensitive protein (Pls) by mass spectrometry. Previously, Pls has been demonstrated to be a virulence factor in a mouse septic arthritis model. The pls gene is encoded by the staphylococcal cassette chromosome (SCCmec) type I in MRSA that also encodes the methicillin resistance-conferring mecA and further genes. In a search for glycosyltransferases, we identified two open reading frames encoded downstream of pls on the SCCmec element, which we termed gtfC and gtfD. Expression and deletion analysis revealed that both gtfC and gtfD mediate glycosylation of Pls. Additionally, the recently reported glycosyltransferases SdgA and SdgB are involved in Pls glycosylation. Glycosylation occurs at serine residues in the Pls SD-repeat region and modifying carbohydrates are N-acetylhexosaminyl residues. Functional characterization revealed that Pls can confer increased biofilm formation, which seems to involve two distinct mechanisms. The first mechanism depends on glycosylation of the SD-repeat region by GtfC/GtfD and probably also involves eDNA, while the second seems to be independent of glycosylation as well as eDNA and may involve the centrally located G5 domains. Other previously known Pls properties are not related to the sugar modifications. In conclusion, Pls is a glycoprotein and
Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

KAUST Repository

Madakyaru, Muddu

2017-02-16

Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploits the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, a MSPLS algorithm combining PLS and wavelet analysis is used as the modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from the MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.
Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test

KAUST Repository

Madakyaru, Muddu; Harrou, Fouzi; Sun, Ying

2017-01-01

Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploits the advantages of multiscale PLS models (MSPLS) and those of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to consider the multivariate and multi-scale nature of process dynamics, a MSPLS algorithm combining PLS and wavelet analysis is used as the modeling framework. Then, GLR hypothesis testing is applied using the uncorrelated residuals obtained from the MSPLS model to improve the anomaly detection abilities of these latent variable based fault detection methods even further. Applications to simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.
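The GLR step on model residuals can be illustrated with a minimal sketch. It is not the MSPLS-GLR algorithm of the record above: it assumes the simplest textbook case, in which residuals are independent Gaussian with zero mean and a known standard deviation under fault-free operation, and a fault shifts the mean by an unknown amount. Under those assumptions, twice the log of the generalized likelihood ratio over a window of n residuals reduces to n·mean²/σ², which follows a chi-square distribution with one degree of freedom when no fault is present. The function names and the 5% threshold are illustrative choices.

```python
# 95% quantile of the chi-square distribution with 1 degree of freedom.
CHI2_1DOF_95 = 3.841


def glr_mean_shift(residuals, sigma):
    """GLR statistic (2 * log-likelihood ratio) for an unknown mean shift.

    Assumes i.i.d. Gaussian residuals with known standard deviation `sigma`
    and zero mean when the process is fault-free.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    return n * mean * mean / (sigma * sigma)


def is_anomalous(residuals, sigma, threshold=CHI2_1DOF_95):
    """Flag the window when the GLR statistic exceeds the threshold."""
    return glr_mean_shift(residuals, sigma) > threshold


# Hypothetical residual windows from a latent-variable model.
normal_window = [0.1, -0.2, 0.05, -0.1, 0.15]
faulty_window = [0.5, 0.6, 0.4, 0.55, 0.45]
print(is_anomalous(normal_window, sigma=0.2))
print(is_anomalous(faulty_window, sigma=0.2))
```

In practice the residuals would come from the fitted latent-variable model and σ would be estimated from fault-free training data.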
Exploring a physico-chemical multi-array explanatory model with a new multiple covariance-based technique: structural equation exploratory regression.

Science.gov (United States)

Bry, X; Verron, T; Cazes, P

2009-05-29

In this work, we consider chemical and physical variable groups describing a common set of observations (cigarettes). One of the groups, minor smoke compounds (minSC), is assumed to depend on the others (minSC predictors). PLS regression (PLSR) of minSC on the set of all predictors appears not to lead to a satisfactory analytic model, because it does not take into account the expert's knowledge. PLS path modeling (PLSPM) does not use the multidimensional structure of predictor groups. Indeed, the expert needs to separate the influence of several pre-designed predictor groups on minSC, in order to see what dimensions this influence involves. To meet these needs, we consider a multi-group component-regression model, and propose a method to extract from each group several strong uncorrelated components that fit the model. Estimation is based on a global multiple covariance criterion, used in combination with an appropriate nesting approach. Compared to PLSR and PLSPM, the structural equation exploratory regression (SEER) we propose fully uses predictor group complementarity, both conceptually and statistically, to predict the dependent group.
The Effect of Nonnormality on CB-SEM and PLS-SEM Path Estimates

OpenAIRE

Z. Jannoo; B. W. Yap; N. Auchoybur; M. A. Lazim

2014-01-01

The two common approaches to Structural Equation Modeling (SEM) are the Covariance-Based SEM (CB-SEM) and Partial Least Squares SEM (PLS-SEM). There is much debate on the performance of CB-SEM and PLS-SEM for small sample size and when distributions are nonnormal. This study evaluates the performance of CB-SEM and PLS-SEM under normality and nonnormality conditions via a simulation. Monte Carlo Simulation in R programming language was employed to generate data based on the theoretical model w...
Calibration sets selection strategy for the construction of robust PLS models for prediction of biodiesel/diesel blends physico-chemical properties using NIR spectroscopy

Science.gov (United States)

Palou, Anna; Miró, Aira; Blanco, Marcelo; Larraz, Rafael; Gómez, José Francisco; Martínez, Teresa; González, Josep Maria; Alcalà, Manel

2017-06-01

Although the feasibility of using near infrared (NIR) spectroscopy combined with partial least squares (PLS) regression for prediction of physico-chemical properties of biodiesel/diesel blends has been widely demonstrated, including the whole variability of diesel samples from diverse production origins in the calibration sets remains an important challenge when constructing the models. This work presents a useful strategy for the systematic selection of calibration sets of samples of biodiesel/diesel blends from diverse origins, based on a binary code, principal components analysis (PCA) and the Kennard-Stone algorithm. Results show that using this methodology the models can keep their robustness over time. PLS calculations have been done using specialized chemometric software as well as the software of the NIR instrument installed in plant, and both produced RMSEP under reproducibility values of the reference methods. The models have been proven for on-line simultaneous determination of seven properties: density, cetane index, fatty acid methyl esters (FAME) content, cloud point, boiling point at 95% of recovery, flash point and sulphur.
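The Kennard-Stone selection named in this abstract can be sketched as follows. This is a generic version of the algorithm, not the authors' full strategy (the binary coding and PCA steps are omitted). It assumes Euclidean distance between sample vectors, which in a calibration setting would typically be PCA scores. The function name `kennard_stone` and the toy data are illustrative.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of `n_select` samples chosen by Kennard-Stone.

    Starts from the two most distant samples, then repeatedly adds the
    sample whose nearest already-selected neighbour is farthest away.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed with the pair of samples separated by the largest distance.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    while len(selected) < n_select:
        # Max-min criterion: favour samples far from everything chosen so far.
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Hypothetical two-dimensional score vectors for five samples.
scores = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
print(kennard_stone(scores, 3))
```

The selected indices form the calibration set; the unselected samples can serve for validation.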
Timing system for PLS

International Nuclear Information System (INIS)

Chang, S.S.; Kim, M.S.; Won, S.C.; Choi, S.J.

1991-01-01

The PLS timing system consists of a master oscillator, a repetition rate pulse generator, a storage ring rf synchronizing system, and an rf driver and kicker trigger system composed of a fixed delay module and variable delay modules. All the timing modules are installed in VME crates, are controlled by 32-bit microprocessors, and communicate with the host computer via Ethernet. This paper describes the architectural design of this system as well as its performance requirements.
A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

Science.gov (United States)

Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

2017-07-01

Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant
PLS-Prediction and Confirmation of Hydrojuglone Glucoside as the Antitrypanosomal Constituent of Juglans Spp.

Directory of Open Access Journals (Sweden)

Therese Ellendorff

2015-05-01

Naphthoquinones (NQs) occur naturally in a large variety of plants. Several NQs are highly active against protozoans, amongst them the causative pathogens of neglected tropical diseases such as human African trypanosomiasis (sleeping sickness), Chagas disease and leishmaniasis. Prominent NQ-producing plants can be found among Juglans spp. (Juglandaceae) with juglone derivatives as known constituents. In this study, 36 highly variable extracts were prepared from different plant parts of J. regia, J. cinerea and J. nigra. For all extracts, antiprotozoal activity was determined against the protozoans Trypanosoma cruzi, T. brucei rhodesiense and Leishmania donovani. In addition, an LC-MS fingerprint was recorded for each extract. With each extract’s fingerprint and the data on in vitro growth inhibitory activity against T. brucei rhodesiense a Partial Least Squares (PLS) regression model was calculated in order to obtain an indication of compounds responsible for the differences in bioactivity between the 36 extracts. By means of PLS, hydrojuglone glucoside was predicted as an active compound against T. brucei and consequently isolated and tested in vitro. In fact, the pure compound showed activity against T. brucei at a significantly lower cytotoxicity towards mammalian cells than established antiprotozoal NQs such as lapachol.
Regression methods for medical research

CERN Document Server

Tai, Bee Choo

2013-01-01

Regression Methods for Medical Research provides medical researchers with the skills they need to critically read and interpret research using more advanced statistical methods. The statistical requirements of interpreting and publishing in medical journals, together with rapid changes in science and technology, increasingly demand an understanding of more complex and sophisticated analytic procedures. The text explains the application of statistical models to a wide variety of practical medical investigative studies and clinical trials. Regression methods are used to appropriately answer the
Fusion of neural computing and PLS techniques for load estimation

Energy Technology Data Exchange (ETDEWEB)

Lu, M.; Xue, H.; Cheng, X. [Northwestern Polytechnical Univ., Xi'an (China); Zhang, W. [Xi'an Inst. of Post and Telecommunication, Xi'an (China)

2007-07-01

A method to predict the electric load of a power system in real time was presented. The method is based on neurocomputing and partial least squares (PLS). Short-term load forecasts for power systems are generally determined by conventional statistical methods and Computational Intelligence (CI) techniques such as neural computing. However, statistical modeling methods often require the input of questionable distributional assumptions, and neural computing is weak, particularly in determining topology. In order to overcome the problems associated with conventional techniques, the authors developed a CI hybrid model based on neural computation and PLS techniques. The theoretical foundation for the designed CI hybrid model was presented along with its application in a power system. The hybrid model is suitable for nonlinear modeling and latent structure extracting. It can automatically determine the optimal topology to maximize the generalization. The CI hybrid model provides faster convergence and better prediction results compared to the abductive networks model because it incorporates a load conversion technique as well as new transfer functions. In order to demonstrate the effectiveness of the hybrid model, load forecasting was performed on a data set obtained from the Puget Sound Power and Light Company. Compared with the abductive networks model, the CI hybrid model reduced the forecast error by 32.37 per cent on workday, and by an average of 27.18 per cent on the weekend. It was concluded that the CI hybrid model has a more powerful predictive ability. 7 refs., 1 tab., 3 figs.

Cellulose I crystallinity determination using FT-Raman spectroscopy : univariate and multivariate methods

Science.gov (United States)

Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

2010-01-01

Two new methods based on FT-Raman spectroscopy, one simple, based on band intensity ratio, and the other using a partial least squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in cellulose I samples was determined based on univariate regression that was first developed using the Raman band...
Different frontal involvement in ALS and PLS revealed by Stroop event-related potentials and reaction times

Directory of Open Access Journals (Sweden)

Ninfa Amato

2013-12-01

BACKGROUND: A growing body of evidence suggests a link between cognitive and pathological changes in amyotrophic lateral sclerosis (ALS) and in frontotemporal lobar dementia (FTLD). Cognitive deficits have been investigated much less extensively in primary lateral sclerosis (PLS) than in ALS. OBJECTIVE: To investigate bioelectrical activity to the Stroop test, assessing frontal function, in ALS, PLS and control groups. METHODS: 32 non-demented ALS patients, 10 non-demented PLS patients and 27 healthy subjects were included. Twenty-nine electroencephalography (EEG) channels with binaural reference were recorded during covert Stroop task performance, involving mental discrimination of the stimuli and not vocal or motor response. Group effects on event-related potential (ERP) latency were analyzed using statistical multivariate analysis. Topographic analysis was performed using low resolution brain electromagnetic tomography (LORETA). RESULTS: ALS patients committed more errors in the execution of the task but they were not slower, whereas PLS patients did not show reduced accuracy, despite a slowing of reaction times (RTs). The main ERP components were delayed in ALS, but not in PLS, compared with controls. Moreover, RT speed but not ERP latency correlated with clinical scores. ALS had decreased frontotemporal activity in the P2, P3 and N4 time windows compared to controls. CONCLUSION: These findings suggest a different pattern of psychophysiological involvement in ALS compared with PLS. The former is increasingly recognized to be a multisystems disorder, with a spectrum of executive and behavioural impairments reflecting frontotemporal dysfunction. The latter seems to mainly involve the motor system, with largely spared cognitive functions. Moreover, our results suggest that the covert version of the Stroop task used in the present study may be useful to assess cognitive state in the very advanced stage of the disease, when other cognitive tasks are not
Selecting minimum dataset soil variables using PLSR as a regressive multivariate method

Science.gov (United States)

Stellacci, Anna Maria; Armenise, Elena; Castellini, Mirko; Rossi, Roberta; Vitti, Carolina; Leogrande, Rita; De Benedetto, Daniela; Ferrara, Rossana M.; Vivaldi, Gaetano A.

2017-04-01

Long-term field experiments and science-based tools that characterize soil status (namely the soil quality indices, SQIs) assume a strategic role in assessing the effect of agronomic techniques and thus in improving soil management especially in marginal environments. Selecting key soil variables able to best represent soil status is a critical step for the calculation of SQIs. Current studies show the effectiveness of statistical methods for variable selection to extract relevant information deriving from multivariate datasets. Principal component analysis (PCA) has been mainly used, however supervised multivariate methods and regressive techniques are progressively being evaluated (Armenise et al., 2013; de Paul Obade et al., 2016; Pulido Moncada et al., 2014). The present study explores the effectiveness of partial least square regression (PLSR) in selecting critical soil variables, using a dataset comparing conventional tillage and sod-seeding on durum wheat. The results were compared to those obtained using PCA and stepwise discriminant analysis (SDA). The soil data derived from a long-term field experiment in Southern Italy. On samples collected in April 2015, the following set of variables was quantified: (i) chemical: total organic carbon and nitrogen (TOC and TN), alkali-extractable C (TEC and humic substances - HA-FA), water extractable N and organic C (WEN and WEOC), Olsen extractable P, exchangeable cations, pH and EC; (ii) physical: texture, dry bulk density (BD), macroporosity (Pmac), air capacity (AC), and relative field capacity (RFC); (iii) biological: carbon of the microbial biomass quantified with the fumigation-extraction method. PCA and SDA were previously applied to the multivariate dataset (Stellacci et al., 2016). PLSR was carried out on mean centered and variance scaled data of predictors (soil variables) and response (wheat yield) variables using the PLS procedure of SAS/STAT. In addition, variable importance for projection (VIP
Discrimination of Transgenic Rice Based on Near Infrared Reflectance Spectroscopy and Partial Least Squares Regression Discriminant Analysis

Directory of Open Access Journals (Sweden)

ZHANG Long

2015-09-01

Near infrared reflectance spectroscopy (NIRS), a non-destructive measurement technique, was combined with partial least squares regression discriminant analysis (PLS-DA) to discriminate the transgenic (TCTP and mi166) and wild type (Zhonghua 11) rice. Furthermore, rice lines transformed with protein gene (OsTCTP) and regulation gene (Osmi166) were also discriminated by the NIRS method. The performances of PLS-DA in spectral ranges of 4 000–8 000 cm⁻¹ and 4 000–10 000 cm⁻¹ were compared to obtain the optimal spectral range. As a result, the transgenic and wild type rice were distinguished from each other in the range of 4 000–10 000 cm⁻¹, and the correct classification rate was 100.0% in the validation test. The transgenic rice TCTP and mi166 were also distinguished from each other in the range of 4 000–10 000 cm⁻¹, and the correct classification rate was also 100.0%. In conclusion, NIRS combined with PLS-DA can be used for the discrimination of transgenic rice.
Regression modeling methods, theory, and computation with SAS

CERN Document Server

Panik, Michael

2009-01-01

Regression Modeling: Methods, Theory, and Computation with SAS provides an introduction to a diverse assortment of regression techniques using SAS to solve a wide variety of regression problems. The author fully documents the SAS programs and thoroughly explains the output produced by the programs. The text presents the popular ordinary least squares (OLS) approach before introducing many alternative regression methods. It covers nonparametric regression, logistic regression (including Poisson regression), Bayesian regression, robust regression, fuzzy regression, random coefficients regression,
Spatial Estimation of Losses Attributable to Meteorological Disasters in a Specific Area (105.0°E–115.0°E, 25°N–35°N) Using Bayesian Maximum Entropy and Partial Least Squares Regression

Directory of Open Access Journals (Sweden)

F. S. Zhang

2016-01-01

The spatial mapping of losses attributable to such disasters is now well established as a means of describing the spatial patterns of disaster risk, and it has been shown to be suitable for many types of major meteorological disasters. However, few studies have been carried out by developing a regression model to estimate the effects of the spatial distribution of meteorological factors on losses associated with meteorological disasters. In this study, the proposed approach is capable of the following: (a) estimating the spatial distributions of seven meteorological factors using Bayesian maximum entropy, (b) identifying the four mapping methods used in this research with the best performance based on the cross validation, and (c) establishing a fitted model between the PLS components and disaster losses information using partial least squares regression within a specific research area. The results showed the following: (a) best mapping results were produced by multivariate Bayesian maximum entropy with probabilistic soft data; (b) the regression model using three PLS components, extracted from seven meteorological factors by PLS method, was the most predictive by means of PRESS/SS test; (c) northern Hunan Province sustains the most damage, and southeastern Gansu Province and western Guizhou Province sustained the least.
A comparative QSAR study on the estrogenic activities of persistent organic pollutants by PLS and SVM

Directory of Open Access Journals (Sweden)

Fei Li

2015-11-01

Quantitative structure-activity relationships (QSARs) were determined using partial least square (PLS) and support vector machine (SVM). The predicted values by the final QSAR models were in good agreement with the corresponding experimental values. Chemical estrogenic activities are related to atomic properties (atomic Sanderson electronegativities, van der Waals volumes and polarizabilities). Comparing the results obtained from the two models, the SVM method exhibited better overall performance. Besides, three PLS models were constructed for some specific families based on their chemical structures. These predictive models should be useful to rapidly identify potential estrogenic endocrine disrupting chemicals.
Construction of Network Management Information System of Agricultural Products Supply Chain Based on 3PLs

Institute of Scientific and Technical Information of China (English)

2010-01-01

The necessity to construct the network management information system of 3PLs agricultural supply chain is analyzed, showing that 3PLs can improve the overall competitive advantage of agricultural supply chain. 3PLs changes the homogeneity management into specialized management of logistics service and achieves the alliance of the subjects at different nodes of agricultural products supply chain. Network management information system structure of agricultural products supply chain based on 3PLs is constructed, including the four layers (the network communication layer, the hardware and software environment layer, the database layer, and the application layer) and 7 function modules (centralized control, transportation process management, material and vehicle scheduling, customer relationship, storage management, customer inquiry, and financial management). Framework for the network management information system of agricultural products supply chain based on 3PLs is put forward. The management of 3PLs mainly includes purchasing management, supplier relationship management, planning management, customer relationship management, storage management and distribution management. Thus, a management system of internal and external integrated agricultural enterprises is obtained. The network management information system of agricultural products supply chain based on 3PLs has realized the effective sharing of enterprise information of agricultural products supply chain at different nodes, establishing a long-term partnership revolving around the 3PLs core enterprise, as well as a supply chain with stable relationship based on the supply chain network system, so as to improve the circulation efficiency of agricultural products, and to explore the sales market for agricultural products.
Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression.

Directory of Open Access Journals (Sweden)

Kosuke Yoshida

In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area.
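How PLS regression builds a low-dimensional representation that relates predictors to a response can be sketched with a single-response (PLS1) NIPALS-style routine. This is a generic textbook formulation, not the implementation or the kernel variants used in the record above. It assumes mean-centred data without variance scaling, and the function names `pls1_fit` and `pls1_predict` are illustrative.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by successive deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with the residual y
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt  # regression of y on scores
        # Deflate X and y so the next component explains what remains.
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * loading[j]
            f[i] -= q * t[i]
        components.append((w, loading, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, loading, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, loading)]
    return y_hat


# Toy data: y = 2*x1 - x2 + 1 exactly, so two components recover it.
X_train = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y_train = [1.0, 4.0, 3.0, 6.0, 5.0]
model = pls1_fit(X_train, y_train, n_components=2)
print(pls1_predict(model, [6.0, 5.0]))
```

With far more predictors than samples, as in imaging data, the number of components would be kept small and chosen by cross-validation.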
PLS-based and regularization-based methods for the selection of relevant variables in non-targeted metabolomics data

Directory of Open Access Journals (Sweden)

Renata Bujak

2016-07-01

Non-targeted metabolomics constitutes a part of systems biology and aims to determine many metabolites in complex biological samples. Datasets obtained in non-targeted metabolomics studies are multivariate and high-dimensional due to the sensitivity of mass spectrometry-based detection methods as well as complexity of biological matrices. Proper selection of variables which contribute to group classification is a crucial step, especially in metabolomics studies which are focused on searching for disease biomarker candidates. In the present study, three different statistical approaches were tested using two metabolomics datasets (RH and PH study). Orthogonal projections to latent structures-discriminant analysis (OPLS-DA) without and with multiple testing correction as well as least absolute shrinkage and selection operator (LASSO) were tested and compared. For the RH study, OPLS-DA model built without multiple testing correction selected 46 and 218 variables based on VIP criteria using Pareto and UV scaling, respectively. In the case of the PH study, 217 and 320 variables were selected based on VIP criteria using Pareto and UV scaling, respectively. In the RH study, OPLS-DA model built with multiple testing correction selected 4 and 19 variables as statistically significant in terms of Pareto and UV scaling, respectively. For PH study, 14 and 18 variables were selected based on VIP criteria in terms of Pareto and UV scaling, respectively. Additionally, the concept and fundamentals of the least absolute shrinkage and selection operator (LASSO) with bootstrap procedure evaluating reproducibility of results were demonstrated. In the RH and PH study, the LASSO selected 14 and 4 variables with reproducibility between 99.3% and 100%. However, apart from the popularity of PLS-DA and OPLS-DA methods in metabolomics, it should be highlighted that they do not control type I or type II error, but only arbitrarily establish a cut-off value for PLS-DA loadings
BOX-COX REGRESSION METHOD IN TIME SCALING

Directory of Open Access Journals (Sweden)

ATİLLA GÖKTAŞ

2013-06-01

The Box-Cox regression method, with power transformations λj for j = 1, 2, ..., k, can be used when the dependent variable and the error term of a linear regression model do not satisfy the continuity and normality assumptions. The case in which the optimum power transformation λj, for j = 1, 2, ..., k, of Y yields the smallest mean square error is discussed. The Box-Cox regression method is especially appropriate for correcting skewness or heteroscedasticity of the error terms when the functional relationship between the dependent and explanatory variables is nonlinear. In this study, the advantages and disadvantages of the Box-Cox regression method are discussed in the context of differentiation and differential analysis of the time scale concept.
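The power transformation underlying Box-Cox regression can be sketched briefly. This shows the standard one-parameter Box-Cox transform, (y^λ − 1)/λ with the logarithm as the limiting case at λ = 0, applied with a single λ rather than the per-variable λj of the record above. It is an illustration only, and the function names are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox power transform of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]  # limiting case as lambda -> 0
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original scale."""
    if abs(lam) < 1e-12:
        return [math.exp(z) for z in transformed]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in transformed]


# Square-root-type transform (lambda = 0.5) of a right-skewed toy response.
response = [1.0, 4.0, 9.0, 100.0]
transformed = box_cox(response, 0.5)
print(transformed)
print(inverse_box_cox(transformed, 0.5))
```

A regression would then be fitted to the transformed response, with λ chosen, for example, to minimise the mean square error over a grid of candidate values.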
Simultaneous determination of estrogens (ethinylestradiol and norgestimate) concentrations in human and bovine serum albumin by use of fluorescence spectroscopy and multivariate regression analysis.

Science.gov (United States)

Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O

2016-05-15

The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of a steady-state fluorescence spectroscopy and multivariate partial-least-square (PLS) regression analysis for the simultaneous determination of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) concentrations in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA, EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in an increase in emission characteristics of HSA and BSA and a significant blue spectral shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10⁻⁸ M for EE and 2.4×10⁻⁷ M for NOR and good linearity (R² > 0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological conditions. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, low sensitivity, simplicity, low-cost with no prior analyte extraction or separation
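The RMS%RE figure of merit quoted in this abstract can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the common definition: the root mean square of the per-sample percent relative errors, 100·(predicted − actual)/actual. The function name and the concentrations below are hypothetical.

```python
import math


def rms_percent_relative_error(actual, predicted):
    """Root-mean-square percent relative error (assumed definition).

    Each sample contributes 100 * (predicted - actual) / actual, and the
    result is the root mean square of those percentages.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    pct_errors = [100.0 * (p - a) / a for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in pct_errors) / len(pct_errors))


# Hypothetical validation concentrations (arbitrary units).
known = [10.0, 20.0]
model_output = [11.0, 18.0]
print(rms_percent_relative_error(known, model_output))
```

Because each error is scaled by the true value, the metric weights low-concentration samples as heavily as high-concentration ones.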
Two-step superresolution approach for surveillance face image through radial basis function-partial least squares regression and locality-induced sparse representation

Science.gov (United States)

Jiang, Junjun; Hu, Ruimin; Han, Zhen; Wang, Zhongyuan; Chen, Jun

2013-10-01

Face superresolution (SR), or face hallucination, refers to the technique of generating a high-resolution (HR) face image from a low-resolution (LR) one with the help of a set of training examples. It aims at transcending the limitations of electronic imaging systems. Applications of face SR include video surveillance, in which the individual of interest is often far from cameras. A two-step method is proposed to infer a high-quality and HR face image from a low-quality and LR observation. First, we establish the nonlinear relationship between LR face images and HR ones, according to radial basis function and partial least squares (RBF-PLS) regression, to transform the LR face into the global face space. Then, a locality-induced sparse representation (LiSR) approach is presented to enhance the local facial details once all the global faces for each LR training face are constructed. A comparison of some state-of-the-art SR methods shows the superiority of the proposed two-step approach, RBF-PLS global face regression followed by LiSR-based local patch reconstruction. Experiments also demonstrate the effectiveness under both simulation conditions and some real conditions.
Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

Science.gov (United States)

Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

2013-10-01

Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) by 15% and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP by 32-55% relative to UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.
Beam property studies in the PLS diagnostic beamline

CERN Document Server

Ko, I S; Seon, D K; Kim, C B; Lee, T Y

1999-01-01

A diagnostic beamline has been operated in the Pohang Light Source (PLS) storage ring for the diagnostics of electron and photon beam properties. It consists of two 1:1 imaging systems: a visible-light imaging system and a soft X-ray imaging system. We have measured the transverse and longitudinal structures of beams by using a streak camera to obtain a visible image. Accurate transverse beam sizes have been measured to be 186 μm horizontally and 43.1 μm vertically by using soft X-ray images with minimum diffraction errors. The corresponding emittances are 11.7 nm-rad horizontally and 0.59 nm-rad vertically. By comparing the measured data with the design values, we confirmed that the PLS storage ring has reached its designed performance within an error of 3.3% in the transverse direction.
Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

Science.gov (United States)

Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

2005-01-01

Solubility data for 930 diverse compounds were analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package in combination with E-state or ISIS keys were used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). Validated on a test set of 177 compounds not included in the training set, the model has an r2 of 0.911 and an RMSE of 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation was done it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient lead optimization.
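The two statistics quoted in this abstract, r2 and RMSE, can be illustrated with a minimal sketch. It assumes that r2 denotes the squared Pearson correlation between observed and predicted values (the abstract does not say which definition was used), and the log S(w) values below are hypothetical, not data from the paper.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def squared_correlation(observed, predicted):
    """Squared Pearson correlation (assumed here to be the abstract's r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Hypothetical log S(w) values, for illustration only.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -3.0, -3.5]

print("RMSE:", rmse(observed, predicted))
print("r2:  ", squared_correlation(observed, predicted))
```

Reporting both numbers on a held-out test set, as the abstract does, guards against a model that only fits its training compounds well.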
Partial least squares methods for spectrally estimating lunar soil FeO abundance: A stratified approach to revealing nonlinear effect and qualitative interpretation

Science.gov (United States)

Li, Lin

2008-12-01

Partial least squares (PLS) regressions were applied to lunar highland and mare soil data characterized by the Lunar Soil Characterization Consortium (LSCC) for spectral estimation of the abundance of the lunar soil chemical constituents FeO and Al2O3. The LSCC data set was split into a number of subsets including the total highland, Apollo 16, Apollo 14, and total mare soils, and PLS was then applied to each to investigate the effect of nonlinearity on the performance of the PLS method. The weight-loading vectors resulting from PLS were analyzed to identify the mineral species responsible for spectral estimation of the soil chemicals. The results from PLS modeling indicate that PLS performance depends on the correlation of the constituents of interest with their major mineral carriers, and that the Apollo 16 soils are responsible for the large errors in the FeO and Al2O3 estimates when these soils were modeled along with other types of soils. These large errors are attributed primarily to the degraded correlation of FeO with pyroxene for the relatively mature Apollo 16 soils as a result of space weathering, and secondarily to the interference of olivine. PLS consistently yields very accurate fits to the two soil chemicals when applied to mare soils. Although Al2O3 has no spectrally diagnostic characteristics, this chemical can be predicted for all subsets by PLS modeling at high accuracy because of its correlation with FeO. This correlation is reflected in the symmetry of the PLS weight-loading vectors for FeO and Al2O3, which prove to be very useful for qualitative interpretation of the PLS results. However, this qualitative interpretation of PLS modeling cannot be achieved using principal component regression loading vectors.
Using the partial least squares (PLS) method to establish critical success factor interdependence in ERP implementation projects

OpenAIRE

Esteves, José; Pastor Collado, Juan Antonio; Casanovas Garcia, Josep

2002-01-01

This technical research report proposes the use of a statistical approach named Partial Least Squares (PLS) to define the relationships between critical success factors for ERP implementation projects. In previous research work, we developed a unified model of critical success factors for ERP implementation projects. Some researchers have found evidence of relationships between these critical success factors; however, no one has defined in a form...
Current Mathematical Methods Used in QSAR/QSPR Studies

Directory of Open Access Journals (Sweden)

Peixun Liu

2009-04-01

Full Text Available This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. Recently, the mathematical methods applied to the regression of QSAR/QSPR models have been developing very fast, and new methods, such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR), have appeared on the QSAR/QSPR stage. At the same time, the earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN), Support Vector Machine (SVM) and so on, are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in future QSAR/QSPR studies.
Quantile Regression Methods

DEFF Research Database (Denmark)

Fitzenberger, Bernd; Wilke, Ralf Andreas

2015-01-01

Quantile regression is emerging as a popular statistical approach, which complements the estimation of conditional mean models. While the latter only focuses on one aspect of the conditional distribution of the dependent variable, the mean, quantile regression provides more detailed insights...... by modeling conditional quantiles. Quantile regression can therefore detect whether the partial effect of a regressor on the conditional quantiles is the same for all quantiles or differs across quantiles. Quantile regression can provide evidence for a statistical relationship between two variables even...... if the mean regression model does not. We provide a short informal introduction into the principle of quantile regression which includes an illustrative application from empirical labor market research. This is followed by briefly sketching the underlying statistical model for linear quantile regression based......
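As a hedged illustration of the principle behind quantile regression, the sketch below uses the standard check (pinball) loss and shows that the constant minimizing it is a sample quantile rather than the mean. The intercept-only model, the function names, and the wage figures are assumptions made for illustration; they are not taken from the chapter.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: weight tau on positive residuals, 1 - tau on negative ones."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def total_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by one constant."""
    return sum(check_loss(v - candidate, tau) for v in values)

def best_constant(values, tau):
    """Data point minimizing the total check loss (a minimizer always lies on a data point)."""
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))

# Hypothetical hourly wages with one high earner, for illustration only.
wages = [10.0, 12.0, 13.0, 15.0, 40.0]

median_fit = best_constant(wages, 0.5)  # conditional median of an intercept-only model
upper_fit = best_constant(wages, 0.9)   # 0.9 quantile of the same model
mean_fit = sum(wages) / len(wages)      # what a mean regression would report

print(median_fit, upper_fit, mean_fit)
```

A linear quantile regression replaces the constant with a linear function of the regressors and minimizes the same loss, which is why different values of tau can yield different slopes.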

Quantification of organic acids in beer by nuclear magnetic resonance (NMR)-based methods

International Nuclear Information System (INIS)

Rodrigues, J.E.A.; Erny, G.L.; Barros, A.S.; Esteves, V.I.; Brandao, T.; Ferreira, A.A.; Cabrita, E.; Gil, A.M.

2010-01-01

The organic acids present in beer provide important information on the product's quality and history, determining organoleptic properties and being useful indicators of fermentation performance. NMR spectroscopy may be used for rapid quantification of organic acids in beer and different NMR-based methodologies are hereby compared for the six main acids found in beer (acetic, citric, lactic, malic, pyruvic and succinic). The use of partial least squares (PLS) regression enables faster quantification, compared to traditional integration methods, and the performance of PLS models built using different reference methods (capillary electrophoresis (CE), both with direct and indirect UV detection, and enzymatic assays) was investigated. The best multivariate models were obtained using CE/indirect detection and enzymatic assays as reference and their response was compared with NMR integration, either using an internal reference or an electrical reference signal (Electronic REference To access In vivo Concentrations, ERETIC). NMR integration results generally agree with those obtained by PLS, with some overestimation for malic and pyruvic acids, probably due to peak overlap and subsequent integral errors, and an apparent relative underestimation for citric acid. Overall, these results make the PLS-NMR method an interesting choice for organic acid quantification in beer.
Quantification of organic acids in beer by nuclear magnetic resonance (NMR)-based methods

Energy Technology Data Exchange (ETDEWEB)

Rodrigues, J.E.A. [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Erny, G.L. [CESAM - Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Barros, A.S. [QOPNAA-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Esteves, V.I. [CESAM - Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal); Brandao, T.; Ferreira, A.A. [UNICER, Bebidas de Portugal, Leca do Balio, 4466-955 S. Mamede de Infesta (Portugal); Cabrita, E. [Department of Chemistry, New University of Lisbon, 2825-114 Caparica (Portugal); Gil, A.M., E-mail: agil@ua.pt [CICECO-Department of Chemistry, University of Aveiro, Campus de Santiago, 3810-193 Aveiro (Portugal)

2010-08-03

The organic acids present in beer provide important information on the product's quality and history, determining organoleptic properties and being useful indicators of fermentation performance. NMR spectroscopy may be used for rapid quantification of organic acids in beer and different NMR-based methodologies are hereby compared for the six main acids found in beer (acetic, citric, lactic, malic, pyruvic and succinic). The use of partial least squares (PLS) regression enables faster quantification, compared to traditional integration methods, and the performance of PLS models built using different reference methods (capillary electrophoresis (CE), both with direct and indirect UV detection, and enzymatic assays) was investigated. The best multivariate models were obtained using CE/indirect detection and enzymatic assays as reference and their response was compared with NMR integration, either using an internal reference or an electrical reference signal (Electronic REference To access In vivo Concentrations, ERETIC). NMR integration results generally agree with those obtained by PLS, with some overestimation for malic and pyruvic acids, probably due to peak overlap and subsequent integral errors, and an apparent relative underestimation for citric acid. Overall, these results make the PLS-NMR method an interesting choice for organic acid quantification in beer.
[Determination of Cu in Shell of Preserved Egg by LIBS Coupled with PLS].

Science.gov (United States)

Hu, Hui-qin; Xu, Xue-hong; Liu, Mu-hua; Tu, Jian-ping; Huang, Le; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; Yang, Ping

2015-12-01

In this work, the copper content in the shell of preserved eggs was determined directly by laser-induced breakdown spectroscopy (LIBS), and the characteristic lines of Cu were obtained. The eggshell samples were pretreated by wet acid digestion, and the actual Cu content was obtained by atomic absorption spectrophotometry (AAS). The precision and accuracy of LIBS are influenced by a series of factors, for example the complex matrix effect of the sample, environmental noise, the system noise of the instrument, and the stability of the laser energy. As a result, a conventional univariate linear calibration curve between LIBS intensity and the element content of the sample, such as one based on the Scheibe-Lomakin equation, cannot meet the requirements of quantitative analysis, and a multivariate calibration method is needed. In this work, the LIBS spectra were processed by partial least squares (PLS), and the precision and accuracy of the PLS models were compared across different smoothing treatments and five pretreatment methods. The results showed that 11-point smoothing with multiplicative scatter correction (MSC) pretreatment improved the correlation coefficient and the accuracy of the PLS model and effectively reduced the root mean square error and the average relative error. The study shows that the heavy metal Cu in preserved egg shells can be detected directly and accurately by LIBS. In the next step, batch tests will be conducted to establish the relationship between the Cu content of the eggshell, the egg white and the egg yolk, so that the heavy metal content of the egg white and yolk can be inferred by measuring the eggshell with LIBS, providing a new rapid, non-destructive testing method for the quality and safety of agricultural products.
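The multiplicative scatter correction (MSC) step named in this abstract can be sketched as follows. This is a minimal illustration of textbook MSC, assuming the mean spectrum is used as the reference; the abstract does not describe the authors' implementation, and the spectra below are hypothetical.

```python
def mean_spectrum(spectra):
    """Channel-wise mean of a list of equally long spectra."""
    n = len(spectra)
    return [sum(s[j] for s in spectra) / n for j in range(len(spectra[0]))]

def msc(spectra):
    """Regress each spectrum on the mean spectrum, then remove offset and scale."""
    ref = mean_spectrum(spectra)
    m = len(ref)
    ref_mean = sum(ref) / m
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / m
        # Least-squares fit s ~ intercept + slope * ref
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

# Hypothetical spectra: one line shape with different offsets and gains.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [base, [2.0 * b + 1.0 for b in base], [0.5 * b - 0.2 for b in base]]
corrected = msc(spectra)
reference = mean_spectrum(spectra)
```

Because the three hypothetical spectra differ only by an offset and a gain, the correction maps all of them onto the mean spectrum; real LIBS spectra would retain the residual differences that carry the concentration information.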
Assessing the impacts of human activities and climate variations on grassland productivity by partial least squares structural equation modeling (PLS-SEM)

Institute of Scientific and Technical Information of China (English)

SHA Zongyao; XIE Yichun; TAN Xicheng; BAI Yongfei; LI Jonathan; LIU Xuefeng

2017-01-01

The cause-effect associations between geographical phenomena are an important focus in ecological research. Recent studies in structural equation modeling (SEM) demonstrated the potential for analyzing such associations. We applied the variance-based partial least squares SEM (PLS-SEM) and geographically-weighted regression (GWR) modeling to assess the human-climate impact on grassland productivity represented by above-ground biomass (AGB). The human and climate factors and their interaction were taken to explain the AGB variance by a PLS-SEM developed for the grassland ecosystem in Inner Mongolia, China. Results indicated that 65.5% of the AGB variance could be explained by the human and climate factors and their interaction. The case study showed that the human and climate factors imposed a significant and negative impact on the AGB and that their interaction alleviated to some extent the threat from the intensified human-climate pressure. The alleviation may be attributable to vegetation adaptation to high human-climate stresses, to human adaptation to climate conditions and/or to recent vegetation restoration programs in the highly degraded areas. Furthermore, the AGB response to the human and climate factors modeled by GWR exhibited significant spatial variations. This study demonstrated that the combination of PLS-SEM and GWR modeling is a feasible way to investigate cause-effect relations in socio-ecological systems.
Klystron-modulator system availability of PLS 2 GeV electron linac

International Nuclear Information System (INIS)

Cho, M.H.; Park, S.S.; Oh, J.S.; Namkung, W.

1996-01-01

The PLS linac has been injecting 2 GeV electron beams into the Pohang Light Source (PLS) storage ring since September 1994. The PLS 2 GeV linac employs 11 sets of high-power klystron-modulator (K and M) systems as the main RF source for beam acceleration. The klystron has a rated peak output power of 80 MW at a 4 microsec pulse width and 60 pps. The matching modulator has 200 MW peak output power. The total accumulated high-voltage run time of the oldest unit has passed 23,000 hours, and the sum of all the high-voltage run times is approximately 230,000 hours as of May 1996. In this paper, we review the overall performance of the high-power K and M system. Special attention is paid to the analysis of all failures and troubles of the K and M system that affected the linac high-power RF operations as well as beam injection operations for the period from 1994 to May 1996. (author)
MALDI-TOF-MS with PLS Modeling Enables Strain Typing of the Bacterial Plant Pathogen Xanthomonas axonopodis

Science.gov (United States)

Sindt, Nathan M.; Robison, Faith; Brick, Mark A.; Schwartz, Howard F.; Heuberger, Adam L.; Prenni, Jessica E.

2018-02-01

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) is a fast and effective tool for microbial species identification. However, current approaches are limited to species-level identification even when genetic differences are known. Here, we present a novel workflow that applies the statistical method of partial least squares discriminant analysis (PLS-DA) to MALDI-TOF-MS protein fingerprint data of Xanthomonas axonopodis, an important bacterial plant pathogen of fruit and vegetable crops. Mass spectra of 32 X. axonopodis strains were used to create a mass spectral library and PLS-DA was employed to model the closely related strains. A robust workflow was designed to optimize the PLS-DA model by assessing the model performance over a range of signal-to-noise ratios (s/n) and mass filter (MF) thresholds. The optimized parameters were observed to be s/n = 3 and MF = 0.7. The model correctly classified 83% of spectra withheld from the model as a test set. A new decision rule was developed, termed the rolled-up Maximum Decision Rule (ruMDR), and this method improved identification rates to 92%. These results demonstrate that MALDI-TOF-MS protein fingerprints of bacterial isolates can be utilized to enable identification at the strain level. Furthermore, the open-source framework of this workflow allows for broad implementation across various instrument platforms as well as integration with alternative modeling and classification algorithms.
Direct-on-Filter α-Quartz Estimation in Respirable Coal Mine Dust Using Transmission Fourier Transform Infrared Spectrometry and Partial Least Squares Regression.

Science.gov (United States)

Miller, Arthur L; Weakley, Andrew Todd; Griffiths, Peter R; Cauda, Emanuele G; Bayman, Sean

2017-05-01

In order to help reduce silicosis in miners, the National Institute for Occupational Safety and Health (NIOSH) is developing field-portable methods for measuring airborne respirable crystalline silica (RCS), specifically the polymorph α-quartz, in mine dusts. In this study we demonstrate the feasibility of end-of-shift measurement of α-quartz using a direct-on-filter (DoF) method to analyze coal mine dust samples deposited onto polyvinyl chloride filters. The DoF method is potentially amenable to on-site analyses, but deviates from the current regulatory determination of RCS for coal mines by eliminating two sample preparation steps: ashing the sampling filter and redepositing the ash prior to quantification by Fourier transform infrared (FT-IR) spectrometry. In this study, the FT-IR spectra of 66 coal dust samples from active mines were used, and the RCS was quantified by using: (1) an ordinary least squares (OLS) calibration approach that utilizes standard silica material as done in the Mine Safety and Health Administration's P7 method; and (2) a partial least squares (PLS) regression approach. Both were capable of accounting for kaolinite, which can confound the IR analysis of silica. The OLS method utilized analytical standards for silica calibration and kaolin correction, resulting in a good linear correlation with P7 results and minimal bias but with the accuracy limited by the presence of kaolinite. The PLS approach also produced predictions well correlated with the P7 method, as well as better accuracy in RCS prediction, and no bias due to variable kaolinite mass. Besides decreased sensitivity to mineral or substrate confounders, PLS has the advantage that the analyst is not required to correct for the presence of kaolinite or background interferences related to the substrate, making the method potentially viable for automated RCS prediction in the field. This study demonstrated the efficacy of FT-IR transmission spectrometry for silica determination in
Thermal Efficiency Degradation Diagnosis Method Using Regression Model

International Nuclear Information System (INIS)

Jee, Chang Hyun; Heo, Gyun Young; Jang, Seok Won; Lee, In Cheol

2011-01-01

This paper proposes an approach to diagnosing thermal efficiency degradation in turbine cycles, based on turbine cycle simulation under abnormal conditions and a linear regression model. The correlation between the inputs representing degradation conditions (normally unmeasured but intrinsic states) and the simulation outputs (normally measured but superficial states) was analyzed with the linear regression model. The regression model can then be used inversely to infer the intrinsic state associated with a superficial state observed at a power plant. The proposed diagnosis method comprises three processes: 1) simulations of degradation conditions to obtain measured states (referred to as the what-if method), 2) development of the linear model correlating intrinsic and superficial states, and 3) determination of an intrinsic state from the superficial states of the current plant and the linear regression model (referred to as the inverse what-if method). The what-if method generates the outputs for inputs that include various root causes and/or boundary conditions, whereas the inverse what-if method calculates the inverse matrix with the given superficial states, that is, it determines the component degradation modes. The method suggested in this paper was validated using the turbine cycle model of an operating power plant.
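The what-if and inverse what-if steps described above can be sketched for a toy case with two intrinsic states and two measured states. The sensitivity matrix, the state values, and the function names are hypothetical; in the paper the linear model would come from turbine cycle simulations, which are not reproduced here.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("sensitivity matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

# Hypothetical linear model: measured deviations = sensitivity * degradation.
sensitivity = [[2.0, 0.5],
               [1.0, 3.0]]

# What-if step: simulate measured (superficial) states for an assumed degradation.
true_degradation = [0.1, 0.2]
measured = mat_vec(sensitivity, true_degradation)

# Inverse what-if step: recover the intrinsic states from the measured states.
estimated = mat_vec(invert_2x2(sensitivity), measured)
print(measured, estimated)
```

With more measured states than intrinsic states the inverse would typically be replaced by a least-squares solution; the square case is used here only to keep the sketch short.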
Regression and Sparse Regression Methods for Viscosity Estimation of Acid Milk From its SLS Features

DEFF Research Database (Denmark)

Sharifzadeh, Sara; Skytte, Jacob Lercke; Nielsen, Otto Højager Attermann

2012-01-01

Statistical solutions find widespread use in food and medicine quality control. We investigate the effect of different regression and sparse regression methods for a viscosity estimation problem using the spectro-temporal features from a new Sub-Surface Laser Scattering (SLS) vision system. From...... with sparse LAR, lasso and Elastic Net (EN) sparse regression methods. Due to the inconsistent measurement conditions, Locally Weighted Scatter plot Smoothing (Loess) has been employed to alleviate the undesired variation in the estimated viscosity. The experimental results of applying different methods show...
Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

Science.gov (United States)

Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

2017-05-01

Four accurate, sensitive and reliable stability-indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM), whether in pure form or in pharmaceutical formulations. Two supervised learning machine methods, linear artificial neural networks preceded by principal component analysis (PC-linANN) and linear support vector regression (linSVR), were compared with two principal component based methods, principal component regression (PCR) and partial least squares (PLS), for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits of using linear learning machine methods and the inherent merits of their algorithms in handling overlapped, noisy spectral data, especially during the challenging determination of the AGM alkaline and acidic degradants (DG1 and DG2). The root mean squared errors of prediction (RMSEP) of the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, SVR and PC-linANN, respectively. The results showed the superiority of the supervised learning machine methods over the principal component based methods. In addition, the results suggested that linANN is the method of choice for the determination of components present in low amounts with similar overlapped spectra and a narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.
Influence of the nature of soil organic matter on the sorption behaviour of pentadecane as determined by PLS analysis of mid-infrared DRIFT and solid-state ¹³C NMR spectra

Energy Technology Data Exchange (ETDEWEB)

Clark Ehlers, G.A. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Forrester, Sean T. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia); Scherr, Kerstin E. [Institute of Environmental Biotechnology, Department IFA-Tulln, University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Loibner, Andreas P., E-mail: andreas.loibner@boku.ac.a [Institute of Environmental Biotechnology, Department IFA-Tulln, The University of Natural Resources and Applied Life Sciences, Vienna, Konrad Lorenz Str. 20, Tulln A-3430 (Austria); Janik, Les J. [CSIRO Land and Water, Waite Rd, Urrbrae SA 5064 (Australia)

2010-01-15

The nature of soil organic matter (SOM) functional groups associated with sorption processes was determined by correlating partitioning coefficients with solid-state ¹³C nuclear magnetic resonance (NMR) and diffuse reflectance mid-infrared (DRIFT) spectral features using partial least squares (PLS) regression analysis. Partitioning sorption coefficients for n-pentadecane (n-C₁₅) were determined for three alternative models: the Langmuir model, the dual distributed reactive domain model (DRDM) and the Freundlich model, where the latter was found to be the most appropriate. NMR-derived constitutional descriptors did not correlate with Freundlich model parameters. By contrast, PLS analysis revealed the most likely nature of the functional groups in SOM associated with n-C₁₅ sorption coefficients (K_F) to be aromatic, possibly porous soil char, rather than aliphatic organic components for the presently investigated soils. High PLS cross-validation correlation suggested that the model was robust for the purpose of characterising the functional group chemistry important for n-C₁₅ sorption. - NMR/IR spectroscopy and chemometrics reveal the aromatic fraction of soil organic matter being responsible for alkane sorption.
Simultaneous determination of penicillin G salts by infrared spectroscopy: Evaluation of combining orthogonal signal correction with radial basis function-partial least squares regression

Science.gov (United States)

Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem

2010-09-01

In this study, a new method for the simultaneous determination of penicillin G salts in a pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system because the components have similar analytical characteristics. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to model the linear and nonlinear relations between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct for unwanted information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on the PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC-preprocessed spectra were established and compared. The results demonstrated that OSC clearly enhanced the performance of both the RBF-PLS and PLS calibration models. In addition, where the relation between spectra and component was somewhat nonlinear, the OSC-RBF-PLS model gave more satisfactory results than the OSC-PLS model, which indicated that OSC helped to remove extrinsic deviations from linearity without eliminating the nonlinear information related to the component. The chemometric models were tested on an external dataset and finally applied to the analysis of a commercial injection product of penicillin G salts.
A thioesterase bypasses the requirement for exogenous fatty acids in the plsX deletion of Streptococcus pneumoniae

NARCIS (Netherlands)

Parsons, J.B.; Frank, M.W.; Eleveld, M.J.; Schalkwijk, J.; Broussard, T.C.; Jonge, M.I. de; Rock, C.O.

2015-01-01

PlsX is an acyl-acyl carrier protein (ACP):phosphate transacylase that interconverts the two acyl donors in Gram-positive bacterial phospholipid synthesis. The deletion of plsX in Staphylococcus aureus results in a requirement for both exogenous fatty acids and de novo type II fatty acid
Finding-equal regression method and its application in prediction of U resources

International Nuclear Information System (INIS)

Cao Huimo

1995-03-01

The deposit model method commonly adopted in mineral resource prediction has two main parts: one is the model data, which reveal the geological mineralization law of the deposit; the other is a statistical prediction method that fits the characteristics of the data, namely a suitable regression method. This kind of regression method may be called finding-equal regression, and it is made up of linear regression and the distribution finding-equal method. The distribution finding-equal method is a data pretreatment that satisfies the mathematical precondition for linear regression, namely equal distribution theory, and this pretreatment is feasible in practice. Finding-equal regression can therefore not only overcome the nonlinear limitations that commonly occur in traditional linear regression or other regression methods and often have no solution, but can also identify outliers and reduce their influence, a problem that usually appears in robust regression when the independent variables contain outliers. Thus, the new finding-equal regression is presented as the best among all kinds of regression methods. Finally, two examples of quantitative prediction of U resources are provided.
Ridge regression estimator: combining unbiased and ordinary ridge regression methods of estimation

Directory of Open Access Journals (Sweden)

Sharad Damodar Gore

2009-10-01

Statistical literature has several methods for coping with multicollinearity. This paper introduces a new shrinkage estimator, called modified unbiased ridge (MUR). This estimator is obtained from unbiased ridge regression (URR) in the same way that ordinary ridge regression (ORR) is obtained from ordinary least squares (OLS). Properties of MUR are derived. Results on its matrix mean squared error (MMSE) are obtained. MUR is compared with ORR and URR in terms of MMSE. These results are illustrated with an example based on data generated by Hoerl and Kennard (1975).
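As a minimal sketch of the shrinkage step the abstract refers to, the following compares ordinary least squares with ordinary ridge regression, using the standard ridge form (X'X + kI)^{-1} X'y. It does not implement URR or MUR, whose definitions are not given in the abstract. The toy data, the value k = 1.0 and the helper names `solve` and `ridge` are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(X, y, k=0.0):
    """Return (X'X + kI)^{-1} X'y; k = 0 gives ordinary least squares."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Two nearly collinear predictors (hypothetical toy data, no intercept term).
X = [[1.0, 1.01], [2.0, 1.98], [3.0, 3.02], [4.0, 3.99], [5.0, 5.01]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_ols = ridge(X, y, k=0.0)   # unstable when columns are nearly collinear
beta_orr = ridge(X, y, k=1.0)   # shrunk toward zero by the ridge penalty
```

For any k > 0 the squared length of the ridge coefficient vector is smaller than that of the OLS vector, which is the sense in which ridge estimators trade bias for stability under multicollinearity.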
PLS models for determination of SARA analysis of Colombian vacuum residues and molecular distillation fractions using MIR-ATR

Directory of Open Access Journals (Sweden)

Jorge A. Orrego-Ruiz

2014-06-01

In this work, prediction models of Saturates, Aromatics, Resins and Asphaltenes (SARA) fractions were obtained from thirty-seven vacuum residues of representative Colombian crudes and eighteen fractions from a molecular distillation process. Mid-Infrared (MIR) Attenuated Total Reflection (ATR) spectroscopy in combination with partial least squares (PLS) regression analysis was used to estimate the SARA analysis accurately in this kind of sample. The calibration coefficients of the prediction models for the saturates, aromatics, resins and asphaltenes fractions were 0.99, 0.96, 0.97 and 0.99, respectively. This methodology makes it possible to control the molecular distillation process, since small differences in chemical composition can be detected. The total time needed to obtain the SARA analysis of a sample is 10 minutes.
Determination of the Calorific Power of Gasoline Samples Using Near Infrared Spectroscopy and Multivariate Regression

Directory of Open Access Journals (Sweden)

Janice Zulma Francesquett

2013-08-01

The aim of this study was to quantify the calorific power of 111 gasoline samples available at filling stations using near infrared spectroscopy in conjunction with multivariate regression. The calorific power of the fuels was determined using an adiabatic bomb calorimeter (standard ASTM D4809). To construct the multivariate regression models, 2/3 of the samples were used for calibration and the remainder for prediction, using the interval partial least squares (iPLS) and synergy interval partial least squares (siPLS) algorithms. The best iPLS model selected the spectral range from 5561 to 6650 cm⁻¹, giving an RMSEP of 102 cal g⁻¹, a correlation coefficient (r) of 0.8218, and errors of 0.71% for calibration and 0.47% for prediction. The best model overall was the siPLS model in which the spectrum was divided into 32 intervals and three intervals were combined; it selected the regions below 6000 cm⁻¹ and above 6500 cm⁻¹, with an RMSECV of 89.8 cal g⁻¹, an RMSEP of 96.7 cal g⁻¹, and correlation coefficients for cross-validation and prediction of 0.7834 and 0.7293, respectively. The proposed methodology is efficient, with prediction errors lower than 1%, and is a clean, fast, safe and practical alternative.
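The figures of merit quoted above can be illustrated with a short sketch: a 2/3 calibration and 1/3 prediction split, the root mean square error of prediction (RMSEP), and the correlation coefficient r. The abstract does not say how the samples were assigned to each set, so the seeded random split here is an assumption, as are all function names.

```python
import math
import random


def split_calibration_prediction(samples, fraction=2 / 3, seed=0):
    """Shuffle a copy of the samples and split it into calibration and prediction sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def rmsep(reference, predicted):
    """Root mean square error between reference values and model predictions."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# 111 sample indices, matching the number of gasoline samples in the abstract.
calibration, prediction = split_calibration_prediction(list(range(111)))
```

With 111 samples this split gives 74 calibration and 37 prediction samples. RMSEP is expressed in the units of the measured property, so a relative error also requires the mean reference value, which the abstract does not report.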
Determination of Ethanol in Blood Samples Using Partial Least Square Regression Applied to Surface Enhanced Raman Spectroscopy.

Science.gov (United States)

Açikgöz, Güneş; Hamamci, Berna; Yildiz, Abdulkadir

2018-04-01

Alcohol consumption has toxic effects on organs and tissues in the human body. The risks are essentially thought to be related to the ethanol content of alcoholic beverages. The identification of ethanol in blood samples requires rapid, non-destructive analysis with minimal sample handling, such as Raman spectroscopy. This study aims to apply Raman spectroscopy to the identification of ethanol in blood samples. Silver nanoparticles were synthesized to obtain Surface Enhanced Raman Spectroscopy (SERS) spectra of blood samples. The SERS spectra were analyzed by Partial Least Squares (PLS) regression to determine ethanol quantitatively. To apply the PLS method, the 920-820 cm⁻¹ band interval was chosen, and the spectral changes and the observed concentrations were statistically associated with each other. The blood samples were examined according to this model and the quantity of ethanol was determined in two steps. First, a calibration model was established; a strong relationship was observed between the known concentration values and the values obtained by the PLS method (R² = 1). Second, the quantities of ethanol in 40 blood samples were predicted according to the calibration model. Quantitative analysis of ethanol in the blood was thus performed by analyzing the Raman spectroscopy data with the PLS method.
Near infrared spectrometric technique for testing fruit quality: optimisation of regression models using genetic algorithms

Science.gov (United States)

Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.

2016-02-01

Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of foodstuffs, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm⁻¹ were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, which were compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity, were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of the GA method to improve the speed and accuracy of fruit quality prediction.
New PLS analysis approach to wine volatile compounds characterization by near infrared spectroscopy (NIR).

Science.gov (United States)

Genisheva, Z; Quintelas, C; Mesquita, D P; Ferreira, E C; Oliveira, J M; Amaral, A L

2018-04-25

This work aims to explore the potential of near infrared (NIR) spectroscopy to quantify volatile compounds in Vinho Verde wines, commonly determined by gas chromatography. For this purpose, 105 Vinho Verde wine samples were analyzed using Fourier transform near infrared (FT-NIR) transmission spectroscopy in the range of 5435 cm⁻¹ to 6357 cm⁻¹. Boxplot and principal components analysis (PCA) were performed for cluster identification and outlier removal. A partial least squares (PLS) regression was then applied to develop the calibration models, using a new iterative approach. The predictive ability of the models was confirmed by an external validation procedure with an independent sample set. The results can be considered quite good, with coefficients of determination (R²) varying from 0.94 to 0.97. The current methodology, using NIR spectroscopy and chemometrics, can be seen as a promising rapid tool to determine volatile compounds in Vinho Verde wines.

Robust methods for multivariate data analysis

DEFF Research Database (Denmark)

Frosch, Stina; Von Frese, J.; Bro, Rasmus

2005-01-01

Outliers may hamper proper classical multivariate analysis, and lead to incorrect conclusions. To remedy the problem of outliers, robust methods are developed in statistics and chemometrics. Robust methods reduce or remove the effect of outlying data points and allow the "good" data to primarily...... determine the result. This article reviews the most commonly used robust multivariate regression and exploratory methods that have appeared since 1996 in the field of chemometrics. Special emphasis is put on the robust versions of chemometric standard tools like PCA and PLS and the corresponding robust...
Optimizing methods for linking cinematic features to fMRI data.

Science.gov (United States)

Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

2015-04-15

One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved
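The idea of a non-parametric permutation test mentioned in the abstract above can be sketched as follows. The abstract does not describe the authors' actual scheme, so this is only a generic version: the response is shuffled many times and the observed correlation is compared with the resulting null distribution. Plain shuffling assumes exchangeable samples, which autocorrelated fMRI time series may not satisfy. The function names and toy data are hypothetical.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    norm = math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))
    return cov / norm


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Share of shuffles whose |r| is at least the observed |r| (with +1 correction)."""
    rng = random.Random(seed)
    observed = abs(correlation(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real association between x and y
        if abs(correlation(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data: a regressor and a response that follows it closely.
regressor = [float(i) for i in range(12)]
response = [2.0 * v + (0.5 if i % 2 else -0.5) for i, v in enumerate(regressor)]
p_value = permutation_p_value(regressor, response)
```

The +1 in numerator and denominator keeps the estimated p-value strictly positive, so its smallest possible value is 1 / (n_permutations + 1).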
Application of GA-PLS and GA-KPLS calculations for the prediction of the retention indices of essential oils

Directory of Open Access Journals (Sweden)

Hadi Noorizadeh

2011-01-01

Genetic algorithm and partial least squares (GA-PLS) and kernel PLS (GA-KPLS) techniques were used to investigate the correlation between retention indices (RI) and descriptors for 117 diverse compounds in essential oils from 5 Pimpinella species gathered from central Turkey, which were obtained by gas chromatography and gas chromatography-mass spectrometry. The squared correlation coefficient of leave-group-out cross validation (LGO-CV) (Q²) between experimental and predicted RI for the training set by GA-PLS and GA-KPLS was 0.940 and 0.963, respectively. This indicates that GA-KPLS can be used as an alternative modeling tool for quantitative structure-retention relationship (QSRR) studies.
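A cross-validated Q² of the kind reported above is commonly computed as 1 - PRESS / SS_total, where PRESS sums the squared errors on the left-out groups. The sketch below assumes that definition and uses a one-descriptor least-squares line as a stand-in for the PLS and kernel PLS models, which are not reproduced here. The grouping rule (every n-th sample) and all names are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor (stand-in for PLS)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def q2_leave_group_out(x, y, n_groups=3):
    """Q^2 = 1 - PRESS / SS_total, leaving out every n_groups-th sample in turn."""
    press = 0.0
    for g in range(n_groups):
        train = [i for i in range(len(x)) if i % n_groups != g]
        test = [i for i in range(len(x)) if i % n_groups == g]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        # Accumulate squared prediction errors on the samples that were left out.
        press += sum((y[i] - (slope * x[i] + intercept)) ** 2 for i in test)
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


# Hypothetical toy data: one descriptor and a noisy, roughly linear response.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
q2 = q2_leave_group_out(descriptor, retention)
```

Because every prediction comes from a model that never saw the sample, Q² is normally lower than the fitted R² and can become negative for a model that predicts worse than the mean.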
Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

Science.gov (United States)

Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

2011-01-01

Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Mobility of the native Bacillus subtilis conjugative plasmid pLS20 is regulated by intercellular signaling.

Science.gov (United States)

Singh, Praveen K; Ramachandran, Gayetri; Ramos-Ruiz, Ricardo; Peiró-Pastor, Ramón; Abia, David; Wu, Ling J; Meijer, Wilfried J J

2013-10-01

Horizontal gene transfer mediated by plasmid conjugation plays a significant role in the evolution of bacterial species, as well as in the dissemination of antibiotic resistance and pathogenicity determinants. Characterization of their regulation is important for gaining insights into these features. Relatively little is known about how conjugation of Gram-positive plasmids is regulated. We have characterized conjugation of the native Bacillus subtilis plasmid pLS20. Contrary to the enterococcal plasmids, conjugation of pLS20 is not activated by recipient-produced pheromones but by pLS20-encoded proteins that regulate expression of the conjugation genes. We show that conjugation is kept in the default "OFF" state and identified the master repressor responsible for this. Activation of the conjugation genes requires relief of repression, which is mediated by an anti-repressor that belongs to the Rap family of proteins. Using both RNA sequencing methodology and genetic approaches, we have determined the regulatory effects of the repressor and anti-repressor on expression of the pLS20 genes. We also show that the activity of the anti-repressor is in turn regulated by an intercellular signaling peptide. Ultimately, this peptide dictates the timing of conjugation. The implications of this regulatory mechanism and comparison with other mobile systems are discussed.
Check-all-that-apply data analysed by Partial Least Squares regression

DEFF Research Database (Denmark)

Rinnan, Åsmund; Giacalone, Davide; Frøst, Michael Bom

2015-01-01

are analysed by multivariate techniques. CATA data can be analysed both by setting the CATA as the X and the Y. The former is the PLS-Discriminant Analysis (PLS-DA) version, while the latter is the ANOVA-PLS (A-PLS) version. We investigated the difference between these two approaches, concluding...
Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

International Nuclear Information System (INIS)

Anderson, Ryan B.; Bell, James F.; Wiens, Roger C.; Morris, Richard V.; Clegg, Samuel M.

2012-01-01

We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO₂ at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼3 wt.%. The statistical significance of these improvements was ∼85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically
Development and performance test of a new high power RF window in S-band PLS-II LINAC

Science.gov (United States)

Hwang, Woon-Ha; Joo, Young-Do; Kim, Seung-Hwan; Choi, Jae-Young; Noh, Sung-Ju; Ryu, Ji-Wan; Cho, Young-Ki

2017-12-01

A prototype of RF window was developed in collaboration with the Pohang Accelerator Laboratory (PAL) and domestic companies. High power performance tests of the single RF window were conducted at PAL to verify the operational characteristics for its application in the Pohang Light Source-II (PLS-II) linear accelerator (Linac). The tests were performed in the in-situ facility consisting of a modulator, klystron, waveguide network, vacuum system, cooling system, and RF analyzing equipment. The test results with Stanford linear accelerator energy doubler (SLED) have shown no breakdown up to 75 MW peak power with 4.5 μs RF pulse width at a repetition rate of 10 Hz. The test results with the current operation level of PLS-II Linac confirm that the RF window well satisfies the criteria for PLS-II Linac operation.
Interaction of PLS and PIN and hormonal crosstalk in Arabidopsis root development

Directory of Open Access Journals (Sweden)

Junli Liu

2013-04-01

Understanding how hormones and genes interact to coordinate plant growth is a major challenge in developmental biology. The activities of auxin, ethylene and cytokinin depend on cellular context and exhibit either synergistic or antagonistic interactions. Here we use experimentation and network construction to elucidate the role of the interaction of the POLARIS peptide (PLS) and the auxin efflux carrier PIN proteins in the crosstalk of three hormones (auxin, ethylene and cytokinin) in Arabidopsis root development. In ethylene hypersignalling mutants such as polaris (pls), we show experimentally that expression of both PIN1 and PIN2 significantly increases. This relationship is analysed in the context of the crosstalk between auxin, ethylene and cytokinin: in pls, endogenous auxin, ethylene and cytokinin concentration decreases, approximately remains unchanged and increases, respectively. Experimental data are integrated into a hormonal crosstalk network through combination with information in literature. Network construction reveals that the regulation of both PIN1 and PIN2 is predominantly via ethylene signalling. In addition, it is deduced that the relationship between cytokinin and PIN1 and PIN2 levels implies a regulatory role of cytokinin in addition to its regulation of auxin, ethylene and PLS levels. We discuss how the network of hormones and genes coordinates plant growth by simultaneously regulating the activities of auxin, ethylene and cytokinin signalling pathways.
DEMAND FOR AND SUPPLY OF MARK-UP AND PLS FUNDS IN ISLAMIC BANKING: SOME ALTERNATIVE EXPLANATIONS

OpenAIRE

KHAN, TARIQULLAH

1995-01-01

Profit and loss-sharing (PLS) and bai’ al murabahah lil amir bil shira (mark-up) are the two parent principles of Islamic financing. The use of PLS is limited and that of mark-up overwhelming in the operations of the Islamic banks. Several studies provide different explanations for this phenomenon. The dominant among these is the moral hazard hypothesis. Some alternative explanations are given in the present paper. The discussion is based on both demand (user of funds) and supply (bank) side ...
Preliminary antifungal and cytotoxic evaluation of synthetic cycloalkyl[b]thiophene derivatives with PLS-DA analysis.

Science.gov (United States)

Souza, Beatriz C C; De Oliveira, Tiago B; Aquino, Thiago M; de Lima, Maria C A; Pitta, Ivan R; Galdino, Suely L; Lima, Edeltrudes O; Gonçalves-Silva, Teresinha; Militão, Gardênia C G; Scotti, Luciana; Scotti, Marcus T; Mendonça, Francisco J B

2012-06-01

A series of 2-[(arylidene)amino]-cycloalkyl[b]thiophene-3-carbonitriles (2a-x) was synthesized by incorporation of substituted aromatic aldehydes in Gewald adducts (1a-c). The title compounds were screened for their antifungal activity against Candida krusei and Cryptococcus neoformans and for their antiproliferative activity against a panel of 3 human cancer cell lines (HT29, NCI H-292 and HEP). For antiproliferative activity, the partial least squares (PLS) methodology was applied. Some of the prepared compounds exhibited promising antifungal and antiproliferative properties. The most active compounds for antifungal activity were cyclohexyl[b]thiophene derivatives, and for antiproliferative activity cycloheptyl[b]thiophene derivatives, especially 2-[(1H-indol-2-yl-methylidene)amino]-5,6,7,8-tetrahydro-4H-cyclohepta[b]thiophene-3-carbonitrile (2r), which inhibited more than 97% of the growth of the three cell lines. The PLS discriminant analysis (PLS-DA) applied generated good exploratory and predictive results and showed that the descriptors having shape characteristics were strongly correlated with the biological data.
A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis

Science.gov (United States)

2013-01-01

Background: Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining block clustering with partial least squares (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Results: Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of the pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. Conclusion: There were three major benefits of applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic
A novel exploratory chemometric approach to environmental monitorring by combining block clustering with Partial Least Square (PLS) analysis.

Science.gov (United States)

Nica, Dragos V; Bordean, Despina Maria; Pet, Ioan; Pet, Elena; Alda, Simion; Gergen, Iosif

2013-08-30

Given the serious threats posed to terrestrial ecosystems by industrial contamination, environmental monitoring is a standard procedure used for assessing the current status of an environment or trends in environmental parameters. Measurement of metal concentrations at different trophic levels followed by their statistical analysis using exploratory multivariate methods can provide meaningful information on the status of environmental quality. In this context, the present paper proposes a novel chemometric approach to standard statistical methods by combining block clustering with partial least squares (PLS) analysis to investigate the accumulation patterns of metals in anthropized terrestrial ecosystems. The present study focused on copper, zinc, manganese, iron, cobalt, cadmium, nickel, and lead transfer along a soil-plant-snail food chain, and the hepatopancreas of the Roman snail (Helix pomatia) was used as a biological end-point of metal accumulation. Block clustering delineates between the areas exposed to industrial and vehicular contamination. The toxic metals have similar distributions in the nettle leaves and snail hepatopancreas. PLS analysis showed that (1) zinc and copper concentrations at the lower trophic levels are the most important latent factors that contribute to metal accumulation in land snails; (2) cadmium and lead are the main determinants of the pollution pattern in areas exposed to industrial contamination; (3) at the sites located near roads, lead is the most threatening metal for terrestrial ecosystems. There were three major benefits of applying block clustering with PLS for processing the obtained data: firstly, it helped in grouping sites depending on the type of contamination. Secondly, it was valuable for identifying the latent factors that contribute the most to metal accumulation in land snails. Finally, it optimized the number and type of data that are best for monitoring the status of metallic contamination in terrestrial ecosystems
Non-destructive geographical traceability of sea cucumber (Apostichopus japonicus) using near infrared spectroscopy combined with chemometric methods.

Science.gov (United States)

Guo, Xiuhan; Cai, Rui; Wang, Shisheng; Tang, Bo; Li, Yueqing; Zhao, Weijie

2018-01-01

Sea cucumber is the major tonic seafood worldwide, and geographical origin traceability is an important part of its quality and safety control. In this work, a non-destructive method for origin traceability of sea cucumber (Apostichopus japonicus) from the northern China Sea and East China Sea using near infrared spectroscopy (NIRS) and multivariate analysis methods was proposed. Total fat contents of 189 fresh sea cucumber samples were determined and partial least-squares (PLS) regression was used to establish the quantitative NIRS model. The ordered predictor selection algorithm was performed to select feasible wavelength regions for the construction of PLS and identification models. The identification model was developed by principal component analysis combined with Mahalanobis distance and scaling to the first range algorithms. In the test set of the optimum PLS models, the root mean square error of prediction was 0.45, and the correlation coefficient was 0.90. Correct classification rates of 100% were obtained for both the identification calibration model and the test model. The overall results indicated that the NIRS method combined with chemometric analysis was a suitable tool for origin traceability and identification of fresh sea cucumber samples from nine origins in China.
VG2 URA PLS DERIVED SUMMARY ION FIT 48SEC V1.0

Data.gov (United States)

National Aeronautics and Space Administration — This data set contains the total ion density obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by fitting the measured spectra with isotropic...
VG2 URA PLS DERIVED RDR ION FIT 48SEC V1.0

Data.gov (United States)

National Aeronautics and Space Administration — This data set contains the ion densities and temperatures along with formal 1 Sigma errors obtained from Voyager 2 PLS data (voltage range 10-5950 eV/Q) at Uranus by...
Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

Energy Technology Data Exchange (ETDEWEB)

Anderson, Ryan B., E-mail: randerson@astro.cornell.edu [Cornell University Department of Astronomy, 406 Space Sciences Building, Ithaca, NY 14853 (United States); Bell, James F., E-mail: Jim.Bell@asu.edu [Arizona State University School of Earth and Space Exploration, Bldg.: INTDS-A, Room: 115B, Box 871404, Tempe, AZ 85287 (United States); Wiens, Roger C., E-mail: rwiens@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States); Morris, Richard V., E-mail: richard.v.morris@nasa.gov [NASA Johnson Space Center, 2101 NASA Parkway, Houston, TX 77058 (United States); Clegg, Samuel M., E-mail: sclegg@lanl.gov [Los Alamos National Laboratory, P.O. Box 1663 MS J565, Los Alamos, NM 87545 (United States)

2012-04-15

We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO₂ at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼3 wt.%. The statistical significance of these improvements was ∼85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and
Integrated Multiscale Latent Variable Regression and Application to Distillation Columns

Directory of Open Access Journals (Sweden)

Muddu Madakyaru

2013-01-01

Proper control of distillation columns requires estimating some key variables that are challenging to measure online (such as compositions), which are usually estimated using inferential models. Commonly used inferential models include latent variable regression (LVR) techniques, such as principal component regression (PCR), partial least squares (PLS), and regularized canonical correlation analysis (RCCA). Unfortunately, measured practical data are usually contaminated with errors, which degrade the prediction abilities of inferential models. Therefore, noisy measurements need to be filtered to enhance the prediction accuracy of these models. Multiscale filtering has been shown to be a powerful feature extraction tool. In this work, the advantages of multiscale filtering are utilized to enhance the prediction accuracy of LVR models by developing an integrated multiscale LVR (IMSLVR) modeling algorithm that integrates modeling and feature extraction. The idea behind the IMSLVR modeling algorithm is to filter the process data at different decomposition levels, model the filtered data from each level, and then select the LVR model that optimizes a model selection criterion. The performance of the developed IMSLVR algorithm is illustrated using three examples, one using synthetic data, one using simulated distillation column data, and one using experimental packed bed distillation column data. All examples clearly demonstrate the effectiveness of the IMSLVR algorithm over the conventional methods.
Stochastic development regression using method of moments

DEFF Research Database (Denmark)

Kühnel, Line; Sommer, Stefan Horst

2017-01-01

This paper considers the estimation problem arising when inferring parameters in the stochastic development regression model for manifold valued non-linear data. Stochastic development regression captures the relation between manifold-valued response and Euclidean covariate variables using...... the stochastic development construction. It is thereby able to incorporate several covariate variables and random effects. The model is intrinsically defined using the connection of the manifold, and the use of stochastic development avoids linearizing the geometry. We propose to infer parameters using...... the Method of Moments procedure that matches known constraints on moments of the observations conditional on the latent variables. The performance of the model is investigated in a simulation example using data on finite dimensional landmark manifolds....
On two flexible methods of 2-dimensional regression analysis

Czech Academy of Sciences Publication Activity Database

Volf, Petr

2012-01-01

Roč. 18, č. 4 (2012), s. 154-164 ISSN 1803-9782 Grant - others:GA ČR(CZ) GAP209/10/2045 Institutional support: RVO:67985556 Keywords : regression analysis * Gordon surface * prediction error * projection pursuit Subject RIV: BB - Applied Statistics, Operational Research http://library.utia.cas.cz/separaty/2013/SI/volf-on two flexible methods of 2-dimensional regression analysis.pdf

[Establishment of the Mathematical Model for PMI Estimation Using FTIR Spectroscopy and Data Mining Method].

Science.gov (United States)

Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H

2018-02-01

To analyse the relationship between Fourier transform infrared (FTIR) spectrum of rat's spleen tissue and postmortem interval (PMI) for PMI estimation using FTIR spectroscopy combined with data mining method. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 °C. The FTIR spectrum data of rats' spleen tissues were taken and measured at different time points. After pretreatment, the data was analysed by data mining method. The absorption peak intensity of rat's spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum sample at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMI into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient (R²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h respectively. In prediction set, the R² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of the rat's spleen tissue can be effectively analyzed qualitatively and quantitatively by the combination of FTIR spectroscopy and data mining method, and the classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.
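The calibration, cross-validation and prediction errors quoted above (RMSEC, RMSECV, RMSEP) share one formula and differ only in which samples supply the predictions. A minimal sketch in plain Python, assuming the common definition that divides by the number of samples (some chemometrics packages subtract degrees of freedom for RMSEC); the PMI values below are hypothetical and are not taken from the study:

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical postmortem intervals in hours (illustration only).
pmi_true = [0, 24, 48, 72, 96, 120, 144, 168]
pmi_pred = [6, 20, 55, 70, 90, 128, 140, 175]
print("RMSE (h):", rmse(pmi_true, pmi_pred))
print("R^2:", r_squared(pmi_true, pmi_pred))
```

Passing calibration-set predictions gives RMSEC, held-out fold predictions give RMSECV, and an independent prediction set gives RMSEP.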
Simultaneous quantitative determination of paracetamol and tramadol in tablet formulation using UV spectrophotometry and chemometric methods

Science.gov (United States)

Glavanović, Siniša; Glavanović, Marija; Tomišić, Vladislav

2016-03-01

The UV spectrophotometric methods for simultaneous quantitative determination of paracetamol and tramadol in paracetamol-tramadol tablets were developed. The spectrophotometric data obtained were processed by means of partial least squares (PLS) and genetic algorithm coupled with PLS (GA-PLS) methods in order to determine the content of active substances in the tablets. The results gained by chemometric processing of the spectroscopic data were statistically compared with those obtained by means of validated ultra-high performance liquid chromatographic (UHPLC) method. The accuracy and precision of data obtained by the developed chemometric models were verified by analysing the synthetic mixture of drugs, and by calculating recovery as well as relative standard error (RSE). A statistically good agreement was found between the amounts of paracetamol determined using PLS and GA-PLS algorithms, and that obtained by UHPLC analysis, whereas for tramadol GA-PLS results were proven to be more reliable compared to those of PLS. The simplest and the most accurate and precise models were constructed by using the PLS method for paracetamol (mean recovery 99.5%, RSE 0.89%) and the GA-PLS method for tramadol (mean recovery 99.4%, RSE 1.69%).
A multiple regression method for genomewide association studies ...

Indian Academy of Sciences (India)

Bujun Mei

2018-06-07

Jun 7, 2018 ... Similar to the typical genomewide association tests using LD ... new approach performed validly when the multiple regression based on linkage method was employed. .... the model, two groups of scenarios were simulated.
Application of Fourier transform infrared spectroscopy and orthogonal projections to latent structures/partial least squares regression for estimation of procyanidins average degree of polymerisation.

Science.gov (United States)

Passos, Cláudia P; Cardoso, Susana M; Barros, António S; Silva, Carlos M; Coimbra, Manuel A

2010-02-28

Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the quick assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. Fractionation by graded methanol/chloroform precipitations yielded 26 samples that were characterised using thiolysis as pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy within the wavenumber region of 1800-700 cm⁻¹ allowed a partial least squares (PLS1) regression model with 8 latent variables (LVs) to be built for the estimation of the DPn, giving an RMSECV of 11.7%, with an R² of 0.91 and an RMSEP of 2.58. The application of orthogonal projection to latent structures (O-PLS1) clarifies the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm⁻¹ to the increase of the DPn due to the higher proportion of substitutions in the aromatic ring of the polymerised procyanidin molecules. Copyright 2009 Elsevier B.V. All rights reserved.
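For readers unfamiliar with how a PLS1 model with a chosen number of latent variables is built, the following is a minimal sketch of the textbook NIPALS-style PLS1 recursion in plain Python. It is not the authors' implementation; the function names `pls1_fit` and `pls1_predict` and the toy data are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model by successive deflation of X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]          # scores (latent variable)
        tt = dot(t, t)
        if tt < 1e-12:
            break
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = dot(f, t) / tt                      # regression of y on scores
        # Deflate: remove what this latent variable explains.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one new sample by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Toy data: y = 2*x1 - x2 + 1 exactly (illustration only).
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 7.0]]
y_toy = [1.0, 4.0, 2.0, 6.0, 4.0]
model = pls1_fit(X_toy, y_toy, n_components=2)
print(pls1_predict(model, [6.0, 2.0]))
```

In practice the number of latent variables is chosen by cross-validation, which is where an RMSECV figure such as the one quoted above comes from.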
Prediction of SOC content by Vis-NIR spectroscopy at European scale using a modified local PLS algorithm

Science.gov (United States)

Nocita, M.; Stevens, A.; Toth, G.; van Wesemael, B.; Montanarella, L.

2012-12-01

In the context of global environmental change, the estimation of carbon fluxes between soils and the atmosphere has been the object of a growing number of studies. This has been motivated notably by the possibility to sequester CO2 into soils by increasing the soil organic carbon (SOC) stocks and by the role of SOC in maintaining soil quality. Spatial variability of SOC masks its slow accumulation or depletion, and the sampling density required to detect a change in SOC content is often very high and thus very expensive and labour intensive. Visible near infrared diffuse reflectance spectroscopy (Vis-NIR DRS) has been shown to be a fast, cheap and efficient tool for the prediction of SOC at fine scales. However, when applied to regional or country scales, Vis-NIR DRS did not provide sufficient accuracy as an alternative to standard laboratory soil analysis for SOC monitoring. Under the framework of the Land Use/Cover Area Frame Statistical Survey (LUCAS) project of the European Commission's Joint Research Centre (JRC), about 20,000 samples were collected across the European Union. Soil samples were analyzed for several physical and chemical parameters, and scanned with a Vis-NIR spectrometer in the same laboratory. The scope of our research was to predict SOC content at European scale using the LUCAS spectral library. We implemented a modified local partial least squares regression (l-PLS) including, in addition to spectral distance, other potentially useful covariates (geography, texture, etc.) to select for each unknown sample a group of predicting neighbours. The dataset was split into mineral soils under cropland, mineral soils under grassland, mineral soils under woodland, and organic soils, due to the extremely diverse spectral response of the four classes. For every class, training (70%) and test (30%) sets were created to calibrate and validate the SOC prediction models. The results showed very good prediction ability for mineral soils under cropland and mineral soils
Prediction of long-residue properties of potential blends from mathematically mixed infrared spectra of pure crude oils by partial least-squares regression models

NARCIS (Netherlands)

de Peinder, P.; Visser, T.; Petrauskas, D.D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B.M.

2009-01-01

Research has been carried out to determine the feasibility of partial least-squares (PLS) regression models to predict the long-residue (LR) properties of potential blends from infrared (IR) spectra that have been created by linearly co-adding the IR spectra of crude oils. The study is the follow-up
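The "mathematically mixed" spectra described in this record are obtained by linearly co-adding the spectra of the pure crude oils. A minimal sketch of that operation, assuming all spectra share one wavenumber grid and that the blend fractions sum to one; the function name `blend_spectrum` and the numbers are hypothetical:

```python
def blend_spectrum(spectra, fractions):
    """Return the fraction-weighted sum of spectra sampled on a common grid."""
    if len(spectra) != len(fractions):
        raise ValueError("need one fraction per spectrum")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("blend fractions must sum to 1")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must share the same grid")
    return [
        sum(f * s[k] for f, s in zip(fractions, spectra))
        for k in range(n_points)
    ]


# Two hypothetical absorbance spectra mixed 50:50.
crude_a = [1.0, 2.0, 3.0]
crude_b = [3.0, 2.0, 1.0]
print(blend_spectrum([crude_a, crude_b], [0.5, 0.5]))
```

The co-added spectrum can then be passed to a PLS model calibrated on measured spectra, which is the feasibility question the study addresses.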
Multivariate analysis of nystatin and metronidazole in a semi-solid matrix by means of diffuse reflectance NIR spectroscopy and PLS regression.

Science.gov (United States)

Baratieri, Sabrina C; Barbosa, Juliana M; Freitas, Matheus P; Martins, José A

2006-01-23

A multivariate method of analysis of nystatin and metronidazole in a semi-solid matrix, based on diffuse reflectance NIR measurements and partial least squares regression, is reported. The product, a vaginal cream used in the antifungal and antibacterial treatment, is usually, quantitatively analyzed through microbiological tests (nystatin) and HPLC technique (metronidazole), according to pharmacopeial procedures. However, near infrared spectroscopy has demonstrated to be a valuable tool for content determination, given the rapidity and scope of the method. In the present study, it was successfully applied in the prediction of nystatin (even in low concentrations, ca. 0.3-0.4%, w/w, which is around 100,000 IU/5g) and metronidazole contents, as demonstrated by some figures of merit, namely linearity, precision (mean and repeatability) and accuracy.
Linear regression methods according to objective functions

OpenAIRE

Yasemin Sisman; Sebahattin Bektas

2012-01-01

The aim of the study is to explain the parameter estimation methods and the regression analysis. The simple linear regression methods grouped according to the objective function are introduced. The numerical solution is achieved for the simple linear regression methods according to objective function of Least Squares and the Least Absolute Value adjustment methods. The success of the applied methods is analyzed using their objective function values.
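The contrast between the two objective functions can be made concrete with a small sketch. Least squares has a closed form; for least absolute values, this sketch relies on the known property that at least one optimal line passes through two of the data points, so a brute-force search over point pairs suffices for small data sets. This is an illustration, not the numerical procedure used in the paper, and the data are invented.

```python
from itertools import combinations


def least_squares_line(x, y):
    """Minimise the sum of squared residuals; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def least_absolute_line(x, y):
    """Minimise the sum of absolute residuals by testing every point pair."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = sum(abs(yk - intercept - slope * xk) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Four points on y = x plus one gross outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 40.0]
print("least squares:", least_squares_line(xs, ys))
print("least absolute value:", least_absolute_line(xs, ys))
```

With these data the least squares slope is pulled strongly toward the outlier, while the least absolute value fit stays on the line through the four consistent points.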
The Use of Nonparametric Kernel Regression Methods in Econometric Production Analysis

DEFF Research Database (Denmark)

Czekaj, Tomasz Gerard

and nonparametric estimations of production functions in order to evaluate the optimal firm size. The second paper discusses the use of parametric and nonparametric regression methods to estimate panel data regression models. The third paper analyses production risk, price uncertainty, and farmers' risk preferences...... within a nonparametric panel data regression framework. The fourth paper analyses the technical efficiency of dairy farms with environmental output using nonparametric kernel regression in a semiparametric stochastic frontier analysis. The results provided in this PhD thesis show that nonparametric......This PhD thesis addresses one of the fundamental problems in applied econometric analysis, namely the econometric estimation of regression functions. The conventional approach to regression analysis is the parametric approach, which requires the researcher to specify the form of the regression...
A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

Science.gov (United States)

Anderson, Carl A; McRae, Allan F; Visscher, Peter M

2006-07-01

Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Multivariate regression models for the simultaneous quantitative analysis of calcium and magnesium carbonates and magnesium oxide through drifts data

Directory of Open Access Journals (Sweden)

Marder Luciano

2006-01-01

In the present work multivariate regression models were developed for the quantitative analysis of ternary systems using Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) to determine the concentration in weight of calcium carbonate, magnesium carbonate and magnesium oxide. Nineteen standard samples, previously defined in a ternary diagram by mixture design, were prepared and their mid-infrared diffuse reflectance spectra were recorded. The partial least squares (PLS) regression method was applied to the model. The spectral set was preprocessed by either mean-centering and variance-scaling (model 2) or mean-centering only (model 1). The results based on the prediction performance of the external validation set expressed by RMSEP (root mean square error of prediction) demonstrated that it is possible to develop good models to simultaneously determine calcium carbonate, magnesium carbonate and magnesium oxide content in powdered samples that can be used in the study of the thermal decomposition of dolomite rocks.
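The two preprocessing options compared in this record differ only in whether each spectral variable is divided by its standard deviation after centering. A minimal sketch in plain Python, assuming column-wise operations on a samples-by-variables matrix and the sample (n - 1) standard deviation; the function names and numbers are illustrative:

```python
import math


def mean_center(X):
    """Subtract each column's mean (the 'model 1' style preprocessing)."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def autoscale(X):
    """Mean-center, then scale each column to unit variance ('model 2' style)."""
    centered = mean_center(X)
    n, m = len(X), len(X[0])
    stds = [
        math.sqrt(sum(row[j] ** 2 for row in centered) / (n - 1))
        for j in range(m)
    ]
    # Constant columns carry no information; map them to zero.
    return [
        [row[j] / stds[j] if stds[j] > 0 else 0.0 for j in range(m)]
        for row in centered
    ]


# Hypothetical three-sample, two-variable data set.
raw = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
print(mean_center(raw))
print(autoscale(raw))
```

Variance scaling gives every wavenumber equal weight, which can help weak bands but also amplifies noisy baseline regions, so which model predicts better is an empirical question.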
Sensory and instrumental texture assessment of roasted pistachio nut/kernel by partial least square (PLS) regression analysis: effect of roasting conditions.

Science.gov (United States)

Mohammadi Moghaddam, Toktam; Razavi, Seyed M A; Taghizadeh, Masoud; Sazgarnia, Ameneh

2016-01-01

Roasting is an important step in the processing of pistachio nuts. The effect of hot air roasting temperature (90, 120 and 150 °C), time (20, 35 and 50 min) and air velocity (0.5, 1.5 and 2.5 m/s) on textural and sensory characteristics of pistachio nuts and kernels were investigated. The results showed that increasing the roasting temperature decreased the fracture force (82-25.54 N), instrumental hardness (82.76-37.59 N), apparent modulus of elasticity (47-21.22 N/s), compressive energy (280.73-101.18 N.s) and increased amount of bitterness (1-2.5) and the hardness score (6-8.40) of pistachio kernels. Higher roasting time improved the flavor of samples. The results of the consumer test showed that the roasted pistachio kernels have good acceptability for flavor (score 5.83-8.40), color (score 7.20-8.40) and hardness (score 6-8.40) acceptance. Moreover, Partial Least Square (PLS) analysis of instrumental and sensory data provided important information for the correlation of objective and subjective properties. The univariate analysis showed that over 93.87 % of the variation in sensory hardness and almost 87 % of the variation in sensory acceptability could be explained by instrumental texture properties.
Reliable and relevant modelling of real world data: a personal account of the development of PLS Regression

DEFF Research Database (Denmark)

Martens, Harald

2001-01-01

Why and how Partial Least Squares Regression (PLSR) was developed is described here from the author's perspective. The paper outlines my frustrating experiences in the 1970s with two conflicting and equally over-ambitious and oversimplified modelling cultures - in traditional chemistry...
Statistical methods in regression and calibration analysis of chromosome aberration data

International Nuclear Information System (INIS)

Merkle, W.

1983-01-01

The method of iteratively reweighted least squares for the regression analysis of Poisson distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves methods for calculating confidence intervals on dose from aberration yield are described and compared, and, for the linear quadratic model a confidence interval is given. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view. (orig./MG)
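Iteratively reweighted least squares for Poisson counts can be sketched compactly. The example below assumes a purely linear yield per cell, c + alpha * dose, with an identity link, which is a simplification of the linear-quadratic model mentioned in the abstract; the weights are the inverse Poisson variances of the observed yields. Function names and data are hypothetical.

```python
def weighted_line_fit(x, r, w):
    """Weighted least squares for r = c + alpha * x; returns (c, alpha)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    mr = sum(wi * ri for wi, ri in zip(w, r)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxr = sum(wi * (xi - mx) * (ri - mr) for wi, xi, ri in zip(w, x, r))
    alpha = sxr / sxx
    return mr - alpha * mx, alpha


def irls_poisson_linear(doses, counts, cells, n_iter=50, tol=1e-12):
    """Fit yield = c + alpha * dose to counts ~ Poisson(cells * yield)."""
    rates = [k / n for k, n in zip(counts, cells)]
    # Start from weights proportional to the number of cells scored.
    c, alpha = weighted_line_fit(doses, rates, list(cells))
    for _ in range(n_iter):
        # Var(count / cells) = yield / cells, so weight = cells / fitted yield.
        fitted = [max(c + alpha * d, 1e-12) for d in doses]
        weights = [n / mu for n, mu in zip(cells, fitted)]
        c_new, alpha_new = weighted_line_fit(doses, rates, weights)
        converged = abs(c_new - c) + abs(alpha_new - alpha) < tol
        c, alpha = c_new, alpha_new
        if converged:
            break
    return c, alpha


# Hypothetical data: aberrations counted in 1000 cells at each dose (Gy).
dose = [0.0, 1.0, 2.0, 4.0]
n_cells = [1000, 1000, 1000, 1000]
aberrations = [10, 60, 110, 210]
print(irls_poisson_linear(dose, aberrations, n_cells))
```

Extending the sketch to a linear-quadratic yield would add a dose-squared column and replace the two-parameter closed form with a general weighted normal-equations solve.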
Simultaneous Determination of 6-Mercaptopurine and its Oxidative Metabolites in Synthetic Solutions and Human Plasma using Spectrophotometric Multivariate Calibration Methods

Directory of Open Access Journals (Sweden)

Mohammad-Reza Rashidi

2011-06-01

Introduction: 6-Mercaptopurine (6MP) is an important chemotherapeutic drug in the conventional treatment of childhood acute lymphoblastic leukemia (ALL). It is catabolized to 6-thiouric acid (6TUA) through 8-hydroxo-6-mercaptopurine (8OH6MP) or 6-thioxanthine (6TX) intermediates. Methods: High-performance liquid chromatography (HPLC) is usually used to determine the contents of therapeutic drugs, metabolites and other important biomedical analytes in biological samples. In the present study, the multivariate calibration methods partial least squares (PLS-1) and principal component regression (PCR) have been developed and validated for the simultaneous determination of 6MP and its oxidative metabolites (6TUA, 8OH6MP and 6TX) without analyte separation in spiked human plasma. Mixtures of 6MP, 8-8OH6MP, 6TX and 6TUA have been resolved by applying PLS-1 and PCR to their UV spectra. Results: Recoveries (%) obtained for 6MP, 8-8OH6MP, 6TX and 6TUA were 94.5-97.5, 96.6-103.3, 95.1-96.9 and 93.4-95.8, respectively, using PLS-1 and 96.7-101.3, 96.2-98.8, 95.8-103.3 and 94.3-106.1, respectively, using PCR. The net analyte signal (NAS) concept was used to calculate multivariate analytical figures of merit such as limit of detection (LOD), selectivity and sensitivity. The limits of detection for 6MP, 8-8OH6MP, 6TX and 6TUA were calculated to be 0.734, 0.439, 0.797 and 0.482 µmol L⁻¹, respectively, using PLS and 0.724, 0.418, 0.783 and 0.535 µmol L⁻¹, respectively, using PCR. HPLC was also applied as a validation method for simultaneous determination of these thiopurines in the synthetic solutions and human plasma. Conclusion: Combination of spectroscopic techniques and chemometric methods (PLS and PCR) has provided a simple but powerful method for simultaneous analysis of multicomponent mixtures.
Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

Science.gov (United States)

Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

2018-04-01

Synchronous fluorescence spectra, combined with multivariate analysis, were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve the performance of models built on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, and k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters; a sub-PLS regression model was then developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the effect on prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.
A fast all-in-one method for automated post-processing of PIV data.

Science.gov (United States)

Garcia, Damien

2011-05-01

Post-processing of PIV (particle image velocimetry) data typically consists of three stages: validation of the raw data, replacement of spurious and missing vectors, and some smoothing. A robust post-processing technique that carries out these steps simultaneously is proposed. The new all-in-one method (DCT-PLS), based on a penalized least squares approach (PLS), combines the use of the discrete cosine transform (DCT) and the generalized cross-validation, thus allowing fast unsupervised smoothing of PIV data. The DCT-PLS was compared with conventional methods, including the normalized median test, for post-processing of simulated and experimental raw PIV velocity fields. The DCT-PLS was shown to be more efficient than the usual methods, especially in the presence of clustered outliers. It was also demonstrated that the DCT-PLS can easily deal with a large amount of missing data. Because the proposed algorithm works in any dimension, the DCT-PLS is also suitable for post-processing of volumetric three-component PIV data.
A fast all-in-one method for automated post-processing of PIV data

Science.gov (United States)

Garcia, Damien

2013-01-01

Post-processing of PIV (particle image velocimetry) data typically consists of three stages: validation of the raw data, replacement of spurious and missing vectors, and some smoothing. A robust post-processing technique that carries out these steps simultaneously is proposed. The new all-in-one method (DCT-PLS), based on a penalized least squares approach (PLS), combines the use of the discrete cosine transform (DCT) and the generalized cross-validation, thus allowing fast unsupervised smoothing of PIV data. The DCT-PLS was compared with conventional methods, including the normalized median test, for post-processing of simulated and experimental raw PIV velocity fields. The DCT-PLS was shown to be more efficient than the usual methods, especially in the presence of clustered outliers. It was also demonstrated that the DCT-PLS can easily deal with a large amount of missing data. Because the proposed algorithm works in any dimension, the DCT-PLS is also suitable for post-processing of volumetric three-component PIV data. PMID: 24795497
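Note that "PLS" here means penalized least squares, not partial least squares. The core idea can be sketched in one dimension: penalizing squared second differences (with reflective boundaries) reduces, in the DCT domain, to dividing each coefficient by 1 + s * lambda_k^2. The sketch below uses a fixed, hand-picked smoothing parameter `s`; the published method additionally selects it by generalized cross-validation and iterates with robust weights to handle outliers and missing values, none of which is shown. The naive O(n^2) transforms are for clarity only.

```python
import math


def dct2(x):
    """Orthonormal type-II discrete cosine transform (naive O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        out.append(scale * acc)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal type-III DCT)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def dct_smooth(y, s):
    """Penalized least squares smoothing of evenly spaced data, fixed s >= 0."""
    n = len(y)
    filtered = []
    for k, ck in enumerate(dct2(y)):
        # Eigenvalue magnitude of the second-difference operator.
        lam = 2.0 - 2.0 * math.cos(k * math.pi / n)
        filtered.append(ck / (1.0 + s * lam * lam))
    return idct2(filtered)


# A hypothetical noisy zigzag; larger s gives a flatter result.
signal = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print(dct_smooth(signal, 10.0))
```

Because the zero-frequency coefficient is left untouched, the mean of the data is preserved whatever the value of `s`.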
FATAL, General Experiment Fitting Program by Nonlinear Regression Method

International Nuclear Information System (INIS)

Salmon, L.; Budd, T.; Marshall, M.

1982-01-01

1 - Description of problem or function: A generalized fitting program with a free-format keyword interface to the user. It permits experimental data to be fitted by non-linear regression methods to any function describable by the user. The user requires the minimum of computer experience but needs to provide a subroutine to define his function. Some statistical output is included as well as 'best' estimates of the function's parameters. 2 - Method of solution: The regression method used is based on a minimization technique devised by Powell (Harwell Subroutine Library VA05A, 1972) which does not require the use of analytical derivatives. The method employs a quasi-Newton procedure balanced with a steepest descent correction. Experience shows this to be efficient for a very wide range of application. 3 - Restrictions on the complexity of the problem: The current version of the program permits functions to be defined with up to 20 parameters. The function may be fitted to a maximum of 400 points, preferably with estimated values of weight given
Instrumentation and control system for PLS-IM-T 60 MeV LINAC

International Nuclear Information System (INIS)

Liu, D.K.; Yei, K.R.; Cheng, H.J.

1992-01-01

The PLS-IM-T is a 60 MeV LINAC serving as a preinjector for the 2 GeV LINAC of the PLS project. The instrumentation and control (I and C) system has been designed through an institutional collaboration between IHEP (Beijing, China) and POSTECH (Pohang, Korea). The I and C system is currently being set up at POSTECH in Pohang. This paper describes its major characteristics and present status. (author)

Regression Methods for Virtual Metrology of Layer Thickness in Chemical Vapor Deposition

DEFF Research Database (Denmark)

Purwins, Hendrik; Barak, Bernd; Nagi, Ahmed

2014-01-01

The quality of wafer production in semiconductor manufacturing cannot always be monitored by a costly physical measurement. Instead of measuring a quantity directly, it can be predicted by a regression method (Virtual Metrology). In this paper, a survey on regression methods is given to predict...... average Silicon Nitride cap layer thickness for the Plasma Enhanced Chemical Vapor Deposition (PECVD) dual-layer metal passivation stack process. Process and production equipment Fault Detection and Classification (FDC) data are used as predictor variables. Various variable sets are compared: one most...... algorithm, and Support Vector Regression (SVR). On a test set, SVR outperforms the other methods by a large margin, being more robust towards changes in the production conditions. The method performs better on high-dimensional multivariate input data than on the most predictive variables alone. Process...
Comparing parametric and nonparametric regression methods for panel data

DEFF Research Database (Denmark)

Czekaj, Tomasz Gerard; Henningsen, Arne

We investigate and compare the suitability of parametric and non-parametric stochastic regression methods for analysing production technologies and the optimal firm size. Our theoretical analysis shows that the most commonly used functional forms in empirical production analysis, Cobb......-Douglas and Translog, are unsuitable for analysing the optimal firm size. We show that the Translog functional form implies an implausible linear relationship between the (logarithmic) firm size and the elasticity of scale, where the slope is artificially related to the substitutability between the inputs....... The practical applicability of the parametric and non-parametric regression methods is scrutinised and compared by an empirical example: we analyse the production technology and investigate the optimal size of Polish crop farms based on a firm-level balanced panel data set. A nonparametric specification test...
The crucial role of the Pls1 tetraspanin during ascospore germination in Podospora anserina provides an example of the convergent evolution of morphogenetic processes in fungal plant pathogens and saprobes.

Science.gov (United States)

Lambou, Karine; Malagnac, Fabienne; Barbisan, Crystel; Tharreau, Didier; Lebrun, Marc-Henri; Silar, Philippe

2008-10-01

Pls1 tetraspanins were shown for some pathogenic fungi to be essential for appressorium-mediated penetration into their host plants. We show here that Podospora anserina, a saprobic fungus lacking appressorium, contains PaPls1, a gene orthologous to known PLS1 genes. Inactivation of PaPls1 demonstrates that this gene is specifically required for the germination of ascospores in P. anserina. These ascospores are heavily melanized cells that germinate under inducing conditions through a specific pore. On the contrary, MgPLS1, which fully complements a DeltaPaPls1 ascospore germination defect, has no role in the germination of Magnaporthe grisea nonmelanized ascospores but is required for the formation of the penetration peg at the pore of its melanized appressorium. P. anserina mutants with mutation of PaNox2, which encodes the NADPH oxidase of the NOX2 family, display the same ascospore-specific germination defect as the DeltaPaPls1 mutant. Both mutant phenotypes are suppressed by the inhibition of melanin biosynthesis, suggesting that they are involved in the same cellular process required for the germination of P. anserina melanized ascospores. The analysis of the distribution of PLS1 and NOX2 genes in fungal genomes shows that they are either both present or both absent. These results indicate that the germination of P. anserina ascospores and the formation of the M. grisea appressorium penetration peg use the same molecular machinery that includes Pls1 and Nox2. This machinery is specifically required for the emergence of polarized hyphae from reinforced structures such as appressoria and ascospores. Its recurrent recruitment during fungal evolution may account for some of the morphogenetic convergence observed in fungi.
Methods for estimating disease transmission rates: Evaluating the precision of Poisson regression and two novel methods

DEFF Research Database (Denmark)

Kirkeby, Carsten Thure; Hisham Beshara Halasa, Tariq; Gussmann, Maya Katrin

2017-01-01

the transmission rate. We use data from the two simulation models and vary the sampling intervals and the size of the population sampled. We devise two new methods to determine transmission rate, and compare these to the frequently used Poisson regression method in both epidemic and endemic situations. For most...... tested scenarios these new methods perform similarly to or better than Poisson regression, especially in the case of long sampling intervals. We conclude that transmission rate estimates are easily biased, which is important to take into account when using these rates in simulation models....
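As background for the Poisson regression baseline mentioned here, a minimal sketch follows. It assumes frequency-dependent transmission, where new cases in an interval are Poisson with mean beta * S * I / N * dt, and an intercept-only model with the exposure as offset; in that special case the maximum likelihood estimate has a closed form. The record does not spell out the authors' exact formulation, so the model, the function name and the data are assumptions.

```python
def transmission_rate(intervals):
    """Closed-form Poisson MLE of beta for cases ~ Poisson(beta * S*I/N * dt).

    Each interval is a tuple (new_cases, susceptible, infectious,
    population, dt) describing the start of one sampling interval.
    """
    total_cases = sum(cases for cases, _, _, _, _ in intervals)
    exposure = sum(s * i * dt / n for _, s, i, n, dt in intervals)
    if exposure <= 0:
        raise ValueError("no exposure: need susceptible and infectious animals")
    return total_cases / exposure


# Hypothetical herd of 100 animals sampled at two unit-length intervals.
observations = [
    (5, 90, 10, 100, 1.0),
    (8, 85, 15, 100, 1.0),
]
print(transmission_rate(observations))
```

The estimate treats S and I as constant within each interval, which is one reason long sampling intervals can bias it.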
Iterative random vs. Kennard-Stone sampling for IR spectrum-based classification task using PLS2-DA

Science.gov (United States)

Lee, Loong Chuen; Liong, Choong-Yeun; Jemain, Abdul Aziz

2018-04-01

External testing (ET) is preferred over auto-prediction (AP) or k-fold cross-validation for estimating a more realistic predictive ability of a statistical model. With IR spectra, the Kennard-Stone (KS) sampling algorithm is often used to split the data into training and test sets, i.e. respectively for model construction and for model testing. On the other hand, iterative random sampling (IRS) has not been the favored choice though it is theoretically more likely to produce reliable estimation. The aim of this preliminary work is to compare the performances of KS and IRS in sampling a representative training set from an attenuated total reflectance - Fourier transform infrared spectral dataset (of four varieties of blue gel pen inks) for PLS2-DA modeling. The 'best' performance achievable from the dataset is estimated with AP on the full dataset (APF, error). Both IRS (n = 200) and KS were used to split the dataset in the ratio of 7:3. The classic decision rule (i.e. maximum value-based) is employed for new sample prediction via partial least squares - discriminant analysis (PLS2-DA). The error rate of each model was estimated repeatedly via: (a) AP on full data (APF, error); (b) AP on training set (APS, error); and (c) ET on the respective test set (ETS, error). A good PLS2-DA model is expected to produce APS, error and ETS, error values that are similar to the APF, error. Bearing that in mind, the similarities between (a) APS, error vs. APF, error; (b) ETS, error vs. APF, error; and (c) APS, error vs. ETS, error were evaluated using correlation tests (i.e. Pearson and Spearman's rank tests), using series of PLS2-DA models computed from the KS-set and IRS-set, respectively. Overall, models constructed from the IRS-set exhibit more similarities between the internal and external error rates than those from the respective KS-set, i.e. less risk of overfitting. In conclusion, IRS is more reliable than KS in sampling a representative training set.
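The two splitting strategies compared in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `kennard_stone` and `random_split`, the Euclidean distance, and the toy one-feature samples are all assumptions made here for demonstration. Kennard-Stone deterministically picks the samples that are most spread out, whereas iterative random sampling simply repeats a random split many times (200 times in the abstract).

```python
import math
import random


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)
    while len(selected) < n_select:
        # Add the candidate whose nearest already-selected point is farthest away.
        nxt = max(sorted(remaining), key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


def random_split(n, n_train, seed):
    """One random training index set; repeating this over many seeds gives IRS."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), n_train))


# Hypothetical toy data: five samples with a single feature each.
samples = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
ks_train = kennard_stone(samples, 3)
irs_trains = [random_split(len(samples), 3, seed) for seed in range(200)]
print(ks_train)
```

Because Kennard-Stone always returns the same training set for a given dataset, it yields a single error estimate, while the repeated random splits yield a distribution of estimates.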
Mapping urban environmental noise: a land use regression method.

Science.gov (United States)

Xie, Dan; Liu, Yi; Chen, Jining

2011-09-01

Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise at a given outlook of development layout. This paper, for the first time, introduces a land use regression method, which has been applied for simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of various surrounding land areas via a regressive function. The results suggest that a linear model performs better in fitting monitoring data, and that there is no significant difference in the LUNOS outputs when applied to different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and as an aid to smart decision-making.
Treating experimental data of inverse kinetic method by unitary linear regression analysis

International Nuclear Information System (INIS)

Zhao Yusen; Chen Xiaoliang

2009-01-01

The theory of treating experimental data from the inverse kinetic method by unitary linear regression analysis is described. Not only the reactivity but also the effective neutron source intensity can be calculated by this method. A computer code was written based on the inverse kinetic method and unitary linear regression analysis. The data of the zero power facility BFS-1 in Russia were processed and the results were compared. The results show that the reactivity and the effective neutron source intensity can be obtained correctly by treating experimental data of the inverse kinetic method using unitary linear regression analysis, and that the precision of the reactivity measurement is improved. The central element efficiency can be calculated by using the reactivity. The results also show that the effect on the reactivity measurement caused by an external neutron source should be considered when the reactor power is low and the intensity of the external neutron source is strong. (authors)
New approach to breast cancer CAD using partial least squares and kernel-partial least squares

Science.gov (United States)

Land, Walker H., Jr.; Heine, John; Embrechts, Mark; Smith, Tom; Choma, Robert; Wong, Lut

2005-04-01

Breast cancer is second only to lung cancer as a tumor-related cause of death in women. Currently, the method of choice for the early detection of breast cancer is mammography. While sensitive to the detection of breast cancer, its positive predictive value (PPV) is low, resulting in biopsies that are only 15-34% likely to reveal malignancy. This paper explores the use of two novel approaches, called Partial Least Squares (PLS) and Kernel-PLS (K-PLS), for the diagnosis of breast cancer. The approach is based on optimization for the partial least squares (PLS) algorithm for linear regression and the K-PLS algorithm for non-linear regression. Preliminary results show that both the PLS and K-PLS paradigms achieved comparable results with three separate support vector learning machines (SVLMs), where these SVLMs were known to have been trained to a global minimum. That is, the average performance of the three separate SVLMs was Az = 0.9167927, with an average partial Az (Az90) = 0.5684283. These results compare favorably with the K-PLS paradigm, which obtained an Az = 0.907 and partial Az = 0.6123. The PLS paradigm provided comparable results. Secondly, both the K-PLS and PLS paradigms outperformed the ANN in that the Az index improved by about 14% (Az ~ 0.907 compared to the ANN Az of ~ 0.8). The "Press R squared" values for the PLS and K-PLS machine learning algorithms were 0.89 and 0.9, respectively, which is in good agreement with the other MOP values.
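For readers unfamiliar with the linear PLS regression that the abstract above builds on, the following is a minimal sketch of single-response PLS (PLS1) fitted by NIPALS-style deflation. It is not the authors' implementation and does not cover the kernel variant (K-PLS); the function names `pls1_fit` and `pls1_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response) by successive deflation of centred X and y."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X-space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left in y that X can explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]        # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]   # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                          # y loading
        # Remove the part of X and y explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict y for one new sample by replaying the deflation steps."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 7.0]]
y = [1 + 2 * a + 3 * b for a, b in X]
model = pls1_fit(X, y, n_components=2)
prediction = pls1_predict(model, [6.0, 5.0])
print(round(prediction, 6))
```

With as many components as X has independent columns, PLS1 reproduces the ordinary least squares fit; in practice fewer components are kept, with the number usually chosen by cross-validation.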
Sensitive Wavelengths Selection in Identification of Ophiopogon japonicus Based on Near-Infrared Hyperspectral Imaging Technology

Directory of Open Access Journals (Sweden)

Zhengyan Xia

2017-01-01

Hyperspectral imaging (HSI) technology has increasingly been applied as an analytical tool in the fields of agriculture, food, and Traditional Chinese Medicine over the past few years. The HSI spectrum of a sample is typically acquired by a spectroradiometer at hundreds of wavelengths. In recent years, considerable effort has been made towards identifying wavelengths (variables) that contribute useful information. Wavelength selection is a critical step in data analysis for Raman, NIRS, or HSI spectroscopy. In this study, the performances of 10 different wavelength selection methods for the discrimination of Ophiopogon japonicus of different origins were compared. The wavelength selection algorithms tested include successive projections algorithm (SPA), loading weights (LW), regression coefficients (RC), uninformative variable elimination (UVE), UVE-SPA, competitive adaptive reweighted sampling (CARS), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), and genetic algorithms (GA-PLS). One linear technique (partial least squares-discriminant analysis) was established for the evaluation of identification, and a nonlinear calibration model, support vector machine (SVM), was also provided for comparison. The results indicate that wavelength selection methods are tools to identify more concise and effective spectral data and play important roles in multivariate analysis, which can be used for subsequent modeling analysis.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

Science.gov (United States)

de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

2013-01-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Modified multiblock partial least squares path modeling algorithm with backpropagation neural networks approach

Science.gov (United States)

Yuniarto, Budi; Kurniawan, Robert

2017-03-01

PLS Path Modeling (PLS-PM) differs from covariance-based SEM in that it uses an approach based on variance or components; PLS-PM is therefore also known as component-based SEM. Multiblock Partial Least Squares (MBPLS) is a PLS regression method that can be used in PLS Path Modeling, where it is known as Multiblock PLS Path Modeling (MBPLS-PM). This method uses an iterative procedure in its algorithm. This research aims to modify MBPLS-PM with a Back Propagation Neural Network approach. The result is that the MBPLS-PM algorithm can be modified using the Back Propagation Neural Network approach to replace the iterative process in the backward and forward steps that yields the matrix t and the matrix u in the algorithm. With this modification, the model parameters obtained do not differ significantly from those obtained by the original MBPLS-PM algorithm.
A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.

Science.gov (United States)

Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

2018-06-12

Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain a state-of-the-art predictive performance, but a theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as a special case of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
Metode PLS: Analisis Kinerja Karyawan melalui Kepuasan Kerja dan Komitmen Karyawan

Directory of Open Access Journals (Sweden)

Saskia Yuanita

2012-11-01

Current global challenges are increasing competition among national and international businesses. Under these conditions, companies realize the importance of quality and of efforts to enhance competitiveness by making improvements consistently and continuously in order to meet customer and market needs. This study aims to examine the effect of the implementation of the ISO 9001 quality management system on employee performance, and the moderating effects of job satisfaction and employee commitment on the relationship between the application of the ISO 9001:2008 quality management system and employee performance. The method used for the analysis is Partial Least Square (PLS). The results show that the application of the ISO 9001:2008 quality management system affects the performance of employees, with employee satisfaction and commitment as moderating variables that affect the relationship between the application of the ISO 9001:2008 quality management system and employee performance. Both moderating variables, employee satisfaction and commitment, have positive parameter estimates, so that when the satisfaction and commitment of employees increase, the effect of implementing the ISO 9001:2008 quality management system on employee performance is strengthened.
Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

OpenAIRE

Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

2011-01-01

Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) wh...
Towards a user-friendly brain-computer interface: initial tests in ALS and PLS patients.

Science.gov (United States)

Bai, Ou; Lin, Peter; Huang, Dandan; Fei, Ding-Yu; Floeter, Mary Kay

2010-08-01

Patients usually require long-term training for effective EEG-based brain-computer interface (BCI) control due to fatigue caused by the demands for focused attention during prolonged BCI operation. We intended to develop a user-friendly BCI requiring minimal training and less mental load. BCI performance was tested in three patients with amyotrophic lateral sclerosis (ALS) and three patients with primary lateral sclerosis (PLS), who had no previous BCI experience. All patients performed binary control of cursor movement. One ALS patient and one PLS patient performed four-directional cursor control in a two-dimensional domain under a BCI paradigm associated with human natural motor behavior using motor execution and motor imagery. Subjects practiced for 5-10 min and then participated in a multi-session study of either binary control or four-directional control, including an online BCI game, over 1.5-2 h in a single visit. Event-related desynchronization and event-related synchronization in the beta band were observed in all patients during the production of voluntary movement either by motor execution or motor imagery. The online binary control of cursor movement was achieved with an average accuracy of about 82.1 ± 8.2% with motor execution and about 80% with motor imagery, whereas offline accuracy after optimization was 91.4 ± 3.4% with motor execution and 83.3 ± 8.9% with motor imagery. In addition, four-directional cursor control was achieved with an accuracy of 50-60% with motor execution and motor imagery. Patients with ALS or PLS may achieve BCI control without extended training, and fatigue might be reduced during operation of a BCI associated with human natural motor behavior. The development of a user-friendly BCI will promote practical BCI applications in paralyzed patients. Copyright 2010 International Federation of Clinical Neurophysiology. All rights reserved.
Online Monitoring of Copper Damascene Electroplating Bath by Voltammetry: Selection of Variables for Multiblock and Hierarchical Chemometric Analysis of Voltammetric Data

Directory of Open Access Journals (Sweden)

Aleksander Jaworski

2017-01-01

The Real Time Analyzer (RTA) utilizing DC- and AC-voltammetric techniques is an in situ, online monitoring system that provides a complete chemical analysis of different electrochemical deposition solutions. The RTA employs multivariate calibration when predicting concentration parameters from a multivariate data set. Although the hierarchical and multiblock Principal Component Regression (PCR)- and Partial Least Squares (PLS)-based methods can handle data sets even when the number of variables significantly exceeds the number of samples, it can be advantageous to reduce the number of variables to improve the model predictions and their interpretation. This presentation focuses on the introduction of a multistep, rigorous method of data selection based on Least Squares Regression, Simple Modeling of Class Analogy modeling power, and, as a novel application in electroanalysis, Uninformative Variable Elimination by PLS and by PCR, Variable Importance in the Projection coupled with PLS, Interval PLS, Interval PCR, and Moving Window PLS. Selection criteria of the optimum decomposition technique for the specific data are also demonstrated. The chief goal of this paper is to introduce to the community of electroanalytical chemists numerous variable selection methods which are well established in spectroscopy and can be successfully applied to voltammetric data analysis.
Projects Delay Factors of Saudi Arabia Construction Industry Using PLS-SEM Path Modelling Approach

Directory of Open Access Journals (Sweden)

Abdul Rahman Ismail

2016-01-01

This paper presents the development of a PLS-SEM path model of delay factors in the Saudi Arabian construction industry, focusing on Mecca City. The model was developed and assessed using SmartPLS v3.0 software and consists of 37 factors (manifests) in 7 groups (independent variables) and one dependent variable, which is the delay of construction projects. The model was rigorously assessed at the measurement and structural levels, and the outcomes show that the model achieved the required threshold values. At the structural level of the model, among the seven groups, the client and consultant group has the highest impact on construction delay, with a path coefficient β-value of 0.452, and the project management and contract administration group has the least impact on construction delay, with a β-value of 0.016. The overall model has moderate explanatory power, with an R2 value of 0.197, as a representation of the Saudi Arabian construction industry. This model will be able to assist practitioners in Mecca City in paying more attention to risk analysis for potential construction delay.
Modelos de regressão multivariada empregando seleção de intervalos para a quantificação do biodiesel em blendas biodiesel/diesel

Directory of Open Access Journals (Sweden)

Marco Flôres Ferrão

2010-01-01

In the present work, multivariate regression models based on interval partial least squares (iPLS) and backward interval partial least squares (biPLS) were analyzed and compared. These models select the most suitable regions of the spectrum, removing irrelevant information and optimizing the calibration model, in order to determine the concentration of biodiesel in biodiesel/diesel blends from data obtained by infrared spectroscopy with attenuated total reflectance (HATR-FTIR). Forty-five samples of biodiesel/diesel blends with biodiesel concentrations from 8 to 30% were used; the spectra were acquired on two distinct spectrophotometers and randomly mixed for model building. Calibration models were built using 2/3 of the sample spectra, yielding the RMSECV values, and the remaining spectra were used in the prediction set, yielding the RMSEP values. The spectral data were autoscaled (AUTO) or mean-centered (MEAN), with or without multiplicative signal correction (MSC). The use of spectral range selection methods applied to the ATR spectra proved viable for quantifying biodiesel in the blends. Infrared spectroscopy offers advantages such as the need for only a small amount of sample and a short analysis time, in addition to being a non-destructive procedure that generates no waste, thereby optimizing the process in question.
The Crucial Role of the Pls1 Tetraspanin during Ascospore Germination in Podospora anserina Provides an Example of the Convergent Evolution of Morphogenetic Processes in Fungal Plant Pathogens and Saprobes

Science.gov (United States)

Lambou, Karine; Malagnac, Fabienne; Barbisan, Crystel; Tharreau, Didier; Lebrun, Marc-Henri; Silar, Philippe

2008-01-01

Pls1 tetraspanins were shown for some pathogenic fungi to be essential for appressorium-mediated penetration into their host plants. We show here that Podospora anserina, a saprobic fungus lacking appressorium, contains PaPls1, a gene orthologous to known PLS1 genes. Inactivation of PaPls1 demonstrates that this gene is specifically required for the germination of ascospores in P. anserina. These ascospores are heavily melanized cells that germinate under inducing conditions through a specific pore. On the contrary, MgPLS1, which fully complements a ΔPaPls1 ascospore germination defect, has no role in the germination of Magnaporthe grisea nonmelanized ascospores but is required for the formation of the penetration peg at the pore of its melanized appressorium. P. anserina mutants with mutation of PaNox2, which encodes the NADPH oxidase of the NOX2 family, display the same ascospore-specific germination defect as the ΔPaPls1 mutant. Both mutant phenotypes are suppressed by the inhibition of melanin biosynthesis, suggesting that they are involved in the same cellular process required for the germination of P. anserina melanized ascospores. The analysis of the distribution of PLS1 and NOX2 genes in fungal genomes shows that they are either both present or both absent. These results indicate that the germination of P. anserina ascospores and the formation of the M. grisea appressorium penetration peg use the same molecular machinery that includes Pls1 and Nox2. This machinery is specifically required for the emergence of polarized hyphae from reinforced structures such as appressoria and ascospores. Its recurrent recruitment during fungal evolution may account for some of the morphogenetic convergence observed in fungi. PMID:18757568
Assessing a moderating effect and the global fit of a PLS model on online trading

Directory of Open Access Journals (Sweden)

Juan J. García-Machado

2017-12-01

This paper proposes a PLS model for the study of online trading. Traditional investing has experienced a revolution due to the rise of e-trading services that enable investors to use the Internet to conduct secure trading. On the one hand, the model results show that there is a positive, direct and statistically significant relationship between personal outcome expectations, perceived relative advantage, shared vision and economy-based trust and the quality of knowledge. On the other hand, trading frequency and portfolio performance also show this relationship. After including the investor's income and financial wealth (IFW) as a moderating effect, the PLS model was enhanced, and we found that the interaction term is negative and statistically significant, so higher IFW levels entail a weaker relationship between trading frequency and portfolio performance, and vice versa. Finally, the goodness of overall model fit measures showed that the model fits according to the SRMR and dG measures, so it is likely that the model is true.

A different approach to estimate nonlinear regression model using numerical methods

Science.gov (United States)

Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

2017-11-01

This research paper is concerned with computational methods, namely the Gauss-Newton method and gradient algorithm methods (the Newton-Raphson method, the Steepest Descent or Steepest Ascent algorithm, the Method of Scoring, and the Method of Quadratic Hill-Climbing), based on numerical analysis, for estimating the parameters of a nonlinear regression model in a very different way. Principles of matrix calculus have been used to discuss the gradient algorithm methods. Yonathan Bard [1] discussed a comparison of gradient methods for the solution of nonlinear parameter estimation problems. However, this article discusses an analytical approach to the gradient algorithm methods in a different way. This paper describes a new iterative technique, namely a Gauss-Newton method, which differs from the iterative technique proposed by Gordon K. Smyth [2]. Hans Georg Bock et al. [10] proposed numerical methods for parameter estimation in DAEs (differential algebraic equations). Isabel Reis Dos Santos et al. [11] introduced a weighted least squares procedure for estimating the unknown parameters of a nonlinear regression metamodel. For large-scale nonsmooth convex minimization, the Hager and Zhang (HZ) conjugate gradient method and the modified HZ (MHZ) method were presented by Gonglin Yuan et al. [12].
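To make the Gauss-Newton idea mentioned in the abstract above concrete, here is a minimal sketch of the textbook iteration, not the modified technique the paper proposes. The model y = a * exp(b * x), the function name `gauss_newton_exp`, the starting values, and the noise-free toy data are all assumptions chosen for illustration. Each step linearizes the model around the current estimate and solves the 2x2 normal equations (J^T J) delta = J^T r for the parameter update.

```python
import math


def gauss_newton_exp(xs, ys, a, b, n_iter=20):
    """Fit y = a * exp(b * x) by plain Gauss-Newton; returns the final (a, b)."""
    for _ in range(n_iter):
        # Residuals and Jacobian columns of the model at the current estimate.
        e = [math.exp(b * x) for x in xs]
        r = [y - a * ex for y, ex in zip(ys, e)]
        j_a = e                                      # d(model)/da
        j_b = [a * x * ex for x, ex in zip(xs, e)]   # d(model)/db
        # Entries of J^T J and J^T r for the two-parameter case.
        s_aa = sum(v * v for v in j_a)
        s_ab = sum(u * v for u, v in zip(j_a, j_b))
        s_bb = sum(v * v for v in j_b)
        g_a = sum(u * v for u, v in zip(j_a, r))
        g_b = sum(u * v for u, v in zip(j_b, r))
        # Solve the 2x2 system by Cramer's rule and update the parameters.
        det = s_aa * s_bb - s_ab * s_ab
        a += (g_a * s_bb - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b


# Hypothetical noise-free data generated with a = 2.0 and b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = gauss_newton_exp(xs, ys, a=1.8, b=0.45)
print(round(a_hat, 6), round(b_hat, 6))
```

The sketch has no step-size control, so it relies on a starting point reasonably close to the solution; damped variants address that limitation.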
PLS Torino: A way to discover semiconductors in a school lab

International Nuclear Information System (INIS)

Marzolla, F.

2015-01-01

In the wide range of PLS activities, one on semiconductors was carried out with high-school 4th- and 5th-year students. After an introduction to semiconductor and electromagnetic radiation concepts, students assembled circuits, observed photoresistor and LED behavior and compared experimental and theoretical results. We paid special attention to energy conversions and device applications. An important point of the project is that it can easily be carried out in our schools because low-cost devices are used. Moreover, by discussing experimental results, it is possible to correct or complete the students' interpretation of phenomena.
Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis

Science.gov (United States)

Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo

2018-01-01

Quality assessment of diesel fuel is highly necessary for society, but the costs and time spent are very high while using standard methods. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method, partial least square (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in ranges of 4000–650 cm−1. The use of the multivariate filters, generalized least squares weighting (GLSW) and orthogonal signal correction (OSC), was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variables selection algorithms generated more fitted and accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in the diesel routine monitoring to significantly reduce costs and analysis time. PMID:29629209
Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis

Directory of Open Access Journals (Sweden)

Maurilio Gustavo Nespeca

2018-01-01

Quality assessment of diesel fuel is highly necessary for society, but the costs and time spent are very high while using standard methods. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method, partial least square (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in ranges of 4000–650 cm−1. The use of the multivariate filters, generalized least squares weighting (GLSW) and orthogonal signal correction (OSC), was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variables selection algorithms generated more fitted and accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in the diesel routine monitoring to significantly reduce costs and analysis time.
Rapid and Simultaneous Prediction of Eight Diesel Quality Parameters through ATR-FTIR Analysis.

Science.gov (United States)

Nespeca, Maurilio Gustavo; Hatanaka, Rafael Rodrigues; Flumignan, Danilo Luiz; de Oliveira, José Eduardo

2018-01-01

Quality assessment of diesel fuel is highly necessary for society, but the costs and time spent are very high while using standard methods. Therefore, this study aimed to develop an analytical method capable of simultaneously determining eight diesel quality parameters (density; flash point; total sulfur content; distillation temperatures at 10% (T10), 50% (T50), and 85% (T85) recovery; cetane index; and biodiesel content) through attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy and the multivariate regression method, partial least square (PLS). For this purpose, the quality parameters of 409 samples were determined using standard methods, and their spectra were acquired in ranges of 4000–650 cm−1. The use of the multivariate filters, generalized least squares weighting (GLSW) and orthogonal signal correction (OSC), was evaluated to improve the signal-to-noise ratio of the models. Likewise, four variable selection approaches were tested: manual exclusion, forward interval PLS (FiPLS), backward interval PLS (BiPLS), and genetic algorithm (GA). The multivariate filters and variables selection algorithms generated more fitted and accurate PLS models. According to the validation, the FTIR/PLS models presented accuracy comparable to the reference methods and, therefore, the proposed method can be applied in the diesel routine monitoring to significantly reduce costs and analysis time.
Ordinary Least Squares and Quantile Regression: An Inquiry-Based Learning Approach to a Comparison of Regression Methods

Science.gov (United States)

Helmreich, James E.; Krog, K. Peter

2018-01-01

We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…
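The link between distance functions and location measures described in the abstract above can be illustrated with a small numerical experiment. This is a sketch under stated assumptions rather than material from the course: the toy data, the search grid, the function names, and the choice tau = 0.7 are hypothetical. It searches a grid for the constant that minimizes each loss; the squared loss is minimized at the mean, the absolute loss at the median, and the weighted absolute loss at the corresponding quantile.

```python
def squared_loss(c, data):
    """Sum of squared distances from c (the OLS criterion for a constant)."""
    return sum((x - c) ** 2 for x in data)


def absolute_loss(c, data):
    """Sum of absolute distances from c (the LAD criterion for a constant)."""
    return sum(abs(x - c) for x in data)


def check_loss(c, data, tau):
    """Weighted absolute loss: weight tau above c and (1 - tau) below c."""
    return sum(tau * (x - c) if x >= c else (1 - tau) * (c - x) for x in data)


# Hypothetical data with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
grid = [i / 10 for i in range(0, 1001)]  # candidate locations 0.0, 0.1, ..., 100.0

best_sq = min(grid, key=lambda c: squared_loss(c, data))
best_abs = min(grid, key=lambda c: absolute_loss(c, data))
best_q = min(grid, key=lambda c: check_loss(c, data, 0.7))
print(best_sq, best_abs, best_q)
```

The outlier pulls the squared-loss minimizer far from the bulk of the data, while the absolute and weighted absolute minimizers stay at observed values, which is the robustness property that motivates LAD and quantile regression.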
Improved variable reduction in partial least squares modelling by Global-Minimum Error Uninformative-Variable Elimination.

Science.gov (United States)

Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C

2017-08-22

The calibration performance of Partial Least Squares regression (PLS) can be improved by eliminating uninformative variables. For PLS, many variable elimination methods have been developed. One is the Uninformative-Variable Elimination for PLS (UVE-PLS). However, the number of variables retained by UVE-PLS is usually still large. In UVE-PLS, variable elimination is repeated as long as the root mean squared error of cross validation (RMSECV) is decreasing. The set of variables in this first local minimum is retained. In this paper, a modification of UVE-PLS is proposed and investigated, in which UVE is repeated until no further reduction in variables is possible, followed by a search for the global RMSECV minimum. The method is called Global-Minimum Error Uninformative-Variable Elimination for PLS, denoted as GME-UVE-PLS or simply GME-UVE. After each iteration, the predictive ability of the PLS model, built with the remaining variable set, is assessed by RMSECV. The variable set with the global RMSECV minimum is then finally selected. The goal is to obtain smaller sets of variables with similar or improved predictability than those from the classical UVE-PLS method. The performance of the GME-UVE-PLS method is investigated using four data sets, i.e. a simulated set, NIR and NMR spectra, and a theoretical molecular descriptors set, resulting in twelve profile-response (X-y) calibrations. The selective and predictive performances of the models resulting from GME-UVE-PLS are statistically compared to those from UVE-PLS and 1-step UVE using one-sided paired t-tests. The results demonstrate that variable reduction with the proposed GME-UVE-PLS method usually eliminates significantly more variables than the classical UVE-PLS, while the predictive abilities of the resulting models are better. With GME-UVE-PLS, a lower number of uninformative variables, without a chemical meaning for the response, may be retained than with UVE-PLS. The selectivity of the classical UVE method
Improved variable reduction in partial least squares modelling based on predictive-property-ranked variables and adaptation of partial least squares complexity.

Science.gov (United States)

Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C

2011-10-31

The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared sources (NIR) and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test. The three newly developed PPRVR-CAM methods were able to retain
A study for lattice comparison for PLS 2 GeV storage ring

International Nuclear Information System (INIS)

Yoon, M.

1991-01-01

TBA and DBA lattices are compared for a 1.5-2.5 GeV synchrotron light source, with particular attention to the PLS 2 GeV electron storage ring currently being developed in Pohang, Korea. For the comparison study, the optimum electron energy was chosen to be 2 GeV, the circumference of the ring less than 280.56 m, and the natural beam emittance no greater than 13 nm. Results from various linear and nonlinear optics comparison studies are presented.
Regression dilution bias: tools for correction methods and sample size calculation.

Science.gov (United States)

Berglund, Lars

2012-08-01

Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
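The slope correction described in this abstract can be sketched as follows. This is a minimal illustration rather than the author's software: it assumes the classical measurement error model (observed value = true value + independent random error), simulates all data, and uses hypothetical function names. Under that model the observed slope is attenuated by the reliability ratio, so dividing the observed slope by the ratio estimated from replicate measurements gives a corrected slope.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def ols_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    return covariance(x, y) / covariance(x, x)


def reliability_ratio(first, second):
    """Share of the observed variance that is due to the true value.

    Assumes the classical error model, under which the covariance between
    two replicate measurements equals the variance of the true value.
    """
    observed_var = 0.5 * (covariance(first, first) + covariance(second, second))
    return covariance(first, second) / observed_var


# Simulated data (illustrative only): true slope 2.0, and a measurement
# error variance equal to the variance of the true risk factor.
rng = random.Random(0)
n_main, n_reliability = 20000, 5000
true_x = [rng.gauss(0.0, 1.0) for _ in range(n_main)]
measured_x = [t + rng.gauss(0.0, 1.0) for t in true_x]
y = [2.0 * t + rng.gauss(0.0, 0.5) for t in true_x]
# Reliability study: a second measurement on a subset of individuals.
repeat_x = [t + rng.gauss(0.0, 1.0) for t in true_x[:n_reliability]]

observed_slope = ols_slope(measured_x, y)
ratio = reliability_ratio(measured_x[:n_reliability], repeat_x)
corrected_slope = observed_slope / ratio
print(observed_slope, ratio, corrected_slope)
```

In this simulated setting the observed slope is roughly half the true slope, which is the downward bias the abstract describes.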
Analyses of polycyclic aromatic hydrocarbon (PAH) and chiral-PAH analogues-methyl-β-cyclodextrin guest-host inclusion complexes by fluorescence spectrophotometry and multivariate regression analysis.

Science.gov (United States)

Greene, LaVana; Elzey, Brianda; Franklin, Mariah; Fakayode, Sayo O

2017-03-05

The negative health impact of polycyclic aromatic hydrocarbons (PAHs) and differences in pharmacological activity of enantiomers of chiral molecules in humans highlight the need for analysis of PAHs and their chiral analogue molecules in humans. Herein, the first use of cyclodextrin guest-host inclusion complexation, fluorescence spectrophotometry, and a chemometric approach for the analysis of a PAH (anthracene) and chiral-PAH analogue derivatives (1-(9-anthryl)-2,2,2-trifluoroethanol (TFE)) is reported. The binding constants (K_b), stoichiometry (n), and thermodynamic properties (Gibbs free energy (ΔG), enthalpy (ΔH), and entropy (ΔS)) of anthracene and enantiomers of TFE-methyl-β-cyclodextrin (Me-β-CD) guest-host complexes were also determined. Chemometric partial-least-square (PLS) regression analysis of emission spectra data of Me-β-CD-guest-host inclusion complexes was used for the determination of anthracene and TFE enantiomer concentrations in Me-β-CD-guest-host inclusion complex samples. The values of calculated K_b and negative ΔG suggest the thermodynamic favorability of the anthracene-Me-β-CD and TFE enantiomer-Me-β-CD inclusion complexation reactions. However, anthracene-Me-β-CD and TFE enantiomer-Me-β-CD inclusion complexations showed notable differences in binding affinity behaviors and thermodynamic properties. The PLS regression analysis resulted in square-correlation-coefficients of 0.997530 or better and a low LOD of 3.81×10⁻⁷ M for anthracene and 3.48×10⁻⁸ M for TFE enantiomers at physiological conditions. Most importantly, PLS regression accurately determined the anthracene and TFE enantiomer concentrations with an average low error of 2.31% for anthracene, 4.44% for R-TFE and 3.60% for S-TFE. The results of the study are highly significant because of its high sensitivity and accuracy for analysis of PAH and chiral PAH analogue derivatives without the need of an expensive chiral column, enantiomeric resolution, or use of a polarized
A comparison of artificial neural networks with other statistical approaches for the prediction of true metabolizable energy of meat and bone meal.

Science.gov (United States)

Perai, A H; Nassiri Moghaddam, H; Asadpour, S; Bahrampour, J; Mansoori, Gh

2010-07-01

There has been a considerable and continuous interest to develop equations for rapid and accurate prediction of the ME of meat and bone meal. In this study, an artificial neural network (ANN), a partial least squares (PLS), and a multiple linear regression (MLR) statistical method were used to predict the TME(n) of meat and bone meal based on its CP, ether extract, and ash content. The accuracy of the models was calculated by R(2) value, MS error, mean absolute percentage error, mean absolute deviation, bias, and Theil's U. The predictive ability of an ANN was compared with a PLS and a MLR model using the same training data sets. The squared regression coefficients of prediction for the MLR, PLS, and ANN models were 0.38, 0.36, and 0.94, respectively. The results revealed that ANN produced more accurate predictions of TME(n) as compared with PLS and MLR methods. Based on the results of this study, ANN could be used as a promising approach for rapid prediction of nutritive value of meat and bone meal.
Improved ability of biological and previous caries multimarkers to predict caries disease as revealed by multivariate PLS modelling

Directory of Open Access Journals (Sweden)

Ericson Thorild

2009-11-01

Background: Dental caries is a chronic disease with plaque bacteria, diet and saliva modifying disease activity. Here we have used the PLS method to evaluate a multiplicity of such biological variables (n = 88) for ability to predict caries in a cross-sectional (baseline caries) and prospective (2-year caries development) setting. Methods: Multivariate PLS modelling was used to associate the many biological variables with caries recorded in thirty 14-year-old children by measuring the numbers of incipient and manifest caries lesions at all surfaces. Results: A wide but shallow gliding scale of one fifth caries promoting or protecting, and four fifths non-influential, variables occurred. The influential markers behaved in the order of plaque bacteria > diet > saliva, with previously known plaque bacteria/diet markers and a set of new protective diet markers. A differential variable patterning appeared for new versus progressing lesions. The influential biological multimarkers (n = 18) predicted baseline caries better (ROC area 0.96) than five markers (0.92) and a single lactobacilli marker (0.7), with sensitivity/specificity of 1.87, 1.78 and 1.13 at 1/3 of the subjects diagnosed sick, respectively. Moreover, biological multimarkers (n = 18) explained 2-year caries increment slightly better than reported before but predicted it poorly (ROC area 0.76). By contrast, multimarkers based on previous caries predicted increment well, alone (ROC area 0.88) or together with biological multimarkers (0.94), with a sensitivity/specificity of 1.74 at 1/3 of the subjects diagnosed sick. Conclusion: Multimarkers behave better than single-to-five markers but future multimarker strategies will require systematic searches for improved saliva and plaque bacteria markers.
A nonparametric approach to calculate critical micelle concentrations: the local polynomial regression method

Energy Technology Data Exchange (ETDEWEB)

Lopez Fontan, J.L.; Costa, J.; Ruso, J.M.; Prieto, G. [Dept. of Applied Physics, Univ. of Santiago de Compostela, Santiago de Compostela (Spain); Sarmiento, F. [Dept. of Mathematics, Faculty of Informatics, Univ. of A Coruna, A Coruna (Spain)

2004-02-01

The application of a statistical method, the local polynomial regression method (LPRM), based on a nonparametric estimation of the regression function, to determine the critical micelle concentration (cmc) is presented. The method is extremely flexible because it does not impose any parametric model on the underlying structure of the data but rather allows the data to speak for themselves. Good concordance of cmc values with those obtained by other methods was found for systems in which the variation of a measured physical property with concentration showed an abrupt change. When this variation was slow, discrepancies between the values obtained by LPRM and other methods were found. (orig.)
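A local polynomial fit of the kind named in this abstract can be sketched with a first-degree (local linear) estimator. This is only an illustration under stated assumptions: the Gaussian kernel, the bandwidth, the synthetic noise-free data and the break-point criterion (largest change in the local slope) are all choices made here, not details taken from the paper.

```python
import math


def local_linear(x, y, x0, bandwidth):
    """Gaussian-kernel local linear fit at x0; returns (fitted value, slope)."""
    sw = sx = sxx = sy = sxy = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / bandwidth) ** 2)
        sw += w
        sx += w * d
        sxx += w * d * d
        sy += w * yi
        sxy += w * d * yi
    # Solve the 2x2 weighted normal equations for intercept and slope.
    det = sw * sxx - sx * sx
    value = (sxx * sy - sx * sxy) / det
    slope = (sw * sxy - sx * sy) / det
    return value, slope


# Synthetic property-versus-concentration curve with an abrupt change of
# slope at c = 5.0 (illustrative only, no noise).
conc = [i / 10 for i in range(101)]
prop = [2.0 * c if c <= 5.0 else 10.0 + 0.5 * (c - 5.0) for c in conc]

grid = [1.0 + i / 10 for i in range(81)]  # evaluation points 1.0 ... 9.0
slopes = [local_linear(conc, prop, g, bandwidth=0.3)[1] for g in grid]

# Hypothetical break-point criterion: where the local slope changes most.
changes = [abs(slopes[i + 1] - slopes[i - 1]) for i in range(1, len(grid) - 1)]
cmc_estimate = grid[1 + changes.index(max(changes))]
slope_low, slope_high = slopes[10], slopes[70]  # slopes at c = 2.0 and c = 8.0
print(cmc_estimate, slope_low, slope_high)
```

With an abrupt change the slope estimates switch sharply between the two regimes; a slow change would spread that transition over a wider concentration range, which is consistent with the discrepancies the abstract reports for such systems.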
Sensor combination and chemometric variable selection for online monitoring of Streptomyces coelicolor fed-batch cultivations

DEFF Research Database (Denmark)

Ödman, Peter; Johansen, C.L.; Olsson, L.

2010-01-01

Fed-batch cultivations of Streptomyces coelicolor, producing the antibiotic actinorhodin, were monitored online by multiwavelength fluorescence spectroscopy and off-gas analysis. Partial least squares (PLS), locally weighted regression, and multilinear PLS (N-PLS) models were built for prediction of biomass and substrate (casamino acids) concentrations, respectively. The effect of combination of fluorescence and gas analyzer data as well as of different variable selection methods was investigated. Improved prediction models were obtained by combination of data from the two sensors and by variable...
Fuzzy Linear Regression for the Time Series Data which is Fuzzified with SMRGT Method

Directory of Open Access Journals (Sweden)

Seçil YALAZ

2016-10-01

Our work on regression and classification contributes to the analysis of time series, which have been used in many areas for years. In time series regression, the methods used to correct autocorrelation may fail to converge, so that the analysis either fails or forces a change in the model's degree, which is not always desirable. For such situations, we fuzzified time series data using the simple membership function and fuzzy rule generation technique (SMRGT) and, to forecast future values, derived an equation by applying the fuzzy least squares regression (FLSR) method, a simple linear regression method, to these data. Although SMRGT has been successful in determining flow discharge in open channels, and can be used with some modifications for pipe flow, there has been no evidence on whether the technique is successful in fuzzy linear regression modeling. To address this lack, a new hybrid model is described in this study. To demonstrate the method's efficiency, classical linear regression for time series data and linear regression for fuzzy time series data were applied to two different data sets, and the performances of the two approaches were compared using different measures.
Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel

Directory of Open Access Journals (Sweden)

Roland Pfister

2013-10-01

Three different methods for extracting coefficients of linear regression analyses are presented. The focus is on automatic and easy-to-use approaches for common statistical packages: SPSS, R, and MS Excel / LibreOffice Calc. Hands-on examples are included for each analysis, followed by a brief description of how a subsequent regression coefficient analysis is performed.
Multivariate Calibration and Model Integrity for Wood Chemistry Using Fourier Transform Infrared Spectroscopy

OpenAIRE

Zhou, Chengfeng; Jiang, Wei; Cheng, Qingzheng; Via, Brian K.

2015-01-01

This research addressed a rapid method to monitor hardwood chemical composition by applying Fourier transform infrared (FT-IR) spectroscopy, with particular interest in model performance for interpretation and prediction. Partial least squares (PLS) and principal components regression (PCR) were chosen as the primary models for comparison. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set to collect the original data. PLS was found to provide bet...
Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding

Directory of Open Access Journals (Sweden)

Yun Xu

2016-10-01

Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a “pure” regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
Realtime control system for microprobe beamline at PLS

Energy Technology Data Exchange (ETDEWEB)

Yoon, J.C.; Lee, J.W.; Kim, K.H.; Ko, I.S. [Pohang Accelerator Laboratory, POSTECH, Pohang (Korea)

1998-11-01

The microprobe beamline of the Pohang Light Source (PLS) consists of main and second slits, a microprobe system, two ion chambers, a video-microscope, and a Si(Li) detector. These machine components must be controlled remotely through the computer system to make users' experiments precise and speedy. A real-time computer control system was developed to control and monitor these components. A VMEbus computer with an OS-9 real-time operating system was used for the low-level data acquisition and control. VME I/O modules were used for the step motor control and the scalar control. The software has a modular structure for maximum performance and easy maintenance. We developed the database, the I/O driver, and the control software. We used PC/Windows 95 for the data logging and the operator interface. Visual C++ was used for the graphical user interface programming. RS232C was used for the communication between the VME and the PC. (author)

Polynomial regression analysis and significance test of the regression function

International Nuclear Information System (INIS)

Gao Zhengming; Zhao Juan; He Shengping

2012-01-01

In order to analyze the decay heating power of a certain radioactive isotope per kilogram with the polynomial regression method, the paper first demonstrates the broad usage of polynomial functions and derives the parameters with the ordinary least squares estimate. Then a significance test method for the polynomial regression function is derived, considering the similarity between the polynomial regression model and the multivariable linear regression model. Finally, polynomial regression analysis and a significance test of the polynomial function are applied to the decay heating power of the isotope per kilogram in accord with the authors' real work. (authors)
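The two steps in this abstract, an ordinary least squares fit of a polynomial followed by an overall significance test, can be sketched as below. This is a generic textbook construction, not the authors' derivation: the polynomial is treated as a multiple linear regression on the powers of the predictor, and the overall F statistic is (SSR / m) / (SSE / (n - m - 1)) for degree m and n observations. The data and function names are invented for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def polyfit_ols(t, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    powers = [[ti ** j for j in range(size)] for ti in t]
    normal = [[sum(p[i] * p[j] for p in powers) for j in range(size)]
              for i in range(size)]
    rhs = [sum(p[i] * yi for p, yi in zip(powers, y)) for i in range(size)]
    return solve_linear_system(normal, rhs)


def f_statistic(t, y, coeffs):
    """Overall F statistic of the fitted polynomial regression."""
    n, m = len(y), len(coeffs) - 1
    fitted = [sum(c * ti ** j for j, c in enumerate(coeffs)) for ti in t]
    y_bar = sum(y) / n
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # residual
    return (ssr / m) / (sse / (n - m - 1))


# Invented data: a quadratic trend with a small alternating perturbation.
t = [float(i) for i in range(10)]
y = [1.0 + 0.5 * ti - 0.03 * ti ** 2 + (0.01 if i % 2 == 0 else -0.01)
     for i, ti in enumerate(t)]

coeffs = polyfit_ols(t, y, degree=2)
f_value = f_statistic(t, y, coeffs)
print(coeffs, f_value)
```

The regression is judged significant when the F value exceeds the critical value of the F distribution with (m, n - m - 1) degrees of freedom at the chosen level; that lookup is left out here because the Python standard library has no F distribution.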
A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

Science.gov (United States)

Gillis, Nicolas; Luce, Robert

2018-01-01

A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
Convert a low-cost sensor to a colorimeter using an improved regression method

Science.gov (United States)

Wu, Yifeng

2008-01-01

Closed loop color calibration is a process to maintain consistent color reproduction for color printers. To perform closed loop color calibration, a pre-designed color target should be printed and automatically measured by a color measuring instrument. A low cost sensor has been embedded in the printer to perform the color measurement. A series of sensor calibration and color conversion methods have been developed. The purpose is to get accurate colorimetric measurements from the data measured by the low cost sensor. In order to get high accuracy colorimetric measurements, we need to carefully calibrate the sensor and minimize all possible errors during the color conversion. After comparing several classical color conversion methods, a regression based color conversion method has been selected. Regression is a powerful method to estimate the color conversion functions, but the main difficulty in using this method is to find an appropriate function to describe the relationship between the input and the output data. In this paper, we propose to use 1D pre-linearization tables to improve the linearity between the input sensor measuring data and the output colorimetric data. Using this method, we can increase the accuracy of the regression method, so as to improve the accuracy of the color conversion.
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood

Science.gov (United States)

Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim

2017-04-01

Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable of forecasting a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization has been performed in many meteorological post-processing studies over the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption of the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
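The two estimation scores compared in this abstract can be written down for a Gaussian forecast distribution. The sketch below is an assumption-laden illustration rather than the study's setup: it uses the standard closed-form CRPS of a normal distribution, simulated Gaussian observations, and a simple grid search over the spread parameter instead of a regression on ensemble predictors.

```python
import math
import random


def gaussian_nll(mu, sigma, y):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + 0.5 * z * z


def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of the forecast N(mu, sigma^2) for observation y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


def mean_score(score, mu, sigma, observations):
    return sum(score(mu, sigma, y) for y in observations) / len(observations)


# Simulated observations from N(0, 2^2): the Gaussian assumption is correct here.
rng = random.Random(1)
obs = [rng.gauss(0.0, 2.0) for _ in range(2000)]
mu_hat = sum(obs) / len(obs)

# Grid search over the spread parameter under each score.
sigmas = [1.0 + 0.05 * i for i in range(41)]
best_sigma_nll = min(sigmas, key=lambda s: mean_score(gaussian_nll, mu_hat, s, obs))
best_sigma_crps = min(sigmas, key=lambda s: mean_score(gaussian_crps, mu_hat, s, obs))
print(best_sigma_nll, best_sigma_crps)
```

When the distributional assumption matches the data, as in this simulation, both scores should point to nearly the same spread, in line with the synthetic case study summarized above.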
Chemometrics-Assisted UV Spectrophotometric and RP-HPLC Methods for the Simultaneous Determination of Tolperisone Hydrochloride and Diclofenac Sodium in their Combined Pharmaceutical Formulation.

Science.gov (United States)

Gohel, Nikunj Rameshbhai; Patel, Bhavin Kiritbhai; Parmar, Vijaykumar Kunvarji

2013-01-01

Chemometrics-assisted UV spectrophotometric and RP-HPLC methods are presented for the simultaneous determination of tolperisone hydrochloride (TOL) and diclofenac sodium (DIC) from their combined pharmaceutical dosage form. Chemometric methods are based on principal component regression and partial least-square regression models. Two sets of standard mixtures, calibration sets, and validation sets were prepared. Both models were optimized to quantify each drug in the mixture using the information included in the UV absorption spectra of the appropriate solution in the range 241-290 nm at intervals of Δλ = 1 nm (50 wavelengths). The optimized models were successfully applied to the simultaneous determination of these drugs in synthetic mixture and pharmaceutical formulation. In addition, an HPLC method was developed using a reversed-phase C18 column at ambient temperature with a mobile phase consisting of methanol:acetonitrile:water (60:30:10 v/v/v), pH-adjusted to 3.0, with UV detection at 275 nm. The methods were validated in terms of linearity, accuracy, precision, sensitivity, specificity, and robustness in the range of 3-30 μg/mL for TOL and 1-10 μg/mL for DIC. The robustness of the HPLC method was tested using an experimental design approach. The developed HPLC method, and the PCR and PLS models were used to determine the amount of TOL and DIC in tablets. The data obtained from the PCR and PLS models were not significantly different from those obtained from the HPLC method at 95% confidence limit.
Statistical approach for selection of regression model during validation of bioanalytical method

Directory of Open Access Journals (Sweden)

Natalija Nakov

2014-06-01

The selection of an adequate regression model is the basis for obtaining accurate and reproducible results during bioanalytical method validation. Given the wide concentration range frequently present in bioanalytical assays, heteroscedasticity of the data may be expected. Several weighted linear and quadratic regression models were evaluated during the selection of the adequate curve fit using nonparametric statistical tests: the one sample rank test and the Wilcoxon signed rank test for two independent groups of samples. The results obtained with the one sample rank test could not give statistical justification for the selection of linear vs. quadratic regression models because only slight differences between the errors (presented through the relative residuals, RR) were obtained. Estimation of the significance of the differences in the RR was achieved using the Wilcoxon signed rank test, where linear and quadratic regression models were treated as two independent groups. The application of this simple non-parametric statistical test provides statistical confirmation of the choice of an adequate regression model.
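A weighted calibration fit and its relative residuals, the quantities this abstract compares across models, can be sketched as follows. Several details are assumptions made for illustration and are not stated in the abstract: the 1/x² weighting, the definition of the relative residual as the percent deviation of the back-calculated concentration from the nominal one, and the invented calibration data.

```python
def weighted_line(x, y, weights):
    """Weighted least-squares straight line; returns (intercept, slope)."""
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def relative_residuals(x, y, intercept, slope):
    """Percent deviation of back-calculated from nominal concentrations."""
    back_calculated = [(yi - intercept) / slope for yi in y]
    return [100.0 * (b - xi) / xi for b, xi in zip(back_calculated, x)]


# Invented calibration standards spanning three orders of magnitude.
conc = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0]
response = [0.02 + 0.01 * c for c in conc]

# 1/x^2 weighting (an assumed scheme) gives low standards more influence,
# a common way of handling heteroscedastic calibration data.
weights = [1.0 / c ** 2 for c in conc]
intercept, slope = weighted_line(conc, response, weights)
rr = relative_residuals(conc, response, intercept, slope)
print(intercept, slope, max(abs(r) for r in rr))
```

Relative residuals computed this way for two candidate models (for example linear and quadratic fits) are the kind of paired values that a rank-based test would then compare.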
An NCME Instructional Module on Data Mining Methods for Classification and Regression

Science.gov (United States)

Sinharay, Sandip

2016-01-01

Data mining methods for classification and regression are becoming increasingly popular in various scientific fields. However, these methods have not been explored much in educational measurement. This module first provides a review, which should be accessible to a wide audience in education measurement, of some of these methods. The module then…
Simultaneous Determination of Atorvastatin Calcium and Amlodipine Besylate by Spectrophotometry and Multivariate Calibration Methods in Pharmaceutical Formulations

Directory of Open Access Journals (Sweden)

Amir H. M. Sarrafi

2011-01-01

Resolution of a binary mixture of atorvastatin (ATV) and amlodipine (AML) with minimum sample pretreatment and without analyte separation has been successfully achieved using a rapid method based on partial least squares analysis of UV-spectral data. Multivariate calibration modeling procedures, traditional partial least squares (PLS-2), interval partial least squares (iPLS) and synergy partial least squares (siPLS), were applied to select a spectral range that provided the lowest prediction error in comparison to the full-spectrum model. The simultaneous determination of both analytes was possible by PLS processing of sample absorbance between 220-425 nm. The correlation coefficients (R) and root mean squared error of cross validation (RMSECV) for ATV and AML in synthetic mixture were 0.9991, 0.9958 and 0.4538, 0.2411 in the best siPLS models, respectively. The optimized method has been used for determination of ATV and AML in amostatin commercial tablets. The proposed method is simple, fast, and inexpensive and does not need any separation or preparation methods.
Correcting for cryptic relatedness by a regression-based genomic control method

Directory of Open Access Journals (Sweden)

Yang Yaning

2009-12-01

Background: The genomic control (GC) method is a useful tool to correct for cryptic relatedness in population-based association studies. It was originally proposed for correcting for the variance inflation of Cochran-Armitage's additive trend test by using information from unlinked null markers, and was later generalized to be applicable to other tests with the additional requirement that the null markers are matched with the candidate marker in allele frequencies. However, matching allele frequencies limits the number of available null markers and thus limits the applicability of the GC method. On the other hand, errors in genotype/allele frequencies may cause further bias and variance inflation and thereby aggravate the effect of GC correction. Results: In this paper, we propose a regression-based GC method using null markers that are not necessarily matched in allele frequencies with the candidate marker. Variation of allele frequencies of the null markers is adjusted by a regression method. Conclusion: The proposed method can be readily applied to Cochran-Armitage's trend tests other than the additive trend test, Pearson's chi-square test and other robust efficiency tests. Simulation results show that the proposed method is effective in controlling type I error in the presence of population substructure.
Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

Science.gov (United States)

Golmohammadi, Hassan

2009-11-30

A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecule orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict partition coefficients of organic compounds. Also, the results revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.
Pre-accidental situations highlighted by RECUPERARE method and data

Energy Technology Data Exchange (ETDEWEB)

Matahri, N. [Institut de Radioprotection et de Surete Nucleaire (IRSN), 92 - Fontenay-aux-Roses (France)

2006-07-01

The RECUPERARE method was developed for operating feedback analysis and is built on French Human Reliability Analysis (HRA) principles. It is used to study the causes of human errors or technical failures that occurred in French PWRs and the recovery process of events. Based on an event classification model (6 categories) according to the nature of the link between failure and recovery, the identified and recorded data are: the causes of the defects (technical, human, organizational) and the context in which they appear; the factors of the recovery performance (depending on technical and organizational aspects); and a chronological analysis, designed to collect delays between failures and their detection/recovery for each event. About 3600 events reported in French PWRs (1997-2003) were reviewed through this model. Initially, the weights of the factors, and the most important factors influencing the detection and recovery delay, are determined. For this purpose, Partial Least Squares (PLS) regression is used. Then, to link RECUPERARE results with pre-accidental data, conditional probabilities of events linked by a cause and effect relationship are calculated. For this, a Bayesian network is built from the PLS results and applied. This constitutes a first approach to taking into account in HRA the human and organizational factors highlighted by operating feedback. (author)
Pre-accidental situations highlighted by RECUPERARE method and data

International Nuclear Information System (INIS)

Matahri, N.

2006-01-01

The RECUPERARE method was developed for operating feedback analysis and is built on French Human Reliability Analysis (HRA) principles. It is used to study the causes of human errors or technical failures that occurred in French PWRs and the recovery process of events. Based on an event classification model (6 categories) according to the nature of the link between failure and recovery, the identified and recorded data are: the causes of the defects (technical, human, organizational) and the context in which they appear; the factors of the recovery performance (depending on technical and organizational aspects); and a chronological analysis, designed to collect delays between failures and their detection/recovery for each event. About 3600 events reported in French PWRs (1997-2003) were reviewed through this model. Initially, the weights of the factors, and the most important factors influencing the detection and recovery delay, are determined. For this purpose, Partial Least Squares (PLS) regression is used. Then, to link RECUPERARE results with pre-accidental data, conditional probabilities of events linked by a cause and effect relationship are calculated. For this, a Bayesian network is built from the PLS results and applied. This constitutes a first approach to taking into account in HRA the human and organizational factors highlighted by operating feedback. (author)
Predictive-property-ranked variable reduction in partial least squares modelling with final complexity adapted models: comparison of properties for ranking.

Science.gov (United States)

Andries, Jan P M; Vander Heyden, Yvan; Buydens, Lutgarde M C

2013-01-14

The calibration performance of partial least squares regression for one response (PLS1) can be improved by eliminating uninformative variables. Many variable-reduction methods are based on so-called predictor-variable properties or predictive properties, which are functions of various PLS-model parameters, and which may change during the steps of the variable-reduction process. Recently, a new predictive-property-ranked variable reduction method with final complexity adapted models, denoted as PPRVR-FCAM or simply FCAM, was introduced. It is a backward variable elimination method applied to the predictive-property-ranked variables. The variable number is first reduced, with constant PLS1 model complexity A, until A variables remain, followed by a further decrease in PLS complexity, allowing the final selection of small numbers of variables. In this study, the utility and effectiveness of six individual and nine combined predictor-variable properties, when used in the FCAM method, are investigated for three data sets. The individual properties include the absolute value of the PLS1 regression coefficient (REG), the significance of the PLS1 regression coefficient (SIG), the norm of the loading weight (NLW) vector, the variable importance in the projection (VIP), the selectivity ratio (SR), and the squared correlation coefficient of a predictor variable with the response y (COR). The selective and predictive performances of the models resulting from the use of these properties are statistically compared using the one-tailed Wilcoxon signed rank test. The results indicate that the models resulting from variable reduction with the FCAM method, using individual or combined properties, have similar or better predictive abilities than the full spectrum models. After mean-centring of the data, REG and SIG provide low numbers of informative variables, with a meaning relevant to the response, and lower than the other individual properties, while the predictive abilities are
Correlation of sensory bitterness in dairy protein hydrolysates: Comparison of prediction models built using sensory, chromatographic and electronic tongue data.

Science.gov (United States)

Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M

2014-08-01

Sensory evaluation of ingredients with a bitter taste can be problematic during the research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed by an electronic tongue and by their physicochemical characteristics. The data obtained from these methods were correlated with bitterness intensity as scored by a trained sensory panel, and each model was also assessed for its predictive capability. The physicochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%), and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse phase (RP) HPLC. Partial least squares (PLS) regression was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93), with the strongest being the combination of data obtained from SEC and RP HPLC. However, the PLS model with the strongest predictive power was based on the e-tongue, which had the lowest root mean predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and with the combination of SEC and RP-HPLC have the potential to be used for prediction of bitterness, thus reducing the reliance on sensory analysis of DPHs in future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
Design of High Field Multipole Wiggler at PLS

International Nuclear Information System (INIS)

Kim, D. E.; Park, K. H.; Lee, H. G.; Suh, H. S.; Han, H. S.; Jung, Y. G.; Chung, C. W.

2007-01-01

Pohang Accelerator Laboratory (PAL) is developing a high field multipole wiggler for a new EXAFS beamline. The beamline is planned to utilize very high photon energy (~40 keV) synchrotron radiation at the Pohang Light Source (PLS). To achieve a higher critical photon energy, the wiggler field needs to be maximized. A magnetic structure with wedged poles and blocks, with additional side blocks similar to those of the ESRF asymmetric wiggler, was designed to achieve higher flux density. The end structures were designed to be asymmetric along the beam direction to ensure a systematically zero first field integral. The thickness of the last magnets was adjusted to minimize the transition sequence to the fully developed periodic field. This approach is more convenient to control than adjusting the strength of the end magnets. The final design features a 140 mm period, 2.5 Tesla peak flux density at 12 mm pole gap, and a 1205 mm magnetic structure length with 16 full field poles. In this article, all the design and engineering efforts for the HFMSII wiggler will be described.
Dealing with gene expression missing data.

Science.gov (United States)

Brás, L P; Menezes, J C

2006-05-01

A comparative evaluation of different methods for estimating missing values in microarray data is presented: weighted K-nearest neighbours imputation (KNNimpute), regression-based methods such as local least squares imputation (LLSimpute) and partial least squares imputation (PLSimpute), and Bayesian principal component analysis (BPCA). The influence on prediction accuracy of several factors is elucidated: the methods' parameters, the type of data relationships used in the estimation process (i.e. row-wise, column-wise or both), the missing rate and pattern, and the type of experiment [time series (TS), non-time series (NTS) or mixed (MIX) experiments]. Improvements are proposed based on the iterative use of data (iterative LLS and PLS imputation--ILLSimpute and IPLSimpute), the need to perform initial imputations (modified PLS and Helland PLS imputation--MPLSimpute and HPLSimpute) and the type of relationships employed (KNNarray, LLSarray, HPLSarray and alternating PLS--APLSimpute). Overall, it is shown that data set properties (type of experiment, missing rate and pattern) affect the data similarity structure, therefore influencing the methods' performance. LLSimpute and ILLSimpute are preferable in the presence of data with a stronger similarity structure (TS and MIX experiments), whereas PLS-based methods (MPLSimpute, IPLSimpute and APLSimpute) are preferable when estimating NTS missing data.
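The simplest of the methods compared above, weighted K-nearest neighbours imputation, might be sketched as below. This is an illustrative row-wise version under stated assumptions: missing entries are stored as None, distance is the root mean squared difference over shared observed columns, and weights are inverse distances. The function name and data are hypothetical, and the sketch is not the KNNimpute implementation evaluated in the paper.

```python
import math


def knn_impute(matrix, row, col, k=2):
    """Estimate matrix[row][col] (a missing value, stored as None).

    Neighbours are other rows that have a value in `col`; distance is the
    root mean squared difference over columns observed in both rows.
    The estimate is a weighted mean with weights 1 / distance.
    """
    target = matrix[row]
    candidates = []
    for i, other in enumerate(matrix):
        if i == row or other[col] is None:
            continue
        shared = [
            (a, b)
            for j, (a, b) in enumerate(zip(target, other))
            if j != col and a is not None and b is not None
        ]
        if not shared:
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        candidates.append((dist, other[col]))

    if not candidates:
        raise ValueError("no usable neighbours for this entry")

    nearest = sorted(candidates)[:k]
    # An exact match (distance 0) would get infinite weight; use it directly.
    exact = [value for dist, value in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / dist for dist, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)


# Hypothetical expression matrix (rows = genes, columns = arrays).
data = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [9.0, 9.0, 9.0],
]
estimate = knn_impute(data, row=0, col=2, k=2)
```

Swapping rows and columns of the input gives the column-wise (array-based) variant; the abstract's point is that which orientation works better depends on the similarity structure of the data set.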
Multi-block methods in multivariate process control

DEFF Research Database (Denmark)

Kohonen, J.; Reinikainen, S.P.; Aaljoki, K.

2008-01-01

In chemometric studies all predictor variables are usually collected in one data matrix X. This matrix is then analyzed by PLS regression or other methods. When data from several different sub-processes are collected in one matrix, there is a possibility that the effects of some sub-processes may... methods the effect of a sub-process can be seen and an example with two blocks, near infra-red, NIR, and process data, is shown. The results show improvements in modelling task, when a MB-based approach is used. This way of working with data gives more information on the process than if all data are in one X-matrix. The procedure is demonstrated by an industrial continuous process, where knowledge about the sub-processes is available and X-matrix can be divided into blocks between process variables and NIR spectra.
Prediction-oriented modeling in business research by means of PLS path modeling : Introduction to a JBR special section

NARCIS (Netherlands)

Cepeda Carrion, Gabriel; Henseler, Jörg; Ringle, Christian M.; Roldan, Jose Luis

2016-01-01

Under the main theme “prediction-oriented modeling in business research by means of partial least squares path modeling” (PLS), the special issue presents 17 papers. Most contributions include content from presentations at the 2nd International Symposium on Partial Least Squares Path Modeling: The
Application of Genetic Algorithm (GA) Assisted Partial Least Square (PLS) Analysis on Trilinear and Non-trilinear Fluorescence Data Sets to Quantify the Fluorophores in Multifluorophoric Mixtures: Improving Quantification Accuracy of Fluorimetric Estimations of Dilute Aqueous Mixtures.

Science.gov (United States)

Kumar, Keshav

2018-03-29

Excitation-emission matrix fluorescence (EEMF) and total synchronous fluorescence spectroscopy (TSFS) are the two fluorescence techniques commonly used for the analysis of multifluorophoric mixtures. The two techniques are conceptually different and each provides certain advantages over the other. Manual analysis of the large volumes of highly correlated EEMF and TSFS data to develop a calibration model is difficult. Partial least squares (PLS) analysis can handle large EEMF and TSFS data sets by finding important factors that maximize the correlation between the spectral and concentration information for each fluorophore. However, applying PLS analysis to the entire data set often does not provide a robust calibration model and requires a suitable pre-processing step. The present work evaluates the application of genetic algorithm (GA) analysis prior to PLS analysis on EEMF and TSFS data sets to improve the precision and accuracy of the calibration model. The GA essentially combines the advantages of stochastic methods with those of deterministic approaches and can find the set of EEMF and TSFS variables that correlate well with the concentration of each of the fluorophores present in the multifluorophoric mixtures. The utility of GA-assisted PLS analysis is successfully validated using (i) EEMF data sets acquired for dilute aqueous mixtures of four biomolecules and (ii) TSFS data sets acquired for dilute aqueous mixtures of four carcinogenic polycyclic aromatic hydrocarbons (PAHs). It is shown that by using the GA it is possible to significantly improve the accuracy and precision of the PLS calibration models developed for both EEMF and TSFS data sets. Hence, GA must be considered a useful pre-processing technique when developing an EEMF or TSFS calibration model.
Comparing the index-flood and multiple-regression methods using L-moments

Science.gov (United States)

Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity was tested using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods were compared as two regional flood frequency methods. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward’s clustering method. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the generalised extreme value (GEV) distribution was identified as the most robust of the five candidate distributions and as the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied to evaluate the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gave more reliable estimates for flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin.
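The L-moment quantities that underlie the homogeneity tests and distribution selection in this abstract can be computed from a sample as sketched below. This is a minimal illustration using the unbiased probability-weighted moments for the first two L-moments only. The function name and the example values are hypothetical, and the regional averaging and Z-statistic steps are not shown.

```python
def sample_l_moments(values):
    """Return (l1, l2, t): L-location, L-scale and L-CV of a sample.

    Uses the unbiased probability-weighted moments b0 and b1:
    l1 = b0 and l2 = 2*b1 - b0; the L-CV is t = l2 / l1.
    """
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    b0 = sum(x) / n
    # enumerate() is zero-based, so i plays the role of (rank - 1).
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    l1 = b0
    l2 = 2.0 * b1 - b0
    return l1, l2, l2 / l1


# Hypothetical annual peak flows, for illustration only.
l1, l2, l_cv = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

Higher-order ratios (L-skewness, L-kurtosis) follow the same pattern with b2 and b3 and are the quantities usually plotted when choosing among candidate distributions such as the GEV.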

The Bland-Altman Method Should Not Be Used in Regression Cross-Validation Studies

Science.gov (United States)

O'Connor, Daniel P.; Mahar, Matthew T.; Laughlin, Mitzi S.; Jackson, Andrew S.

2011-01-01

The purpose of this study was to demonstrate the bias in the Bland-Altman (BA) limits of agreement method when it is used to validate regression models. Data from 1,158 men were used to develop three regression equations to estimate maximum oxygen uptake (R² = 0.40, 0.61, and 0.82, respectively). The equations were evaluated in a…
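For readers unfamiliar with the quantity being criticized, the Bland-Altman limits of agreement are commonly computed as the mean difference plus or minus 1.96 standard deviations of the differences. The sketch below shows only that standard calculation; it does not reproduce the study's analysis, and the function name and example values are hypothetical.

```python
import statistics


def limits_of_agreement(measured, predicted, z=1.96):
    """Return (bias, lower, upper) for predicted-minus-measured differences."""
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - z * sd, bias + z * sd


# Hypothetical measured and regression-predicted values.
bias, lower, upper = limits_of_agreement(
    [10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0]
)
```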
Parameter Selection Method for Support Vector Regression Based on Adaptive Fusion of the Mixed Kernel Function

Directory of Open Access Journals (Sweden)

Hailun Wang

2017-01-01

Full Text Available Support vector regression is widely used in fault diagnosis of rolling bearings. A new model parameter selection method for support vector regression, based on adaptive fusion of a mixed kernel function, is proposed in this paper. We choose the mixed kernel function as the kernel function of support vector regression. The fusion coefficients of the mixed kernel function, the kernel function parameters, and the regression parameters are combined into the state vector. Thus, the model selection problem is transformed into a nonlinear system state estimation problem. We use a 5th-degree cubature Kalman filter to estimate the parameters. In this way, we realize adaptive selection of the weighting coefficients of the mixed kernel function, the kernel parameters, and the regression parameters. Compared with single-kernel and unscented Kalman filter (UKF) support vector regression algorithms and with genetic algorithms, the decision regression function obtained by the proposed method has better generalization ability and higher prediction accuracy.
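A mixed kernel of the kind described above is typically a weighted combination of a local and a global kernel. The sketch below assumes a Gaussian (RBF) kernel and a polynomial kernel combined with a single fusion weight; the abstract does not say which kernels are fused, so this pairing, the function names, and the default parameter values are all assumptions. The cubature Kalman filter that tunes the parameters is not shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Local (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Global (polynomial) kernel: (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def mixed_kernel(x, y, weight=0.5, gamma=1.0, degree=2, coef0=1.0):
    """Convex combination of the two kernels; weight must lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * poly_kernel(x, y, degree, coef0))


value = mixed_kernel([1.0, 0.0], [1.0, 0.0])
```

Because a convex combination of valid kernels is itself a valid kernel, the fusion weight can be treated as one more tunable parameter alongside gamma and the regression parameters, which is what makes the state-vector formulation in the abstract possible.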
A Comparison of Multivariate and Pre-Processing Methods for Quantitative Laser-Induced Breakdown Spectroscopy of Geologic Samples

Science.gov (United States)

Anderson, R. B.; Morris, R. V.; Clegg, S. M.; Bell, J. F., III; Humphries, S. D.; Wiens, R. C.

2011-01-01

The ChemCam instrument selected for the Curiosity rover is capable of remote laser-induced breakdown spectroscopy (LIBS).[1] We used a remote LIBS instrument similar to ChemCam to analyze 197 geologic slab samples and 32 pressed-powder geostandards. The slab samples are well-characterized and have been used to validate the calibration of previous instruments on Mars missions, including CRISM [2], OMEGA [3], the MER Pancam [4], Mini-TES [5], and Moessbauer [6] instruments and the Phoenix SSI [7]. The resulting dataset was used to compare multivariate methods for quantitative LIBS and to determine the effect of grain size on calculations. Three multivariate methods - partial least squares (PLS), multilayer perceptron artificial neural networks (MLP ANNs) and cascade correlation (CC) ANNs - were used to generate models and extract the quantitative composition of unknown samples. PLS can be used to predict one element (PLS1) or multiple elements (PLS2) at a time, as can the neural network methods. Although MLP and CC ANNs were successful in some cases, PLS generally produced the most accurate and precise results.
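The PLS1 case mentioned above (one response, such as a single element, predicted at a time) can be sketched with a NIPALS-style deflation loop. This is a generic, dependency-free illustration and not the authors' pipeline; the function names and the toy data are hypothetical, and no LIBS-specific pre-processing is included.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with a NIPALS-style deflation loop.

    X: list of rows (samples x variables); y: list of responses.
    Returns the column means, the response mean, and per-component
    tuples (weights w, loadings p, y-loading q).
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                # centred y

    components = []
    for _ in range(n_components):
        # Weight vector: X-space direction with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one sample by replaying the deflation."""
    x_mean, y_mean, components = model
    e = [xi - mi for xi, mi in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


# Toy data in which y = 2*x1 + 3*x2 + 1 exactly.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y = [9.0, 8.0, 22.0, 18.0]
model = pls1_fit(X, y, n_components=2)
```

A PLS2 model would replace the single response vector with a matrix of responses and add an inner iteration for the Y-scores; in practice the number of components is chosen by cross-validation rather than fixed in advance.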
Data analysis of photon beam position at PLS-II

Energy Technology Data Exchange (ETDEWEB)

Ko, J.; Shin, S., E-mail: tlssh@postech.ac.kr; Huang, Jung-Yun; Kim, D.; Kim, C.; Kim, Ilyou; Lee, T.-Y.; Park, C.-D.; Kim, K. R. [Pohang Accelerator Laboratory, Pohang, Kyungbuk 790-834 (Korea, Republic of); Cho, Moohyun [Department of Physics, POSTECH, Pohang, Kyungbuk 790-834 (Korea, Republic of)

2016-07-27

In a third-generation light source, photon beam position stability is a critical issue for user experiments. Generally, photon beam position monitors have been developed to detect the real photon beam position, and the position is controlled by a feedback system in order to keep the reference photon beam position. In the PLS-II, a photon beam position stability of less than 1 μm rms has been obtained during user service periods at the front end of the particular beamline in which a photon beam position monitor is installed. Nevertheless, a detailed analysis of the photon beam position data is necessary to demonstrate the performance of the photon beam position monitor, since it can suffer from various unknown noise sources (for instance, background contamination due to upstream or downstream dipole radiation, undulator gap dependence, etc.). In this paper, we describe a start-to-end study of photon beam position stability and the Singular Value Decomposition (SVD) analysis used to demonstrate the reliability of the photon beam position data.
Comparison of Four Weighting Methods in Fuzzy-based Land Suitability to Predict Wheat Yield

Directory of Open Access Journals (Sweden)

Fatemeh Rahmati

2017-06-01

climatic conditions like mean, maximum and minimum air temperatures during the growing period as well as edaphologic properties like EC, pH, ESP, percent of clay, silt, sand, gravel, gypsum and CaCO3 content. Climatic data collected from the Shahrekord synoptic station were used to assess climatic land suitability for wheat. Qualitative land suitability evaluation was carried out using the fuzzy approach. Potential yield was calculated using the method proposed by FAO. Using MATLAB software, qualitative and quantitative land evaluations were classified based on the fuzzy logic approach. In the fuzzy method, climatic factors are used to obtain a climatic index. Clay and sand percentages were used to calculate soil texture. To determine the membership degrees, bell membership functions were used. Parameters of the function shapes were transformed to equations with variable coefficients, and the best coefficients were eventually chosen based on the model determination coefficient. In the evaluation method based on fuzzy logic, weights are applied to the land characteristics. The weights were calculated by four methods: a neural network using 1 neuron, a neural network using 4 neurons, multivariate regression, and Partial Least Squares (PLS) regression. The coefficient of determination and RMSE between observed and predicted yield were used for comparison. Weight calculations for PLS and multivariate regression were conducted using MINITAB software, and Neurosolution 5 was used for weight calculation based on the neural network. Results and Discussion: The calculated weights differed among the four methods. In all methods, the maximum weight was related to gravel, and the minimum weight was related to clay. The results of the land index and predicted yield calculations differed at some points (3, 6, 7, 13, 14, 19, and 21) for the four methods.
The coefficient of determination of calculated weights were 0.595, 0.56, 0.6 and 0.56 for neural network
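The fuzzy step described in this record, in which bell membership functions score each land characteristic and weights combine the scores into a land index, might look like the sketch below. The abstract does not give the functional form, so the generalised bell function used here is an assumption, as are the weighted-mean aggregation, the function names, and every parameter value.

```python
def bell_membership(x, a, b, c):
    """Generalised bell function: 1 / (1 + |(x - c) / a| ** (2 * b)).

    a: half-width at membership 0.5, b: slope control, c: centre.
    """
    if a == 0:
        raise ValueError("a must be non-zero")
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def land_index(memberships, weights):
    """Weighted mean of membership degrees (hypothetical aggregation rule)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(m * w for m, w in zip(memberships, weights)) / total


# Hypothetical example: soil pH scored with an optimum at 7.0.
mu_optimal = bell_membership(7.0, a=1.5, b=2, c=7.0)
mu_edge = bell_membership(8.5, a=1.5, b=2, c=7.0)
index = land_index([mu_optimal, mu_edge], [3.0, 1.0])
```

In this reading, the four weighting methods compared in the study (two neural networks, multivariate regression, and PLS regression) would each supply a different `weights` vector for the same membership degrees.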
45 CFR 303.15 - Agreements to use the Federal Parent Locator Service (PLS) in parental kidnapping and child...

Science.gov (United States)

2010-10-01

... Service (PLS) in parental kidnapping and child custody or visitation cases. 303.15 Section 303.15 Public... parental kidnapping and child custody or visitation cases. (a) Definitions. The following definitions apply... responsibilities require access in connection with child custody and parental kidnapping cases; (ii) Store the...
Simultaneous measurement of two enzyme activities using infrared spectroscopy: A comparative evaluation of PARAFAC, TUCKER and N-PLS modeling

DEFF Research Database (Denmark)

Baum, Andreas; Hansen, Per Waaben; Meyer, Anne S.

2013-01-01

Enzymes are used in many processes to release fermentable sugars for green production of biofuel, or the refinery of biomass for extraction of functional food ingredients such as pectin or prebiotic oligosaccharides. The complex biomasses may, however, require a multitude of specific enzymes which are active on specific substrates generating a multitude of products. In this paper we use the plant polymer, pectin, to present a method to quantify enzyme activity of two pectolytic enzymes by monitoring their superimposed spectral evolutions simultaneously. The data is analyzed by three chemometric multiway methods, namely PARAFAC, TUCKER3 and N-PLS, to establish simultaneous enzyme activity assays for pectin lyase and pectin methyl esterase. Correlation coefficients R²pred for prediction test sets are 0.48, 0.96 and 0.96 for pectin lyase and 0.70, 0.89 and 0.89 for pectin methyl esterase...
Simultaneous spectrophotometric determination of crystal violet and malachite green in water samples using partial least squares regression and central composite design after preconcentration by dispersive solid-phase extraction.

Science.gov (United States)

Razi-Asrami, Mahboobeh; Ghasemi, Jahan B; Amiri, Nayereh; Sadeghi, Seyed Jamal

2017-04-01

In this paper, a simple, fast, and inexpensive method is introduced for the simultaneous spectrophotometric determination of crystal violet (CV) and malachite green (MG) contents in aquatic samples using partial least squares regression (PLS) as a multivariate calibration technique after preconcentration by graphene oxide (GO). The method was based on the sorption and desorption of analytes onto GO and direct determination by ultraviolet-visible spectrophotometric techniques. GO was synthesized according to Hummers method. To characterize the shape and structure of GO, FT-IR, SEM, and XRD were used. The effective factors on the extraction efficiency such as pH, extraction time, and the amount of adsorbent were optimized using central composite design. The optimum values of these factors were 6, 15 min, and 12 mg, respectively. The maximum capacity of GO for the adsorption of CV and MG was 63.17 and 77.02 mg g⁻¹, respectively. Preconcentration factors and extraction recoveries were obtained and were 19.6, 98% for CV and 20, 100% for MG, respectively. LOD and linear dynamic ranges for CV and MG were 0.009, 0.03-0.3, 0.015, and 0.05-0.5 (μg mL⁻¹), respectively. The intra-day and inter-day relative standard deviations were 1.99 and 0.58 for CV and 1.69 and 3.13 for MG at the concentration level of 50 ng mL⁻¹, respectively. Finally, the proposed DSPE/PLS method was successfully applied for the simultaneous determination of the trace amount of CV and MG in the real water samples.
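The central composite design used above to optimize three extraction factors can be illustrated by generating its coded design points. This is a generic sketch: the abstract does not report the axial distance or the number of centre runs, so the rotatable alpha and the single centre point used here are assumptions, and the function name is hypothetical.

```python
from itertools import product


def central_composite_design(n_factors, n_center=1, alpha=None):
    """Return coded design points: factorial, axial and centre runs.

    alpha defaults to the rotatable value (2 ** n_factors) ** 0.25.
    """
    if alpha is None:
        alpha = (2 ** n_factors) ** 0.25
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=n_factors)]
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(point)
    center = [[0.0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


# Three factors, as in the abstract (pH, extraction time, adsorbent amount).
design = central_composite_design(3, n_center=1)
```

Each coded level is mapped back to real units (for example a pH range) before the experiments are run, and a quadratic response surface fitted to the measured recoveries gives the optimum settings.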
Identification of solid state fermentation degree with FT-NIR spectroscopy: Comparison of wavelength variable selection methods of CARS and SCARS

Science.gov (United States)

Jiang, Hui; Zhang, Hang; Chen, Quansheng; Mei, Congli; Liu, Guohai

2015-10-01

The use of wavelength variable selection before partial least squares discriminant analysis (PLS-DA) for qualitative identification of solid state fermentation degree by FT-NIR spectroscopy was investigated in this study. Two wavelength variable selection methods, competitive adaptive reweighted sampling (CARS) and stability competitive adaptive reweighted sampling (SCARS), were employed to select the important wavelengths. PLS-DA was applied to calibrate identification models using the wavelength variables selected by CARS and SCARS. Experimental results showed that CARS and SCARS selected 58 and 47 wavelength variables, respectively, from the 1557 original wavelength variables. Compared with full-spectrum PLS-DA, both wavelength variable selection methods enhanced the performance of the identification models. Compared with the CARS-PLS-DA model, the SCARS-PLS-DA model achieved better results, with an identification rate of 91.43% in the validation process. The overall results demonstrate that a PLS-DA model constructed using wavelength variables selected by a proper method can identify the solid state fermentation degree more accurately.
Fish mercury levels in lakes - adjusting for Hg and fish-size covariation

International Nuclear Information System (INIS)

Sonesten, Lars

2003-01-01

Fish-size covariation can be circumvented by regression intercepts of Hg vs. fish length as lake-specific Hg levels. - Accurate estimates of lake-specific mercury levels are vital in assessing the environmental impact on the mercury content in fish. The intercepts of lake-specific regressions of Hg concentration in fish vs. fish length provide accurate estimates when there is a prominent Hg and fish-size covariation. Commonly used regression methods, such as analysis of covariance (ANCOVA) and various standardization techniques are less suitable, since they do not completely remove the fish-size covariation when regression slopes are not parallel. Partial least squares (PLS) regression analysis reveals that catchment area and water chemistry have the strongest influence on the Hg level in fish in circumneutral lakes. PLS is a multivariate projection method that allows biased linear regression analysis of multicollinear data. The method is applicable to statistical and visual exploration of large data sets, even if there are more variables than observations. Environmental descriptors have no significant impact on the slopes of linear regressions of the Hg concentration in perch (Perca fluviatilis L.) vs. fish length, suggesting that the slopes mainly reflect ontogenetic dietary shifts during the perch life span
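The idea of using per-lake regression intercepts as lake-specific Hg levels can be sketched with an ordinary least squares fit for each lake. The data and names below are hypothetical, and the abstract does not say whether fish length was centred or transformed before fitting, so the raw-length intercept used here is an assumption.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: lake -> (fish lengths in cm, Hg in mg/kg).
lakes = {
    "lake_a": ([10.0, 20.0, 30.0], [0.3, 0.5, 0.7]),
    "lake_b": ([10.0, 20.0, 30.0], [0.4, 0.5, 0.6]),
}
lake_levels = {
    name: fit_line(lengths, hg)[0] for name, (lengths, hg) in lakes.items()
}
```

Because each lake gets its own slope, the intercepts remain comparable even when the slopes are not parallel, which is the situation where the abstract notes that ANCOVA and standardization techniques fall short.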
Rapid classification of pharmaceutical ingredients with Raman spectroscopy using compressive detection strategy with PLS-DA multivariate filters.

Science.gov (United States)

Cebeci Maltaş, Derya; Kwok, Kaho; Wang, Ping; Taylor, Lynne S; Ben-Amotz, Dor

2013-06-01

Identifying pharmaceutical ingredients is a routine procedure required during industrial manufacturing. Here we show that a recently developed Raman compressive detection strategy can be employed to classify various widely used pharmaceutical materials using a hybrid supervised/unsupervised strategy in which only two ingredients are used for training and yet six other ingredients can also be distinguished. More specifically, our liquid crystal spatial light modulator (LC-SLM) based compressive detection instrument is trained using only the active ingredient, tadalafil, and the excipient, lactose, but is tested using these and various other excipients; microcrystalline cellulose, magnesium stearate, titanium (IV) oxide, talc, sodium lauryl sulfate and hydroxypropyl cellulose. Partial least squares discriminant analysis (PLS-DA) is used to generate the compressive detection filters necessary for fast chemical classification. Although the filters used in this study are trained on only lactose and tadalafil, we show that all the pharmaceutical ingredients mentioned above can be differentiated and classified using PLS-DA compressive detection filters with an accumulation time of 10ms per filter. Copyright © 2013 Elsevier B.V. All rights reserved.
Fibre Morphological Characteristics of Kraft Pulps of Acacia melanoxylon Estimated by NIR-PLS-R Models

Directory of Open Access Journals (Sweden)

Helena Pereira

2015-12-01

Full Text Available In this paper, the morphological properties of fiber length (weighted in length) and of fiber width of unbleached Kraft pulp of Acacia melanoxylon were determined using TECHPAP Morfi® equipment (Techpap SAS, Grenoble, France), and were used in the calibration development of Near Infrared (NIR) partial least squares regression (PLS-R) models based on the spectral data obtained for the wood. It is the first time that fiber length and width of pulp were predicted with NIR spectral data of the initial woodmeal, with high accuracy and precision, and with ratios of performance to deviation (RPD) fulfilling the requirements for screening in breeding programs. The selected models for fiber length and fiber width used the second derivative (2ndDer) and first derivative + multiplicative scatter correction (1stDer + MSC) pre-processed spectra, respectively, in the wavenumber ranges from 7506 to 5440 cm−1. The statistical parameters of cross-validation (root mean square error of cross-validation, RMSECV, of 0.009 mm and 0.39 μm) and validation (root mean square error of prediction, RMSEP, of 0.007 mm and 0.36 μm), with RPDTS (ratios of performance to deviation of the test set) values of 3.9 and 3.3, respectively, confirmed that the models are robust and well qualified for prediction. This modeling approach shows a high potential to be used for tree breeding and improvement programs, providing a rapid screening for desired fiber morphological properties of pulp.
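The two validation statistics quoted above, RMSEP and RPD, can be computed as sketched below. The sketch assumes the common definition of RPD as the standard deviation of the reference values divided by RMSEP; some authors use a bias-corrected standard error of prediction instead. The function names and example values are hypothetical.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction over a test set."""
    n = len(reference)
    return math.sqrt(
        sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n
    )


def rpd(reference, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Hypothetical reference and predicted values for a test set.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
ratio = rpd(reference, predicted)
```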
[Research on modeling method to analyze Lonicerae Japonicae Flos extraction process with online MEMS-NIR based on two types of error detection theory].

Science.gov (United States)

Du, Chen-Zhao; Wu, Zhi-Sheng; Zhao, Na; Zhou, Zheng; Shi, Xin-Yuan; Qiao, Yan-Jiang

2016-10-01

To establish a rapid quantitative analysis method for online monitoring of chlorogenic acid in the aqueous extraction solution of Lonicerae Japonicae Flos using micro-electromechanical near infrared spectroscopy (MEMS-NIR), high performance liquid chromatography (HPLC) was used as the reference method. The Kennard-Stone (K-S) algorithm was used to divide the sample sets, and partial least squares (PLS) regression was adopted to establish the multivariate analysis model between the HPLC-determined contents and the NIR spectra. Synergy interval partial least squares (SiPLS) was used to select the modeling wavebands for the PLS models. RPD was used to evaluate the prediction performance of the models. MDLs were calculated based on the two types of error detection theory, and the on-line analytical modeling approach for the Lonicerae Japonicae Flos extraction process was expressed by the MDL. The results show that the model established with multiplicative scatter correction (MSC) was the best, with a root mean square error of cross-validation (RMSECV), root mean square error of calibration (RMSEC) and root mean square error of prediction (RMSEP) for chlorogenic acid of 1.707, 1.489 and 2.362, respectively; the determination coefficient of the calibration model was 0.9985, and the determination coefficient of the prediction was 0.9881. The RPD value was 9.468. The MDL (0.04215 g·L⁻¹) selected by SiPLS was lower than the original, which demonstrated that SiPLS helped to improve the prediction performance of the model. This study gives a more accurate expression of the prediction performance of the model based on the two types of error detection theory, and further illustrates that MEMS-NIR spectroscopy can be used for on-line monitoring of the Lonicerae Japonicae Flos extraction process. Copyright© by the Chinese Pharmaceutical Association.
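The sample-set division step named in this abstract can be sketched as follows. This is a minimal version of the Kennard-Stone rule as it is commonly described (start from the two most distant samples, then repeatedly add the sample farthest from everything already chosen), using Euclidean distance. The function name and the two-dimensional "spectra" are hypothetical stand-ins, not the authors' implementation or data.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the two samples that are farthest apart.
    i, j = max(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i, j]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected sample is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical two-variable "spectra" (illustrative values only).
spectra = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
calibration_idx = kennard_stone(spectra, 3)
validation_idx = [k for k in range(len(spectra)) if k not in calibration_idx]
print(calibration_idx, validation_idx)
```

The selected indices would form the calibration set and the rest the validation set, so the calibration samples span the measured variable space as evenly as possible.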
At-line determination of pharmaceuticals small molecule's blending end point using chemometric modeling combined with Fourier transform near infrared spectroscopy

Science.gov (United States)

Tewari, Jagdish; Strong, Richard; Boulas, Pierre

2017-02-01

This article summarizes the development and validation of a Fourier transform near infrared spectroscopy (FT-NIR) method for the rapid at-line prediction of active pharmaceutical ingredient (API) in a powder blend to optimize small molecule formulations. The method was used to determine the blend uniformity end-point for a pharmaceutical solid dosage formulation containing a range of API concentrations. A set of calibration spectra from samples with concentrations ranging from 1% to 15% of API (w/w) were collected at-line from 4000 to 12,500 cm⁻¹. The ability of the FT-NIR method to predict API concentration in the blend samples was validated against a reference high performance liquid chromatography (HPLC) method. The prediction efficiency of four different multivariate data modeling methods, namely partial least-squares 1 (PLS1), partial least-squares 2 (PLS2), principal component regression (PCR) and artificial neural network (ANN), was compared using relevant multivariate figures of merit. The prediction ability of the regression models was cross-validated against results generated with the reference HPLC method. PLS1 and ANN showed superior prediction abilities when compared to PLS2 and PCR. Based upon these results and because of its decreased complexity compared to ANN, PLS1 was selected as the best chemometric method to predict blend uniformity at-line. The FT-NIR measurement and the associated chemometric analysis were implemented in the production environment for rapid at-line determination of the end-point of the small molecule blending operation. FIGURE 1: Correlation coefficient vs Rank plot; FIGURE 2: FT-NIR spectra of different steps of Blend and final blend; FIGURE 3: Prediction ability of PCR; FIGURE 4: Blend uniformity prediction ability of PLS2; FIGURE 5: Prediction efficiency of blend uniformity using ANN; FIGURE 6: Comparison of prediction efficiency of chemometric models; TABLE 1: Order of Addition for Blending Steps
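PLS1, the method selected in this abstract, is PLS regression with a single response variable. The sketch below shows the idea for the simplest case of one latent variable, following the standard NIPALS-style steps (weight vector from the X-y covariance, scores, then regression of y on the scores). It is an illustration under those assumptions, not the authors' model: the function name and the toy data are hypothetical, and a real calibration would use several components chosen by cross-validation.

```python
import math


def pls1_one_component(X, y):
    """Fit a one-latent-variable PLS1 model; return (intercept, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X-space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X and y are uncorrelated; no PLS component exists")
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered response on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_mean))
    return intercept, coef


# Toy data with two perfectly collinear predictors (illustrative only).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [2.0, 4.0, 6.0]
intercept, coef = pls1_one_component(X, y)
prediction = intercept + sum(c * v for c, v in zip(coef, [4.0, 8.0]))
print(intercept, coef, prediction)
```

The toy predictors are exactly collinear, a case where ordinary least squares has no unique solution, yet the single PLS component still yields a usable model.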
Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

Directory of Open Access Journals (Sweden)

Omholt Stig W

2011-06-01

Background: Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results: Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback
Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

Science.gov (United States)

Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

2011-06-01

Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for
PLS-NIR determination of five parameters in different types of Chinese rice wine

Science.gov (United States)

Yu, Haiyan; Ying, Yibin; Fu, Xiaping; Lu, Huishan

2005-11-01

To evaluate the applicability of near infrared spectroscopy for determination of five enological parameters (alcoholic degree, pH value, total acid, amino acid nitrogen and °Brix) of Chinese rice wine, transmission spectra were collected in the spectral range from 12500 to 3800 cm⁻¹ in a 1 mm path length rectangular quartz cuvette with air as reference at room temperature. Five calibration equations, one for each parameter, were established between the reference data and spectra by partial least squares (PLS) regression. The best calibration results were achieved for the determination of alcoholic degree and °Brix. The RPD (ratio of the standard deviation of the samples to the SECV) values of the calibration for both alcoholic degree and °Brix were higher than 3 (4.30 and 7.94, respectively), which demonstrated the robustness and power of the calibration models. The determination coefficients (R²) for alcoholic degree and °Brix were 0.987 and 0.991, respectively. The performance for pH, total acid and amino acid nitrogen was not as good as that for alcoholic degree and °Brix. The RPD values for the three parameters were 1.48, 1.85 and 1.82, respectively, and the R² values were 0.964, 0.970 and 0.971, respectively. In the validation step, the R² values of the five parameters were all higher than 0.7, especially for alcoholic degree and °Brix (0.968 and 0.956, respectively). The results demonstrated that NIR spectroscopy could be used to predict the five enological parameters in Chinese rice wine.
Determination of cellulose I crystallinity by FT-Raman spectroscopy

Science.gov (United States)

Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

2009-01-01

Two new methods based on FT-Raman spectroscopy, one simple, based on band intensity ratio, and the other, using a partial least-squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in semicrystalline cellulose I samples was determined based on univariate regression that was first developed using the...
Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

Science.gov (United States)

Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

2008-04-01

Data of seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. Meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, temperature dependent variables were highly correlated, but were all negatively correlated with relative humidity. Multiple regression analysis was used to fit the meteorological variables using the meteorological variables as predictors. A variable selection method based on high loading of varimax rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002 one of the meteorological variables was weakly influenced predominantly by the ozone concentrations. However, the model did not predict that the meteorological variables for the year 2000 were not influenced predominantly by the ozone concentrations that point to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.
Regression analysis with categorized regression calibrated exposure: some interesting findings

Directory of Open Access Journals (Sweden)

Hjartåker Anette

2006-07-01

Background: Regression calibration as a method for handling measurement error is becoming increasingly well-known and used in epidemiologic research. However, the standard version of the method is not appropriate for exposure analyzed on a categorical (e.g. quintile) scale, an approach commonly used in epidemiologic studies. A tempting solution could then be to use the predicted continuous exposure obtained through the regression calibration method and treat it as an approximation to the true exposure, that is, include the categorized calibrated exposure in the main regression analysis. Methods: We use semi-analytical calculations and simulations to evaluate the performance of the proposed approach compared to the naive approach of not correcting for measurement error, in situations where analyses are performed on quintile scale and when incorporating the original scale into the categorical variables, respectively. We also present analyses of real data, containing measures of folate intake and depression, from the Norwegian Women and Cancer study (NOWAC). Results: In cases where extra information is available through replicated measurements and not validation data, regression calibration does not maintain important qualities of the true exposure distribution, thus estimates of variance and percentiles can be severely biased. We show that the outlined approach maintains much, in some cases all, of the misclassification found in the observed exposure. For that reason, regression analysis with the corrected variable included on a categorical scale is still biased. In some cases the corrected estimates are analytically equal to those obtained by the naive approach. Regression calibration is however vastly superior to the naive method when applying the medians of each category in the analysis. Conclusion: Regression calibration in its most well-known form is not appropriate for measurement error correction when the exposure is analyzed on a

Comparison of Adaline and Multiple Linear Regression Methods for Rainfall Forecasting

Science.gov (United States)

Sutawinaya, IP; Astawa, INGA; Hariyanti, NKD

2018-01-01

Heavy rainfall can cause disasters, so forecasts of rainfall intensity are needed. A main cause of flooding is high rainfall intensity that pushes rivers beyond their capacity, which floods the surrounding area. Rainfall is a dynamic factor, which makes it an interesting subject of study. Methods to support rainfall forecasting range from Artificial Intelligence (AI) to statistics. In this research, we used Adaline as the AI method and regression as the statistical method. The method that gives the more accurate forecast is considered the better one for forecasting rainfall. By comparing these methods, we expected to determine which is the best method for rainfall forecasting here.
Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method

Energy Technology Data Exchange (ETDEWEB)

Lee, Myoung Keon [Agency for Defense Development, Daejeon (Korea, Republic of); Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

2016-10-15

This paper provides the compressive failure strength value of composite laminate developed by using regression analysis method. Composite material in this document is a Carbon/Epoxy unidirection(UD) tape prepreg(Cycom G40-800/5276-1) cured at 350°F(177°C). The operating temperature is –60°F~+200°F(-55°C - +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates that were laid up by standard angle layers (0°, +45°, –45° and 90°). The ASTM-D-6484 standard was used for test method. The regression analysis was performed with the response variable being the laminate ultimate fracture strength and the regressor variables being two ply orientations (0° and ±45°)
Development of Compressive Failure Strength for Composite Laminate Using Regression Analysis Method

International Nuclear Information System (INIS)

Lee, Myoung Keon; Lee, Jeong Won; Yoon, Dong Hyun; Kim, Jae Hoon

2016-01-01

This paper provides the compressive failure strength value of composite laminate developed by using regression analysis method. Composite material in this document is a Carbon/Epoxy unidirection(UD) tape prepreg(Cycom G40-800/5276-1) cured at 350°F(177°C). The operating temperature is –60°F~+200°F(-55°C - +95°C). A total of 56 compression tests were conducted on specimens from eight (8) distinct laminates that were laid up by standard angle layers (0°, +45°, –45° and 90°). The ASTM-D-6484 standard was used for test method. The regression analysis was performed with the response variable being the laminate ultimate fracture strength and the regressor variables being two ply orientations (0° and ±45°)
[Study on the Recognition of Liquor Age of Gujing Based on Raman Spectra and Support Vector Regression].

Science.gov (United States)

Wang, Guo-xiang; Wang, Hai-yan; Wang, Hu; Zhang, Zheng-yong; Liu, Jun

2016-03-01

as the Partial Least Squares Regression (PLS) method, and can also be applied in the practice of liquor analysis.
Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

OpenAIRE

Via, Brian K.; Zhou, Chengfeng; Acquah, Gifty; Jiang, Wei; Eckhardt, Lori

2014-01-01

This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or selection of meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and af...
ON THE EFFECTS OF THE PRESENCE AND METHODS OF THE ELIMINATION HETEROSCEDASTICITY AND AUTOCORRELATION IN THE REGRESSION MODEL

Directory of Open Access Journals (Sweden)

Nina L. Timofeeva

2014-01-01

The article presents the methodological and technical bases for the creation of regression models that adequately reflect reality. The focus is on methods of removing residual autocorrelation in models. Algorithms for eliminating heteroscedasticity and autocorrelation of the regression model residuals are given: the reweighted least squares method and the Cochrane-Orcutt method. A "pure" regression model is built, together with a standardized form of the regression equation, which allows the effects on the dependent variable of explanatory variables expressed in different units to be compared. A scheme of techniques for reducing heteroscedasticity and autocorrelation when creating regression models specific to the social and cultural sphere is developed.
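The Cochrane-Orcutt procedure named in this abstract can be sketched for the simplest setting. The example assumes one regressor and first-order autoregressive errors, which the abstract does not specify: the procedure alternates between estimating the autocorrelation coefficient from the residuals and refitting the line on quasi-differenced data. Function names and the synthetic data are hypothetical.

```python
import random


def ols(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimate of (intercept, slope, rho)."""
    a, b = ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Lag-1 autocorrelation of the residuals.
        new_rho = (sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
                   / sum(r * r for r in resid[:-1]))
        # Quasi-difference both series, then refit by ordinary least squares.
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols(xs, ys)
        a = a_star / (1.0 - new_rho)  # undo the intercept scaling
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho


# Synthetic data (illustrative only): y = 1 + 2x + u, u_t = 0.6 u_{t-1} + e_t.
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
u, y = 0.0, []
for xi in x:
    u = 0.6 * u + rng.gauss(0.0, 0.1)
    y.append(1.0 + 2.0 * xi + u)

intercept, slope, rho = cochrane_orcutt(x, y)
print(f"intercept={intercept:.3f} slope={slope:.3f} rho={rho:.3f}")
```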
Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

Science.gov (United States)

Gusriani, N.; Firdaniza

2018-03-01

The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it produces a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

Science.gov (United States)

Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

2012-01-01

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
A subagging regression method for estimating the qualitative and quantitative state of groundwater

Science.gov (United States)

Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

2017-08-01

A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of other methods, and the uncertainties are reasonably estimated; the others have no uncertainty analysis option. To validate further, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR regardless of Gaussian or non-Gaussian skewed data. However, it is expected that GPR has a limitation in applications to severely corrupted data by outliers owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
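The general idea of subsample aggregating can be sketched as follows. This is only a generic illustration, not the SBR method of the paper: it assumes a straight-line trend, draws subsamples without replacement, fits each by least squares, and reports the median slope with the spread of the slopes as a rough uncertainty measure. All names, parameter defaults and data are hypothetical.

```python
import random
import statistics


def fit_line(t, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sty / stt
    return my - slope * mt, slope


def subagging_trend(t, y, n_subsamples=200, fraction=0.5, seed=0):
    """Median slope over random subsamples, plus the spread of the slopes."""
    rng = random.Random(seed)
    n = len(t)
    m = max(2, int(fraction * n))
    slopes = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample WITHOUT replacement
        _, slope = fit_line([t[i] for i in idx], [y[i] for i in idx])
        slopes.append(slope)
    return statistics.median(slopes), statistics.stdev(slopes)


# Hypothetical groundwater-level series: linear trend, small noise, one outlier.
t = list(range(40))
y = [3.0 + 0.5 * ti + (0.2 if ti % 2 else -0.2) for ti in t]
y[20] += 15.0
trend, spread = subagging_trend(t, y)
print(f"trend = {trend:.3f} per step, spread = {spread:.3f}")
```

Drawing without replacement is what distinguishes subagging from ordinary bagging, which resamples with replacement.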
Near Infrared Spectroscopy Calibration for Wood Chemistry: Which Chemometric Technique Is Best for Prediction and Interpretation?

Directory of Open Access Journals (Sweden)

Brian K. Via

2014-07-01

This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or selection of meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, were utilized for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks which makes PCR better for interpretation. Application of the 1st derivative appeared to assist in improving both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because to date most research works have committed to PLS because it yields better predictive performance. But this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques.
Near infrared spectroscopy calibration for wood chemistry: which chemometric technique is best for prediction and interpretation?

Science.gov (United States)

Via, Brian K; Zhou, Chengfeng; Acquah, Gifty; Jiang, Wei; Eckhardt, Lori

2014-07-25

This paper addresses the precision in factor loadings during partial least squares (PLS) and principal components regression (PCR) of wood chemistry content from near infrared reflectance (NIR) spectra. The precision of the loadings is considered important because these estimates are often utilized to interpret chemometric models or selection of meaningful wavenumbers. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set. PLS and PCR, before and after 1st derivative pretreatment, was utilized for model building and loadings investigation. As demonstrated by others, PLS was found to provide better predictive diagnostics. However, PCR exhibited a more precise estimate of loading peaks which makes PCR better for interpretation. Application of the 1st derivative appeared to assist in improving both PCR and PLS loading precision, but due to the small sample size, the two chemometric methods could not be compared statistically. This work is important because to date most research works have committed to PLS because it yields better predictive performance. But this research suggests there is a tradeoff between better prediction and model interpretation. Future work is needed to compare PLS and PCR for a suite of spectral pretreatment techniques.
Using a Regression Method for Estimating Performance in a Rapid Serial Visual Presentation Target-Detection Task

Science.gov (United States)

2017-12-01

Fig. 2 Simulation method; the process for one iteration of the simulation. It was repeated 250 times per combination of HR and FAR. Analysis was... stimuli. Simulations show that this regression method results in an unbiased and accurate estimate of target detection performance. The regression
Estimation of Fine Particulate Matter in Taipei Using Landuse Regression and Bayesian Maximum Entropy Methods

Directory of Open Access Journals (Sweden)

Yi-Ming Kuo

2011-06-01

Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on the other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005–2007.
Estimation of fine particulate matter in Taipei using landuse regression and bayesian maximum entropy methods.

Science.gov (United States)

Yu, Hwa-Lung; Wang, Chih-Hsih; Liu, Ming-Che; Kuo, Yi-Ming

2011-06-01

Fine airborne particulate matter (PM2.5) has adverse effects on human health. Assessing the long-term effects of PM2.5 exposure on human health and ecology is often limited by a lack of reliable PM2.5 measurements. In Taipei, PM2.5 levels were not systematically measured until August, 2005. Due to the popularity of geographic information systems (GIS), the landuse regression method has been widely used in the spatial estimation of PM concentrations. This method accounts for the potential contributing factors of the local environment, such as traffic volume. Geostatistical methods, on other hand, account for the spatiotemporal dependence among the observations of ambient pollutants. This study assesses the performance of the landuse regression model for the spatiotemporal estimation of PM2.5 in the Taipei area. Specifically, this study integrates the landuse regression model with the geostatistical approach within the framework of the Bayesian maximum entropy (BME) method. The resulting epistemic framework can assimilate knowledge bases including: (a) empirical-based spatial trends of PM concentration based on landuse regression, (b) the spatio-temporal dependence among PM observation information, and (c) site-specific PM observations. The proposed approach performs the spatiotemporal estimation of PM2.5 levels in the Taipei area (Taiwan) from 2005-2007.
COMPARISON OF LEAST ABSOLUTE SHRINKAGE AND SELECTION OPERATOR AND PARTIAL LEAST SQUARES ANALYSIS (Case Study: Microarray Data)

Directory of Open Access Journals (Sweden)

KADEK DWI FARMANI

2012-09-01

Linear regression analysis is one of the parametric statistical methods that utilize the relationship between two or more quantitative variables. In linear regression analysis, several assumptions must be met: the errors are normally distributed, the errors are uncorrelated, and the error variance is constant and homogeneous. Some constraints can prevent these assumptions from being met, for example correlation between independent variables (multicollinearity) and limits on the number of observations and independent variables obtained. When the number of samples is smaller than the number of independent variables, the data are called microarray data. Least Absolute Shrinkage and Selection Operator (LASSO) and Partial Least Squares (PLS) are statistical methods that can be used to overcome the microarray, overfitting, and multicollinearity problems. Given the above, a study comparing the LASSO and PLS methods is needed. This study uses data on coronary heart and stroke patients, which are microarray data and contain multicollinearity. For data with these two characteristics, in which most independent variables are only weakly correlated, the LASSO method produces a better model than PLS, as judged by the RMSEP.
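How LASSO shrinks coefficients and removes variables can be illustrated with a small sketch. It assumes the common formulation that minimizes (1/2n)·||y − Xβ||² + λ·||β||₁ and solves it by cyclic coordinate descent with soft-thresholding; this is not the software or data used in the study, and the function names, the penalty value and the toy design are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the LASSO (no intercept; center data first)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            residual = [r - v * delta for v, r in zip(col, residual)]
            beta[j] = new_beta
    return beta


# Toy orthogonal design (illustrative only): y = 2.0 * x0 + 0.1 * x1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)
```

In this toy case the strong predictor is shrunk by the penalty while the weak one is set exactly to zero, which is the variable-selection behaviour that distinguishes LASSO from PLS.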
An Introduction to Graphical and Mathematical Methods for Detecting Heteroscedasticity in Linear Regression.

Science.gov (United States)

Thompson, Russel L.

Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…
Real-time prediction of respiratory motion based on local regression methods

International Nuclear Information System (INIS)

Ruan, D; Fessler, J A; Balter, J M

2007-01-01

Recent developments in modulation techniques enable conformal delivery of radiation doses to small, localized target volumes. One of the challenges in using these techniques is real-time tracking and predicting target motion, which is necessary to accommodate system latencies. For image-guided-radiotherapy systems, it is also desirable to minimize sampling rates to reduce imaging dose. This study focuses on predicting respiratory motion, which can significantly affect lung tumours. Predicting respiratory motion in real-time is challenging, due to the complexity of breathing patterns and the many sources of variability. We propose a prediction method based on local regression. There are three major ingredients of this approach: (1) forming an augmented state space to capture system dynamics, (2) local regression in the augmented space to train the predictor from previous observation data using semi-periodicity of respiratory motion, (3) local weighting adjustment to incorporate fading temporal correlations. To evaluate prediction accuracy, we computed the root mean square error between predicted tumor motion and its observed location for ten patients. For comparison, we also applied commonly used predictive methods, namely linear prediction, neural networks and Kalman filtering, to the same data. The proposed method reduced the prediction error for all imaging rates and latency lengths, particularly for long prediction lengths.
Advanced statistics: linear regression, part I: simple linear regression.

Science.gov (United States)

Marill, Keith A

2004-01-01

Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
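The method of least squares and the notion of leverage mentioned in this abstract can be sketched in a few lines. This is a minimal illustration only; the function names and the toy dose-response values are hypothetical and do not come from the article.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared vertical residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def leverages(x):
    """Leverage of each observation: h_i = 1/n + (x_i - mean_x)^2 / Sxx."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical small clinical-style dataset.
dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(dose, response)
h = leverages(dose)  # largest for the observations farthest from the mean dose
```

The leverages always sum to the number of fitted parameters (two here), so a single value close to 1 flags an observation that largely determines its own fitted value.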
Ordinary least square regression, orthogonal regression, geometric mean regression and their applications in aerosol science

International Nuclear Information System (INIS)

Leng Ling; Zhang Tianyi; Kleinman, Lawrence; Zhu Wei

2007-01-01

Regression analysis, especially the ordinary least squares method which assumes that errors are confined to the dependent variable, has seen a fair share of its applications in aerosol science. The ordinary least squares approach, however, could be problematic due to the fact that atmospheric data often does not lend itself to calling one variable independent and the other dependent. Errors often exist for both measurements. In this work, we examine two regression approaches available to accommodate this situation. They are orthogonal regression and geometric mean regression. Comparisons are made theoretically as well as numerically through an aerosol study examining whether the ratio of organic aerosol to CO would change with age
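To make the contrast between the three approaches concrete, the sketch below computes the slope of each from the same sums of squares. It assumes equal error variances in both variables for the orthogonal fit, which is a simplifying assumption of this illustration; the function name and the data are hypothetical.

```python
import math


def regression_slopes(x, y):
    """Slopes of OLS (y on x), orthogonal and geometric mean regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ols = sxy / sxx
    # Orthogonal regression minimizes perpendicular distances to the line.
    orthogonal = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    # Geometric mean regression: geometric mean of the y-on-x and x-on-y slopes.
    geometric_mean = math.copysign(math.sqrt(syy / sxx), sxy)
    return {"ols": ols, "orthogonal": orthogonal, "geometric_mean": geometric_mean}


# Hypothetical paired measurements, both subject to error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3]
slopes = regression_slopes(x, y)

# On an exact line all three estimators agree.
exact = regression_slopes(x, [2.0 * xi for xi in x])
```

All three lines pass through the centroid of the data, so each intercept follows as mean(y) minus slope times mean(x). The OLS slope is the smallest in magnitude because it attributes all scatter to the dependent variable.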
A SOCIOLOGICAL ANALYSIS OF THE CHILDBEARING COEFFICIENT IN THE ALTAI REGION BASED ON METHOD OF FUZZY LINEAR REGRESSION

Directory of Open Access Journals (Sweden)

Sergei Vladimirovich Varaksin

2017-06-01

Full Text Available Purpose. To construct a mathematical model of the dynamics of childbearing in the Altai region in 2000–2016 and to analyze the dynamics of birth rates across multiple age categories of women of childbearing age. Methodology. An auxiliary element of the analysis is the construction of linear mathematical models of the dynamics of childbearing using the fuzzy linear regression method, which is based on fuzzy numbers. Fuzzy linear regression is considered as an alternative to standard statistical linear regression for short time series with an unknown distribution law. The parameters of the fuzzy linear and standard statistical regressions for the childbearing time series were determined using a built-in algorithm of the MATLAB language. The fuzzy linear regression method has not yet been used in sociological research. Results. Conclusions are drawn about socio-demographic changes in society, the high efficiency of the demographic policy of the regional and national leadership, and the applicability of the fuzzy linear regression method to sociological analysis.

Understanding Poisson regression.

Science.gov (United States)

Hayat, Matthew J; Higgins, Melinda

2014-04-01

Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
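A minimal sketch of a Poisson regression with a log link, fitted by Newton-Raphson, together with the Pearson dispersion statistic that is commonly used to check for overdispersion. The counts below are invented and are not from the ENSPIRE study; the function names and the single-predictor setup are assumptions of this illustration.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square divided by residual degrees of freedom (two parameters)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi_square = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi_square / (len(y) - 2)


# Hypothetical count outcome observed at six predictor levels.
predictor = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
counts = [1, 2, 4, 6, 12, 18]

b0, b1 = fit_poisson(predictor, counts)
dispersion = pearson_dispersion(predictor, counts, b0, b1)
rate_ratio = math.exp(b1)  # multiplicative change in expected count per unit of predictor
```

A dispersion well above 1 would suggest that the equal mean and variance assumption is violated, which is the situation where the abstract points to an overdispersion parameter or negative binomial regression.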
Safe trapping of cesium into pollucite structure by hot-pressing method

International Nuclear Information System (INIS)

Omerašević, Mia; Matović, Ljiljana; Ružić, Jovana; Golubović, Željko; Jovanović, Uroš; Mentus, Slavko; Dondur, Vera

2016-01-01

A simple one-step method with direct thermal conversion at lower temperatures for preparing a stable Cs-aluminosilicate phase, known as pollucite, is presented. The Cs-exchanged form of Na, Ca-LTA type zeolite (Cs-LTA) was pressureless sintered and hot pressed at certain temperatures in order to obtain pollucite. XRD and FTIR analyses were used to study structural changes of Cs-LTA before and after thermal treatments. The pressureless sintered sample recrystallized into the pollucite phase after heat treatment at 1000 °C (3 h) (PLS1000) and the hot pressed sample at 750 °C (3 h) using a pressure of 35 MPa (HP750), indicating a temperature reduction of 250 °C. SEM micrographs confirmed that HP750 has a higher density than PLS1000, which leads to a higher compressive strength. HP750 showed better resistance to Cs leaching than PLS1000. Based on these results one can conclude that hot pressing is a promising method for the permanent disposal of Cs radionuclides. - Highlights: • Na, Ca-LTA zeolite showed high affinity for Cs ions. • Pollucite phase was obtained using hot pressing at temperature as low as 750 °C. • HP750 shows better mechanical and morphological properties than PLS1000. • HP750 has lower leaching rate of Cs ions than PLS1000.
Novel, customizable scoring functions, parameterized using N-PLS, for structure-based drug discovery.

Science.gov (United States)

Catana, Cornel; Stouten, Pieter F W

2007-01-01

The ability to accurately predict biological affinity on the basis of in silico docking to a protein target remains a challenging goal in the CADD arena. Typically, "standard" scoring functions have been employed that use the calculated docking result and a set of empirical parameters to calculate a predicted binding affinity. To improve on this, we are exploring novel strategies for rapidly developing and tuning "customized" scoring functions tailored to a specific need. In the present work, three such customized scoring functions were developed using a set of 129 high-resolution protein-ligand crystal structures with measured Ki values. The functions were parametrized using N-PLS (N-way partial least squares), a multivariate technique well-known in the 3D quantitative structure-activity relationship field. A modest correlation between observed and calculated pKi values using a standard scoring function (r2 = 0.5) could be improved to 0.8 when a customized scoring function was applied. To mimic a more realistic scenario, a second scoring function was developed, not based on crystal structures but exclusively on several binding poses generated with the Flo+ docking program. Finally, a validation study was conducted by generating a third scoring function with 99 randomly selected complexes from the 129 as a training set and predicting pKi values for a test set that comprised the remaining 30 complexes. Training and test set r2 values were 0.77 and 0.78, respectively. These results indicate that, even without direct structural information, predictive customized scoring functions can be developed using N-PLS, and this approach holds significant potential as a general procedure for predicting binding affinity on the basis of in silico docking.
Partial Least Squares tutorial for analyzing neuroimaging data

Directory of Open Access Journals (Sweden)

Patricia Van Roon

2014-09-01

Full Text Available Partial least squares (PLS) has become a respected and meaningful soft modeling analysis technique that can be applied to very large datasets where the number of factors or variables is greater than the number of observations. Current biometric studies (e.g., eye movements, EKG, body movements, EEG) are often of this nature. PLS eliminates the multiple linear regression issues of over-fitting data by finding a few underlying or latent variables (factors) that account for most of the variation in the data. In real-world applications, where linear models do not always apply, PLS can model the non-linear relationship well. This tutorial introduces two PLS methods, PLS Correlation (PLSC) and PLS Regression (PLSR), and their applications in data analysis, which are illustrated with neuroimaging examples. Both methods provide straightforward and comprehensible techniques for determining and modeling relationships between two multivariate data blocks by finding latent variables that best describe the relationships. In the examples, the PLSC will analyze the relationship between neuroimaging data such as Event-Related Potential (ERP) amplitude averages from different locations on the scalp with their corresponding behavioural data. Using the same data, the PLSR will be used to model the relationship between neuroimaging and behavioural data. This model will be able to predict future behaviour solely from available neuroimaging data. To find latent variables, Singular Value Decomposition (SVD) for PLSC and Non-linear Iterative PArtial Least Squares (NIPALS) for PLSR are implemented in this tutorial. SVD decomposes the large data block into three manageable matrices containing a diagonal set of singular values, as well as left and right singular vectors. For PLSR, NIPALS algorithms are used because they provide a more precise estimation of the latent variables. Mathematica notebooks are provided for each PLS method with clearly labeled sections and subsections. The
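The latent-variable extraction described in this record can be sketched for the single-response case (often called PLS1), where each NIPALS step reduces to a closed-form weight vector followed by deflation. This is a generic sketch in Python rather than the tutorial's Mathematica notebooks; the function names and the toy data are assumptions.

```python
def pls1_fit(X, y, n_components):
    """NIPALS-style PLS1: extract latent variables one at a time with deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # deflated X block
    f = [value - y_mean for value in y]                        # deflated response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X space with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of X and y explained by this latent variable.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, row):
    """Predict by replaying the same projection and deflation on a new row."""
    x_mean, y_mean, components = model
    e = [row[j] - x_mean[j] for j in range(len(row))]
    prediction = y_mean
    for w, load, q in components:
        score = sum(e[j] * w[j] for j in range(len(e)))
        prediction += q * score
        e = [e[j] - score * load[j] for j in range(len(e))]
    return prediction


# Hypothetical data with y = 1 + 2*x1 - x2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [1.0, 4.0, 3.0, 6.0]
one_component = pls1_fit(X, y, 1)
two_components = pls1_fit(X, y, 2)
```

With as many components as the rank of the centered predictor block, the model coincides with ordinary least squares; using fewer components is what provides the protection against over-fitting.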
New spectrophotometric/chemometric assisted methods for the simultaneous determination of imatinib, gemifloxacin, nalbuphine and naproxen in pharmaceutical formulations and human urine

Science.gov (United States)

Belal, F.; Ibrahim, F.; Sheribah, Z. A.; Alaa, H.

2018-06-01

In this paper, novel univariate and multivariate regression methods along with a model-updating technique were developed and validated for the simultaneous determination of a quaternary mixture of imatinib (IMB), gemifloxacin (GMI), nalbuphine (NLP) and naproxen (NAP). The univariate method is extended derivative ratio (EDR), which depends on measuring every drug in the quaternary mixture by using a ternary mixture of the other three drugs as divisor. Peak amplitudes were measured at 294 nm, 250 nm, 283 nm and 239 nm within linear concentration ranges of 4.0-17.0, 3.0-15.0, 4.0-80.0 and 1.0-6.0 μg mL⁻¹ for IMB, GMI, NLP and NAP, respectively. The multivariate methods adopted are partial least squares (PLS) in original and derivative mode. These models were constructed for simultaneous determination of the studied drugs in the ranges of 4.0-8.0, 3.0-11.0, 10.0-18.0 and 1.0-3.0 μg mL⁻¹ for IMB, GMI, NLP and NAP, respectively, by using eighteen mixtures as a calibration set and seven mixtures as a validation set. The root mean square errors of prediction (RMSEP) were 0.09 and 0.06 for IMB, 0.14 and 0.13 for GMI, 0.07 and 0.02 for NLP and 0.64 and 0.27 for NAP by PLS in original and derivative mode, respectively. Both models were successfully applied for analysis of IMB, GMI, NLP and NAP in their dosage forms. Updated PLS in derivative mode and EDR were applied for determination of the studied drugs in spiked human urine. The obtained results were statistically compared with those obtained by the reported methods, leading to the conclusion that there is no significant difference regarding accuracy and precision.
Phantom and animal imaging studies using PLS synchrotron X-rays

CERN Document Server

Hee Joung Kim; Kyu Ho Lee; Hai Jo Jung; Eun Kyung Kim; Jung Ho Je; In Woo Kim; Yeukuang, Hwu; Wen Li Tsai; Je Kyung Seong; Seung Won Lee; Hyung Sik Yoo

2001-01-01

Ultra-high resolution radiographs can be obtained using synchrotron X-rays. A collaboration team consisting of K-JIST, POSTECH and YUMC has recently commissioned a new beamline (5C1) at Pohang Light Source (PLS) in Korea for medical applications using phase contrast radiology. Relatively simple image acquisition systems were set up on the 5C1 beamline, and imaging studies were performed for resolution test patterns, a mammographic phantom, and animals. Resolution test patterns and mammographic phantom images showed much better image resolution and quality with the 5C1 imaging system than with the mammography system. Both fish and mouse images with the 5C1 imaging system also showed much better image resolution with great details of organs and anatomy compared to those obtained with a conventional mammography system. A simple and inexpensive ultra-high resolution imaging system on the 5C1 beamline was successfully implemented. The authors were able to acquire ultra-high resolution images for resolution test patterns, mammograph...
Assessing the performance of variational methods for mixed logistic regression models

Czech Academy of Sciences Publication Activity Database

Rijmen, F.; Vomlel, Jiří

2008-01-01

Vol. 78, No. 8 (2008), pp. 765-779 ISSN 0094-9655 R&D Projects: GA MŠk 1M0572 Grant - others: GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords: Mixed models * Logistic regression * Variational methods * Lower bound approximation Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.353, year: 2008
Method for nonlinear exponential regression analysis

Science.gov (United States)

Junkin, B. G.

1972-01-01

Two computer programs, developed according to two general types of exponential models for conducting nonlinear exponential regression analysis, are described. A least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. The programs are written in FORTRAN 5 for the Univac 1108 computer.
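The linearization idea can be illustrated with a Gauss-Newton iteration, which repeatedly replaces the model by its first-order Taylor expansion and solves the resulting linear least squares problem. The sketch below assumes the single-term model y = a * exp(b * x), which is only one of many possible exponential forms and not necessarily either of the two models in the report; the starting values and data are hypothetical.

```python
import math


def fit_exponential(x, y, a, b, iterations=50):
    """Gauss-Newton fit of y = a * exp(b * x), starting from the guesses a and b."""
    for _ in range(iterations):
        # Partial derivatives of the model with respect to a and b (Taylor terms).
        d_a = [math.exp(b * xi) for xi in x]
        d_b = [a * xi * math.exp(b * xi) for xi in x]
        residual = [yi - a * math.exp(b * xi) for xi, yi in zip(x, y)]
        # Normal equations of the linearized problem (2 x 2 system).
        s_aa = sum(u * u for u in d_a)
        s_ab = sum(u * v for u, v in zip(d_a, d_b))
        s_bb = sum(v * v for v in d_b)
        g_a = sum(u * r for u, r in zip(d_a, residual))
        g_b = sum(v * r for v, r in zip(d_b, residual))
        det = s_aa * s_bb - s_ab * s_ab
        a += (s_bb * g_a - s_ab * g_b) / det
        b += (s_aa * g_b - s_ab * g_a) / det
    return a, b


# Hypothetical noise-free decay data generated with a = 3 and b = -0.5.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0 * math.exp(-0.5 * xi) for xi in x]

a_hat, b_hat = fit_exponential(x, y, a=2.5, b=-0.4)
```

As with any iterative linearization, convergence depends on reasonable starting values; a poor initial guess can make the iteration diverge.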
Microscale Solubility Measurements of Matrix-Assisted Laser Desorption-Ionization (MALDI) Matrices Using Attenuated Total Reflection (ATR) Fourier Transform Infrared Spectroscopy (FT-IR) Coupled with Partial Least Squares (PLS) Analysis.

Science.gov (United States)

Gorre, Elsa; Owens, Kevin G

2016-11-01

In this work an attenuated total reflection Fourier transform infrared (FT-IR) absorption based method is used to measure the solubility of two matrix-assisted laser desorption-ionization (MALDI) matrices in a few pure solvents and mixtures of acetonitrile and water using low microliter amounts of solution. Results from a method that averages the values obtained from multiple calibration curves created by manual peak picking are compared to those predicted using a partial least squares (PLS) chemometrics approach. The PLS method provided solubility values that were in good agreement with the manual method with significantly greater ease of analysis. As a test, the solubility of adipic acid in acetone was measured using the two methods of analysis, and the values are in good agreement with solubility values reported in literature. The solubilities of the MALDI matrices α-cyano-4-hydroxy cinnamic acid (CHCA) and sinapinic acid (SA) were measured in a series of mixtures made from acetonitrile (ACN) and water; surprisingly, the results show a highly nonlinear trend. While both CHCA and SA show solubility values of less than 10 mg/mL in the pure solvents, the solubility value for SA increases to 56.3 mg/mL in a 75:25 v/v ACN:water mixture. This can have a significant effect on the matrix-to-analyte ratios in the MALDI experiment when sample protocols call for preparation of a saturated solution of the matrix in the chosen solvent system. © The Author(s) 2016.
Direct integral linear least square regression method for kinetic evaluation of hepatobiliary scintigraphy

International Nuclear Information System (INIS)

Shuke, Noriyuki

1991-01-01

In hepatobiliary scintigraphy, kinetic model analysis, which provides kinetic parameters such as hepatic extraction or excretion rate, has been used for quantitative evaluation of liver function. In this analysis, unknown model parameters are usually determined using the nonlinear least square regression method (NLS method), which requires iterative calculation and initial estimates for the unknown parameters. As a simple alternative to the NLS method, the direct integral linear least square regression method (DILS method), which can determine model parameters by a simple calculation without initial estimates, is proposed, and its applicability to the analysis of hepatobiliary scintigraphy was tested. In order to see whether the DILS method could determine model parameters as well as the NLS method, and to determine an appropriate weight for the DILS method, simulated theoretical data based on prefixed parameters were fitted to a 1-compartment model using both the DILS method with various weightings and the NLS method. The parameter values obtained were then compared with the prefixed values which were used for data generation. The effect of various weights on the error of the parameter estimates was examined, and the inverse of time was found to be the best weight for minimizing the error. When using this weight, the DILS method could give parameter values close to those obtained by the NLS method, and both sets of parameter values were very close to the prefixed values. With appropriate weighting, the DILS method could provide reliable parameter estimates which are relatively insensitive to data noise. In conclusion, the DILS method could be used as a simple alternative to the NLS method, providing reliable parameter estimates. (author)
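The idea behind a direct integral fit can be sketched on the simplest possible case, a mono-exponential washout y(t) = A * exp(-k * t). Integrating dy/dt = -k * y gives y(t) = A - k * ∫y dt, which is linear in A and k, so no iteration or initial estimate is needed. This simplified model, the trapezoidal integration and the function names are assumptions of the sketch; the paper's hepatobiliary model is not reproduced here. The inverse-time weighting follows the abstract.

```python
import math


def weighted_line_fit(u, v, w):
    """Weighted least squares fit of v = c0 + c1 * u; returns (c0, c1)."""
    total = sum(w)
    mean_u = sum(wi * ui for wi, ui in zip(w, u)) / total
    mean_v = sum(wi * vi for wi, vi in zip(w, v)) / total
    suu = sum(wi * (ui - mean_u) ** 2 for wi, ui in zip(w, u))
    suv = sum(wi * (ui - mean_u) * (vi - mean_v) for wi, ui, vi in zip(w, u, v))
    c1 = suv / suu
    return mean_v - c1 * mean_u, c1


def dils_fit(times, activity):
    """Estimate (A, k) of y = A * exp(-k t) from y(t) = A - k * integral of y.

    Assumes times[0] == 0 so the running integral starts at the origin.
    """
    integral = [0.0]
    for i in range(1, len(times)):
        step = times[i] - times[i - 1]
        integral.append(integral[-1] + 0.5 * step * (activity[i] + activity[i - 1]))
    # Weight each sample by 1/t; the t = 0 sample is dropped to avoid division by zero.
    weights = [1.0 / t for t in times[1:]]
    intercept, slope = weighted_line_fit(integral[1:], activity[1:], weights)
    return intercept, -slope


# Hypothetical noise-free time-activity curve with A = 10 and k = 0.3 per minute.
times = [0.5 * i for i in range(21)]
activity = [10.0 * math.exp(-0.3 * t) for t in times]

a_hat, k_hat = dils_fit(times, activity)
```

The small bias in the estimated rate constant comes from the trapezoidal approximation of the integral and shrinks as the sampling interval gets shorter.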
Alpins and thibos vectorial astigmatism analyses: proposal of a linear regression model between methods

Directory of Open Access Journals (Sweden)

Giuliano de Oliveira Freitas

2013-10-01

Full Text Available PURPOSE: To determine linear regression models between Alpins descriptive indices and Thibos astigmatic power vectors (APV), assessing the validity and strength of such correlations. METHODS: This case series prospectively assessed 62 eyes of 31 consecutive cataract patients with preoperative corneal astigmatism between 0.75 and 2.50 diopters in both eyes. Patients were randomly assigned to one of two phacoemulsification groups: one assigned to receive the AcrySof® Toric intraocular lens (IOL) in both eyes and another assigned to have the AcrySof Natural IOL associated with limbal relaxing incisions, also in both eyes. All patients were reevaluated postoperatively at 6 months, when refractive astigmatism analysis was performed using both Alpins and Thibos methods. The ratio between Thibos postoperative APV and preoperative APV (APVratio) and its linear regression to Alpins percentage of success of astigmatic surgery, percentage of astigmatism corrected and percentage of astigmatism reduction at the intended axis were assessed. RESULTS: A significant negative correlation between the ratio of post- and preoperative Thibos APV (APVratio) and Alpins percentage of success (%Success) was found (Spearman's ρ = -0.93); the linear regression is given by the following equation: %Success = (-APVratio + 1.00) × 100. CONCLUSION: The linear regression we found between APVratio and %Success permits a validated mathematical inference concerning the overall success of astigmatic surgery.
Using the fuzzy linear regression method to benchmark the energy efficiency of commercial buildings

International Nuclear Information System (INIS)

Chung, William

2012-01-01

Highlights: ► Fuzzy linear regression method is used for developing benchmarking systems. ► The systems can be used to benchmark energy efficiency of commercial buildings. ► The resulting benchmarking model can be used by public users. ► The resulting benchmarking model can capture the fuzzy nature of input–output data. -- Abstract: Benchmarking systems from a sample of reference buildings need to be developed to conduct benchmarking processes for the energy efficiency of commercial buildings. However, not all benchmarking systems can be adopted by public users (i.e., other non-reference building owners) because of the different methods in developing such systems. An approach for benchmarking the energy efficiency of commercial buildings using statistical regression analysis to normalize other factors, such as management performance, was developed in a previous work. However, the field data given by experts can be regarded as a distribution of possibility. Thus, the previous work may not be adequate to handle such fuzzy input–output data. Consequently, a number of fuzzy structures cannot be fully captured by statistical regression analysis. This present paper proposes the use of fuzzy linear regression analysis to develop a benchmarking process, the resulting model of which can be used by public users. An illustrative example is given as well.
Optimization Method of Fusing Model Tree into Partial Least Squares

Directory of Open Access Journals (Sweden)

Yu Fang

2017-01-01

Full Text Available Partial Least Squares (PLS) cannot adapt to the characteristics of data in many fields, which have multiple independent variables, multiple dependent variables and non-linearity. A Model Tree (MT), however, which is made up of many linear segments, adapts well to nonlinear functions. On this basis, a new method combining PLS and MT for analyzing and predicting data is proposed: it builds the MT from the principal components and the explanatory variables (the dependent variable) extracted by PLS, and repeatedly extracts residual information to build further model trees until a satisfactory accuracy condition is met. Experiments on data for the monarch drug of the Maxingshigan decoction, used to treat asthma or cough, and on two sample sets from the UCI Machine Learning Repository show that the new method improves both explanatory and predictive ability.
Post-processing through linear regression

Science.gov (United States)

van Schaeybroeck, B.; Vannitsem, S.

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz '63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. In contrast to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Cox regression with missing covariate data using a modified partial likelihood method

DEFF Research Database (Denmark)

Martinussen, Torben; Holst, Klaus K.; Scheike, Thomas H.

2016-01-01

Missing covariate values is a common problem in survival analysis. In this paper we propose a novel method for the Cox regression model that is close to maximum likelihood but avoids the use of the EM-algorithm. It exploits that the observed hazard function is multiplicative in the baseline hazard...
Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra

Science.gov (United States)

Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong

2017-08-01

Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression model. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove the systematic errors. The performance of the model was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that compared to PCR and SMLR, PLS had a lower RMSEC, RMSECV, and RMSEP and a higher r for all four analytes. PLS coupled with proper pretreatments showed good performance in both the fitting and predicting results. Furthermore, the areas of origin of the Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
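Two of the ingredients named in this abstract are simple enough to sketch directly: the standard normal variate (SNV) pretreatment and the root mean square error used for RMSEC, RMSECV and RMSEP. The sketch below uses invented absorbance values and hypothetical function names; the three error measures differ only in which samples are compared (calibration, cross-validation or independent prediction set).

```python
import math


def standard_normal_variate(spectrum):
    """Center one spectrum on its own mean and scale by its own standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((value - mean) ** 2 for value in spectrum) / (n - 1))
    return [(value - mean) / std for value in spectrum]


def root_mean_square_error(reference, predicted):
    """RMSE between reference values and model predictions."""
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


# Hypothetical spectrum, and the same spectrum with a scatter-like gain and offset.
raw = [0.20, 0.40, 0.90, 0.50, 0.30]
scattered = [1.5 * value + 0.1 for value in raw]

snv_raw = standard_normal_variate(raw)
snv_scattered = standard_normal_variate(scattered)  # matches snv_raw

# Hypothetical reference (e.g. HPLC) values against predictions on a test set.
rmsep = root_mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because SNV removes any per-spectrum gain and offset, the two spectra above become identical after pretreatment, which is why it is used against scatter effects.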
Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three

Science.gov (United States)

Steinhardt, Charles L.; Jermyn, Adam S.

2018-02-01

Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem and provide a Python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.
Total sulfur determination in residues of crude oil distillation using FT-IR/ATR and variable selection methods

Science.gov (United States)

Müller, Aline Lima Hermes; Picoloto, Rochele Sogari; Mello, Paola de Azevedo; Ferrão, Marco Flores; dos Santos, Maria de Fátima Pereira; Guimarães, Regina Célia Lourenço; Müller, Edson Irineu; Flores, Erico Marlon Moraes

2012-04-01

Total sulfur concentration was determined in atmospheric residue (AR) and vacuum residue (VR) samples obtained from the petroleum distillation process by Fourier transform infrared spectroscopy with attenuated total reflectance (FT-IR/ATR) in association with chemometric methods. The calibration and prediction sets consisted of 40 and 20 samples, respectively. Calibration models were developed using two variable selection models: interval partial least squares (iPLS) and synergy interval partial least squares (siPLS). Different treatments and pre-processing steps were also evaluated for the development of models. The pre-treatment based on multiplicative scatter correction (MSC) and mean-centered data were selected for model construction. The use of siPLS as the variable selection method provided a model with root mean square error of prediction (RMSEP) values significantly better than those obtained by the PLS model using all variables. The best model was obtained using the siPLS algorithm with spectra divided into 20 intervals and combinations of 3 intervals (911-824, 823-736 and 737-650 cm⁻¹). This model produced an RMSECV of 400 mg kg⁻¹ S and an RMSEP of 420 mg kg⁻¹ S, showing a correlation coefficient of 0.990.
Linear regression in astronomy. I

Science.gov (United States)

Isobe, Takashi; Feigelson, Eric D.; Akritas, Michael G.; Babu, Gutti Jogesh

1990-01-01

Five methods for obtaining linear regression fits to bivariate data with unknown or insignificant measurement errors are discussed: ordinary least-squares (OLS) regression of Y on X, OLS regression of X on Y, the bisector of the two OLS lines, orthogonal regression, and 'reduced major-axis' regression. These methods have been used by various researchers in observational astronomy, most importantly in cosmic distance scale applications. Formulas for calculating the slope and intercept coefficients and their uncertainties are given for all the methods, including a new general form of the OLS variance estimates. The accuracy of the formulas was confirmed using numerical simulations. The applicability of the procedures is discussed with respect to their mathematical properties, the nature of the astronomical data under consideration, and the scientific purpose of the regression. It is found that, for problems needing symmetrical treatment of the variables, the OLS bisector performs significantly better than orthogonal or reduced major-axis regression.
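As a small illustration of the symmetric treatment discussed here, the sketch below computes the two OLS slopes and the slope of the line that bisects the angle between them. The bisector expression follows directly from the half-angle identity for the two slope angles; the function name and the data are hypothetical, and the uncertainty formulas from the paper are not reproduced.

```python
import math


def ols_bisector(x, y):
    """Return slopes of OLS(Y|X), OLS(X|Y) expressed as dy/dx, and their bisector."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope_y_on_x = sxy / sxx
    slope_x_on_y = syy / sxy  # inverse regression, rewritten as a slope in the (x, y) plane
    bisector = (
        slope_y_on_x * slope_x_on_y - 1.0
        + math.sqrt((1.0 + slope_y_on_x ** 2) * (1.0 + slope_x_on_y ** 2))
    ) / (slope_y_on_x + slope_x_on_y)
    return slope_y_on_x, slope_x_on_y, bisector


# Hypothetical bivariate data with scatter in both coordinates.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.4, 4.9, 6.5]
b_yx, b_xy, b_bisector = ols_bisector(x, y)
```

The bisector slope always lies between the two OLS slopes, so it does not privilege either variable as the dependent one.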
Estimation of active pharmaceutical ingredients content using locally weighted partial least squares and statistical wavelength selection.

Science.gov (United States)

Kim, Sanghong; Kano, Manabu; Nakagawa, Hiroshi; Hasebe, Shinji

2011-12-15

Development of quality estimation models using near infrared spectroscopy (NIRS) and multivariate analysis has been accelerated as a process analytical technology (PAT) tool in the pharmaceutical industry. Although linear regression methods such as partial least squares (PLS) are widely used, they cannot always achieve high estimation accuracy because physical and chemical properties of a measuring object have a complex effect on NIR spectra. In this research, locally weighted PLS (LW-PLS) which utilizes a newly defined similarity between samples is proposed to estimate active pharmaceutical ingredient (API) content in granules for tableting. In addition, a statistical wavelength selection method which quantifies the effect of API content and other factors on NIR spectra is proposed. LW-PLS and the proposed wavelength selection method were applied to real process data provided by Daiichi Sankyo Co., Ltd., and the estimation accuracy was improved by 38.6% in root mean square error of prediction (RMSEP) compared to the conventional PLS using wavelengths selected on the basis of variable importance on the projection (VIP). The results clearly show that the proposed calibration modeling technique is useful for API content estimation and is superior to the conventional one. Copyright © 2011 Elsevier B.V. All rights reserved.

Fungible weights in logistic regression.

Science.gov (United States)

Jones, Jeff A; Waller, Niels G

2016-06-01

In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Geographically weighted regression based methods for merging satellite and gauge precipitation

Science.gov (United States)

Chao, Lijun; Zhang, Ke; Li, Zhijia; Zhu, Yuelong; Wang, Jingfeng; Yu, Zhongbo

2018-03-01

Real-time precipitation data with high spatiotemporal resolutions are crucial for accurate hydrological forecasting. To improve the spatial resolution and quality of satellite precipitation, a three-step satellite and gauge precipitation merging method was formulated in this study: (1) bilinear interpolation is first applied to downscale coarser satellite precipitation to a finer resolution (PS); (2) the (mixed) geographically weighted regression methods coupled with a weighting function are then used to estimate biases of PS as functions of gauge observations (PO) and PS; and (3) biases of PS are finally corrected to produce a merged precipitation product. Based on the above framework, eight algorithms, a combination of two geographically weighted regression methods and four weighting functions, are developed to merge CMORPH (CPC MORPHing technique) precipitation with station observations on a daily scale in the Ziwuhe Basin of China. The geographical variables (elevation, slope, aspect, surface roughness, and distance to the coastline) and a meteorological variable (wind speed) were used for merging precipitation to avoid the artificial spatial autocorrelation resulting from traditional interpolation methods. The results show that the combination of the MGWR and BI-square function (MGWR-BI) has the best performance (R = 0.863 and RMSE = 7.273 mm/day) among the eight algorithms. The MGWR-BI algorithm was then applied to produce hourly merged precipitation product. Compared to the original CMORPH product (R = 0.208 and RMSE = 1.208 mm/hr), the quality of the merged data is significantly higher (R = 0.724 and RMSE = 0.706 mm/hr). The developed merging method not only improves the spatial resolution and quality of the satellite product but also is easy to implement, which is valuable for hydrological modeling and other applications.
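
As a rough illustration of the weighting step, the sketch below shows a bi-square kernel and a single geographically weighted least-squares fit at one target location. It is a generic GWR building block under assumed names (`bisquare_weights`, `gwr_fit_at`, `bandwidth`); the mixed GWR variant and the bias-correction pipeline described in the abstract are not reproduced.

```python
import numpy as np

def bisquare_weights(dist, bandwidth):
    """Bi-square kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    d = np.asarray(dist, dtype=float)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

def gwr_fit_at(coords, X, y, target, bandwidth):
    """Weighted least squares at one location; returns [intercept, slopes...]."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    sw = np.sqrt(bisquare_weights(dist, bandwidth))
    design = np.column_stack([np.ones(len(y)), X])
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta
```

Repeating `gwr_fit_at` for every grid cell yields spatially varying coefficients; gauges beyond the bandwidth receive zero weight and do not influence the local fit.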
Post-processing through linear regression

Directory of Open Access Journals (Sweden)

B. Van Schaeybroeck

2011-03-01

Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified.

These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times, the regression schemes (EVMOS, TDTR) that yield the correct variability and the largest correlation between ensemble error and spread should be preferred.
Building optimal regression tree by ant colony system-genetic algorithm: Application to modeling of melting points

Energy Technology Data Exchange (ETDEWEB)

Hemmateenejad, Bahram, E-mail: hemmatb@sums.ac.ir [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of); Medicinal and Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz (Iran, Islamic Republic of); Shamsipur, Mojtaba [Department of Chemistry, Razi University, Kermanshah (Iran, Islamic Republic of); Zare-Shahabadi, Vali [Young Researchers Club, Mahshahr Branch, Islamic Azad University, Mahshahr (Iran, Islamic Republic of); Akhond, Morteza [Department of Chemistry, Shiraz University, Shiraz (Iran, Islamic Republic of)

2011-10-17

Highlights: Ant colony systems help to build optimum classification and regression trees. Using genetic algorithm operators in ant colony systems resulted in more appropriate models. Variable selection in each terminal node of the tree gives promising results. CART-ACS-GA could model the melting point of organic materials with prediction errors lower than previous models.

Abstract: The classification and regression trees (CART) possess the advantage of being able to handle large data sets and yield readily interpretable models. A conventional method of building a regression tree is recursive partitioning, which results in a good but not optimal tree. Ant colony system (ACS), a meta-heuristic algorithm derived from the observation of real ants, can be used to overcome this problem. The purpose of this study was to explore the use of CART and its combination with ACS for modeling of melting points of a large variety of chemical compounds. Genetic algorithm (GA) operators (e.g., crossover and mutation operators) were combined with the ACS algorithm to select the best solution model. In addition, at each terminal node of the resulting tree, variable selection was done by the ACS-GA algorithm to build an appropriate partial least squares (PLS) model. To test the ability of the resulting tree, a set of approximately 4173 structures and their melting points were used (3000 compounds as training set and 1173 as validation set). Further, an external test set consisting of 277 drugs was used to validate the prediction ability of the tree. Comparison of the results obtained from both trees showed that the tree constructed by the ACS-GA algorithm performs better than that produced by the recursive partitioning procedure.
The Research of Regression Method for Forecasting Monthly Electricity Sales Considering Coupled Multi-factor

Science.gov (United States)

Wang, Jiangbo; Liu, Junhui; Li, Tiantian; Yin, Shuo; He, Xinhui

2018-01-01

Monthly electricity sales forecasting is a basic task for ensuring the safety of the power system. This paper presents a monthly electricity sales forecasting method which comprehensively considers the coupled factors of temperature, economic growth, electric power replacement and business expansion. The mathematical model is constructed using a regression method. The simulation results show that the proposed method is accurate and effective.
Implementing the Fundamental Principle of Islamic Finance PLS in Order to Reduce Moral Hazard on the Financial Services Market

OpenAIRE

Dariusz Piotrowski

2014-01-01

Moral hazard is a situation in which an agent takes risky actions, knowing that the potential costs will be borne by the principal. In finance, moral hazard arises when advisers make risky decisions in the belief that they will not have to carry the full burden of potential losses. Implementing profit and loss sharing (PLS), the fundamental principle of Islamic finance, could reduce moral hazard on the financial services market.
A new analytical method for quantification of olive and palm oil in blends with other vegetable edible oils based on the chromatographic fingerprints from the methyl-transesterified fraction.

Science.gov (United States)

Jiménez-Carvelo, Ana M; González-Casado, Antonio; Cuadros-Rodríguez, Luis

2017-03-01

A new analytical method for the quantification of olive oil and palm oil in blends with other vegetable edible oils (canola, safflower, corn, peanut, seeds, grapeseed, linseed, sesame and soybean) using normal phase liquid chromatography and chemometric tools was developed. The procedure for obtaining the chromatographic fingerprint from the methyl-transesterified fraction of each blend is described. The multivariate quantification methods used were Partial Least Squares Regression (PLS-R) and Support Vector Regression (SVR). The quantification results were evaluated by several parameters, such as the Root Mean Square Error of Validation (RMSEV), Mean Absolute Error of Validation (MAEV) and Median Absolute Error of Validation (MdAEV). Notably, the chromatographic analysis in the proposed method takes only eight minutes, and the results obtained show the potential of this method and allow quantification of mixtures of olive oil and palm oil with other vegetable oils. Copyright © 2016 Elsevier B.V. All rights reserved.
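
The three validation statistics named in this abstract have simple definitions, sketched below under the assumption that they are computed on predicted versus reference concentrations of a validation set. The function name is hypothetical and the sample values in the usage note are made up for illustration.

```python
import numpy as np

def validation_errors(y_true, y_pred):
    """RMSEV, MAEV and MdAEV for a set of validation predictions."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    abs_err = np.abs(err)
    return {
        "RMSEV": float(np.sqrt(np.mean(err ** 2))),   # penalizes large errors most
        "MAEV": float(np.mean(abs_err)),              # average magnitude of error
        "MdAEV": float(np.median(abs_err)),           # robust to a few outliers
    }
```

For example, `validation_errors([10, 20, 30, 40], [12, 19, 30, 44])` gives an MAEV of 1.75 and an MdAEV of 1.5, while the RMSEV (about 2.29) is pulled up by the single error of 4.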
Applications of Monte Carlo method to nonlinear regression of rheological data

Science.gov (United States)

Kim, Sangmo; Lee, Junghaeng; Kim, Sihyun; Cho, Kwang Soo

2018-02-01

In rheological studies, it is often necessary to determine the parameters of rheological models from experimental data. Since both rheological data and values of the parameters vary on a logarithmic scale and the number of parameters is quite large, conventional methods of nonlinear regression such as the Levenberg-Marquardt (LM) method are usually ineffective. Gradient-based methods such as LM are apt to be caught in local minima, which give unphysical values of the parameters whenever the initial guess is far from the global optimum. Although this problem could be solved by simulated annealing (SA), that Monte Carlo (MC) method needs an adjustable parameter which could be determined in an ad hoc manner. We suggest a simplified version of SA, a kind of MC method, which results in effective values of the parameters of the most complicated rheological models, such as the Carreau-Yasuda model of steady shear viscosity, the discrete relaxation spectrum and zero-shear viscosity as a function of concentration and molecular weight.
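
To make the setting concrete, the sketch below writes out the Carreau-Yasuda model, a log-scale least-squares objective, and a plain random-walk annealing search over log-parameters. The annealing loop is a generic textbook scheme with a linear cooling schedule chosen for illustration; it is not the simplified SA proposed by the authors, and all function names and default settings are assumptions.

```python
import numpy as np

def carreau_yasuda(gamma_dot, eta0, eta_inf, lam, a, n):
    """Carreau-Yasuda steady shear viscosity as a function of shear rate."""
    return eta_inf + (eta0 - eta_inf) * (1.0 + (lam * gamma_dot) ** a) ** ((n - 1.0) / a)

def log_sse(log_params, gamma_dot, eta_obs):
    """Squared error in log viscosity; parameters are searched on a log scale."""
    with np.errstate(all="ignore"):
        pred = carreau_yasuda(gamma_dot, *np.exp(log_params))
        return float(np.sum((np.log(pred) - np.log(eta_obs)) ** 2))

def anneal(objective, start, steps=2000, step_size=0.1, t0=1.0, seed=0):
    """Random-walk annealing; returns the best point visited and its objective."""
    rng = np.random.default_rng(seed)
    current = np.array(start, dtype=float)
    f_cur = objective(current)
    best, f_best = current.copy(), f_cur
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-12          # linear cooling (assumed)
        cand = current + step_size * rng.normal(size=current.size)
        f_cand = objective(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        # A NaN objective (unphysical candidate) fails both tests and is rejected.
        if f_cand < f_cur or rng.random() < np.exp((f_cur - f_cand) / temp):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current.copy(), f_cur
    return best, f_best
```

Working with log-parameters keeps every model parameter positive and lets one step size serve quantities that differ by orders of magnitude, which is the scaling issue the abstract raises.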
Estimation of herbaceous biomass from hyperspectral data, PLS regression and the continuum removal transformation

Directory of Open Access Journals (Sweden)

M. Marabel-García

2014-12-01

The aim of the study was to compare the results of two methods for estimating above-ground biomass from field spectroradiometry data: (i) partial least squares regression (PLSR) and (ii) linear regression using the indices Maximum Band Depth (MBD) and Area Over the Minimum (AOM) as descriptors. In both cases the spectra were first transformed by Continuum Removal (CR). Since the results using PLS (R2 = 0.920, RMSE = 3.622 g/m2) were very similar to those obtained with the indices (for AOM: R2 = 0.915, RMSE = 3.615 g/m2), we recommend the CR-derived indices, because they are simpler to interpret than PLSR.
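
Continuum removal can be sketched as dividing each spectrum by its upper convex hull. The code below is a minimal illustration under stated assumptions: wavelengths are sorted in increasing order, reflectance is positive, and the "area" returned is simply the trapezoidal area of the band-depth curve, which may differ from the exact AOM definition used in the study. Function names are hypothetical.

```python
import numpy as np

def continuum_removed(wavelength, reflectance):
    """Divide a spectrum by its upper convex hull (the continuum)."""
    w = np.asarray(wavelength, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    hull = []  # indices of the upper-hull vertices, left to right
    for i in range(len(w)):
        # Drop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = (w[i1] - w[i0]) * (r[i] - r[i0]) - (r[i1] - r[i0]) * (w[i] - w[i0])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(w, w[hull], r[hull])
    return r / continuum

def band_depth_features(wavelength, reflectance):
    """Return (maximum band depth, trapezoidal area of the band-depth curve)."""
    w = np.asarray(wavelength, dtype=float)
    depth = 1.0 - continuum_removed(w, reflectance)
    area = float(np.sum(0.5 * (depth[1:] + depth[:-1]) * np.diff(w)))
    return float(depth.max()), area
```

Either feature can then serve as the single predictor in an ordinary linear regression against measured biomass, which is what makes the index-based approach easier to interpret than a multi-component PLSR model.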
Further Insight and Additional Inference Methods for Polynomial Regression Applied to the Analysis of Congruence

Science.gov (United States)

Cohen, Ayala; Nahum-Shani, Inbal; Doveh, Etti

2010-01-01

In their seminal paper, Edwards and Parry (1993) presented the polynomial regression as a better alternative to applying difference score in the study of congruence. Although this method is increasingly applied in congruence research, its complexity relative to other methods for assessing congruence (e.g., difference score methods) was one of the…
Multi-step polynomial regression method to model and forecast malaria incidence.

Directory of Open Access Journals (Sweden)

Chandrajit Chatterjee

Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors such as age, sex, social factors, environmental variability etc. as well as underlying transmission dynamics of the disease is important for epidemiological research on malaria and its eradication. Thus, development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter time series (deaths due to Plasmodium vivax) of one year; and spatial data (zonal distribution of P. vivax deaths) for the city, along with the climatic factors, population and previous incidence of the disease. We performed variable selection by simple correlation study, identified the initial relationship between variables through non-linear curve fitting, and used multi-step methods for induction of variables in the non-linear regression analysis, along with Gauss-Markov models and ANOVA for testing the prediction and validity and for constructing the confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR values play an influential role and prove useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city.
Rye Bran Modified with Cell Wall-Degrading Enzymes Influences the Kinetics of Plant Lignans but Not of Enterolignans in Multicatheterized Pigs.

Science.gov (United States)

Bolvig, Anne K; Nørskov, Natalja P; van Vliet, Sophie; Foldager, Leslie; Curtasu, Mihai V; Hedemann, Mette S; Sørensen, Jens F; Lærke, Helle N; Bach Knudsen, Knud E

2017-12-01

Background: Whole-grain intake is associated with a lower risk of chronic Western-style diseases, possibly brought about by the high concentration of phytochemicals, among them plant lignans (PLs), in the grains. Objective: We studied whether treatment of rye bran with cell wall-degrading enzymes changed the solubility and kinetics of PLs in multicatheterized pigs. Methods: Ten female Duroc × Danish Landrace × Yorkshire pigs (60.3 ± 2.3 kg at surgery) fitted with permanent catheters were included in an incomplete crossover study. The pigs were fed 2 experimental diets for 1-7 d. The diets were rich in PLs and based on nontreated lignan-rich [LR; lignan concentration: 20.2 mg dry matter (DM)/kg] or enzymatically treated lignan-rich (ENZLR; lignan concentration: 27.8 mg DM/kg) rye bran. Plasma concentrations of PLs and enterolignans were quantified with the use of targeted LC-tandem mass spectrometry. Data were log transformed and analyzed with mixed-effects, 1-compartment, and asymptotic regression models. Results: The availability of PLs was 38% greater in ENZLR than in LR, and the soluble fraction of PLs was 49% in ENZLR compared with 35% in LR diets. PLs appeared in the circulation 30 min after intake of both the ENZLR and LR diets. Postprandially, consumption of ENZLR resulted in a 4-times-greater ( P concentration compared with LR. The area under the curve (AUC) measured 0-360 min after ENZLR intake was ∼2 times higher than after LR intake. A 1-compartment model could describe the postprandial increase in plasma concentration after ENZLR intake, whereas an asymptotic regression model described the plasma concentrations after LR intake. Despite increased available and soluble PLs, ENZLR did not increase plasma enterolignans. Conclusion: The modification of rye bran with cell wall-degrading enzymes resulted in significantly greater plasma concentrations of PLs and the 4-h AUC, particularly syringaresinol, in multicatheterized pigs. © 2017 American Society
Comparing treatment effects after adjustment with multivariable Cox proportional hazards regression and propensity score methods

NARCIS (Netherlands)

Martens, Edwin P; de Boer, Anthonius; Pestman, Wiebe R; Belitser, Svetlana V; Stricker, Bruno H Ch; Klungel, Olaf H

PURPOSE: To compare adjusted effects of drug treatment for hypertension on the risk of stroke from propensity score (PS) methods with a multivariable Cox proportional hazards (Cox PH) regression in an observational study with censored data. METHODS: From two prospective population-based cohort
Support vector methods for survival analysis: a comparison between ranking and regression approaches.

Science.gov (United States)

Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

2011-10-01

To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints. Our experiments show a comparable performance for methods
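
The concordance index used as the first performance measure can be illustrated with a small, quadratic-time sketch. It assumes the common convention that a pair is comparable when the subject with the shorter time had an observed event, and that a higher score means higher risk; pairs with tied times are skipped. The function name and argument layout are hypothetical.

```python
def concordance_index(time, event, risk):
    """Harrell-style concordance index for right-censored data.

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if the subject was censored
    risk  : model score, higher meaning shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # a pair must be anchored on an observed event
        for j in range(n):
            if time[i] < time[j]:         # subject i failed while j was still at risk
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0     # model ranks the pair correctly
                elif risk[i] == risk[j]:
                    concordant += 0.5     # tied scores count as half
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random or constant scores, and 0.0 to a perfectly inverted ranking.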
A robust and efficient stepwise regression method for building sparse polynomial chaos expansions

Energy Technology Data Exchange (ETDEWEB)

Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium); Raisee, Mehrdad [School of Mechanical Engineering, College of Engineering, University of Tehran, P.O. Box: 11155-4563, Tehran (Iran, Islamic Republic of); Ghorbaniasl, Ghader; Contino, Francesco; Lacor, Chris [Vrije Universiteit Brussel (VUB), Department of Mechanical Engineering, Research Group Fluid Mechanics and Thermodynamics, Pleinlaan 2, 1050 Brussels (Belgium)

2017-03-01

Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable, as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
Classification of structurally related commercial contrast media by near infrared spectroscopy.

Science.gov (United States)

Yip, Wai Lam; Soosainather, Tom Collin; Dyrstad, Knut; Sande, Sverre Arne

2014-03-01

Near infrared spectroscopy (NIRS) is a non-destructive measurement technique with broad application in the pharmaceutical industry. Correct identification of pharmaceutical ingredients is an important task for quality control. Failure in this step can result in several adverse consequences, ranging from economic loss to negative impact on patient safety. We have compared different methods for classification of a set of commercially available, structurally related contrast media, Iodixanol (Visipaque®), Iohexol (Omnipaque®), Caldiamide Sodium and Gadodiamide (Omniscan®), using NIR spectroscopy. The performance of classification models developed by soft independent modelling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA) and Main and Interactions of Individual Principal Components Regression (MIPCR) was compared. Different variable selection methods were applied to optimize the classification models. Models developed by backward variable elimination partial least squares regression (BVE-PLS) and MIPCR were found to be most effective for classification of the set of contrast media. Fewer than 1.5% of samples from the independent test set were not recognized by the BVE-PLS and MIPCR models, compared to up to 15% when models developed by other techniques were applied. Copyright © 2013 Elsevier B.V. All rights reserved.
Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels.

Science.gov (United States)

Kaneko, Hiromasa

2018-02-26

To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.
Regression analysis by example

CERN Document Server

Chatterjee, Samprit

2012-01-01

Praise for the Fourth Edition: "This book is . . . an excellent source of examples for regression analysis. It has been and still is readily readable and understandable." -Journal of the American Statistical Association Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. Regression Analysis by Example, Fifth Edition has been expanded
An evaluation of regression methods to estimate nutritional condition of canvasbacks and other water birds

Science.gov (United States)

Sparling, D.W.; Barzen, J.A.; Lovvorn, J.R.; Serie, J.R.

1992-01-01

Regression equations that use mensural data to estimate body condition have been developed for several water birds. These equations often have been based on data that represent different sexes, age classes, or seasons, without being adequately tested for intergroup differences. We used proximate carcass analysis of 538 adult and juvenile canvasbacks (Aythya valisineria) collected during fall migration, winter, and spring migrations in 1975-76 and 1982-85 to test regression methods for estimating body condition.
Spectrophotometric and chemometric methods for determination of imipenem, ciprofloxacin hydrochloride, dexamethasone sodium phosphate, paracetamol and cilastatin sodium in human urine

Science.gov (United States)

El-Kosasy, A. M.; Abdel-Aziz, Omar; Magdy, N.; El Zahar, N. M.

2016-03-01

New accurate, sensitive and selective spectrophotometric and chemometric methods were developed and subsequently validated for determination of imipenem (IMP), ciprofloxacin hydrochloride (CIPRO), dexamethasone sodium phosphate (DEX), paracetamol (PAR) and cilastatin sodium (CIL) in human urine. These methods include a new derivative ratio method, namely extended derivative ratio (EDR), principal component regression (PCR) and partial least-squares (PLS) methods. A novel EDR method was developed for the determination of these drugs, where each component in the mixture was determined by using a mixture of the other four components as divisor. Peak amplitudes were recorded at 293.0 nm, 284.0 nm, 276.0 nm, 257.0 nm and 221.0 nm within linear concentration ranges 3.00-45.00, 1.00-15.00, 4.00-40.00, 1.50-25.00 and 4.00-50.00 μg mL⁻¹ for IMP, CIPRO, DEX, PAR and CIL, respectively. PCR and PLS-2 models were established for simultaneous determination of the studied drugs in the range of 3.00-15.00, 1.00-13.00, 4.00-12.00, 1.50-9.50, and 4.00-12.00 μg mL⁻¹ for IMP, CIPRO, DEX, PAR and CIL, respectively, by using eighteen mixtures as calibration set and seven mixtures as validation set. The suggested methods were validated according to the International Conference on Harmonisation (ICH) guidelines and the results revealed that they were accurate, precise and reproducible. The obtained results were statistically compared with those of the published methods and there was no significant difference.
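
Principal component regression for a multi-analyte calibration of this kind can be sketched as follows: the mean-centered spectra are decomposed by SVD, the leading components are kept, and all concentrations are regressed on the scores at once. This is a generic PCR sketch with hypothetical function names, not the authors' model; the PLS-2 variant is not shown.

```python
import numpy as np

def pcr_fit(X, Y, n_components):
    """Fit PCR. X: (samples, wavelengths); Y: (samples, analytes)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    # Regress centered Y on the retained scores, then map back to wavelength space.
    coef = Vt.T @ ((U.T @ (Y - y_mean)) / s[:, None])
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pcr_predict(X, coef, intercept):
    """Predict analyte concentrations for new spectra."""
    return np.asarray(X, dtype=float) @ coef + intercept
```

In practice the calibration set fixes `coef` and `intercept`, and the number of components is chosen by the prediction error on a separate validation set, mirroring the calibration/validation split described in the abstract.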

Regression modeling of ground-water flow

Science.gov (United States)

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

Science.gov (United States)

Gorgees, HazimMansoor; Mahdi, FatimahAssim

2018-05-01

This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this purpose we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).
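
For orientation, the ordinary ridge estimator and the condition number it is meant to tame can be sketched as below. The ridge constant `k` is left as a free input, because the abstract does not state the condition-number-based rule for choosing it; function names are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center and scale each column to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_coefficients(X, y, k):
    """Ordinary ridge estimate (X'X + kI)^-1 X'y on standardized predictors."""
    Xs = standardize(X)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ yc)

def condition_number(X):
    """Ratio of largest to smallest eigenvalue of X'X (standardized predictors)."""
    Xs = standardize(X)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)
    return float(eig.max() / eig.min())
```

Setting `k = 0` recovers ordinary least squares; any `k > 0` shrinks the coefficient vector, trading a little bias for a large reduction in variance when the condition number is high.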
Discriminative Elastic-Net Regularized Linear Regression.

Science.gov (United States)

Zhang, Zheng; Lai, Zhihui; Xu, Yong; Shao, Ling; Wu, Jian; Xie, Guo-Sen

2017-03-01

In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zero-one matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of these methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminative representations for final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available data sets demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods are available at http://www.yongxu.org/lunwen.html.
A method to determine the necessity for global signal regression in resting-state fMRI studies.

Science.gov (United States)

Chen, Gang; Chen, Guangyu; Xie, Chunming; Ward, B Douglas; Li, Wenjun; Antuono, Piero; Li, Shi-Jiang

2012-12-01

In resting-state functional MRI studies, the global signal (operationally defined as the global average of resting-state functional MRI time courses) is often considered a nuisance effect and commonly removed in preprocessing. This global signal regression method can introduce artifacts, such as false anticorrelated resting-state networks in functional connectivity analyses. Therefore, the efficacy of this technique as a correction tool remains questionable. In this article, we establish that the accuracy of the estimated global signal is determined by the level of global noise (i.e., non-neural noise that has a global effect on the resting-state functional MRI signal). When the global noise level is low, the global signal resembles the resting-state functional MRI time courses of the largest cluster, but not those of the global noise. Using real data, we demonstrate that the global signal is strongly correlated with the default mode network components and has biological significance. These results call into question whether or not global signal regression should be applied. We introduce a method to quantify global noise levels. We show that a criterion for global signal regression can be found based on the method. Using this criterion, one can determine whether to include or exclude the global signal regression in minimizing errors in functional connectivity measures. Copyright © 2012 Wiley Periodicals, Inc.
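As a rough illustration of what global signal regression does in preprocessing, the following minimal sketch averages all time courses into a global signal and removes its least-squares fit from each one. The function name and the toy data are hypothetical, and this is not the noise-quantification method proposed in the article.

```python
def regress_out_global_signal(time_courses):
    """Remove the global signal from each time course by simple linear regression.

    time_courses: list of equal-length lists (one per voxel or region).
    Returns (residuals, global_signal).
    """
    n_series = len(time_courses)
    n_time = len(time_courses[0])
    # Global signal: average over all time courses at each time point.
    g = [sum(tc[t] for tc in time_courses) / n_series for t in range(n_time)]
    g_mean = sum(g) / n_time
    g_centered = [v - g_mean for v in g]
    g_ss = sum(v * v for v in g_centered)

    residuals = []
    for tc in time_courses:
        tc_mean = sum(tc) / n_time
        # Least-squares slope of this time course on the global signal.
        slope = sum((x - tc_mean) * gc for x, gc in zip(tc, g_centered)) / g_ss
        residuals.append([(x - tc_mean) - slope * gc for x, gc in zip(tc, g_centered)])
    return residuals, g


# Toy data: three made-up time courses (illustrative values only).
data = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [2.0, 3.0, 5.0, 3.0, 2.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
]
cleaned, global_signal = regress_out_global_signal(data)
```

In this construction the residuals sum to zero across time courses at every time point, which is one way to see how the procedure can induce anticorrelations of the kind the abstract warns about.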
Application of multivariate calibration methods for the simultaneous determination of riboflavin (VB2), thiamine (VB1), pyridoxine (VB6) and nicotinamide (VPP): UV spectrophotometry and chemometrics methods for simultaneous determinations of riboflavin (VB2), thiamine (VB1), pyridoxine (VB6) and nicotinamide (VPP)

Directory of Open Access Journals (Sweden)

Rosângela C. Barthus

2007-01-01

Full Text Available In this work, artificial neural networks (ANN) and partial least squares (PLS) regression were applied to UV spectral data for quantitative determination of thiamin hydrochloride (VB1), riboflavin phosphate (VB2), pyridoxine hydrochloride (VB6) and nicotinamide (VPP) in pharmaceutical samples. For calibration purposes, commercial samples in 0.2 mol L-1 acetate buffer (pH 4.0) were employed as standards. The concentration ranges used in the calibration step were: 0.1 - 7.5 mg L-1 for VB1, 0.1 - 3.0 mg L-1 for VB2, 0.1 - 3.0 mg L-1 for VB6 and 0.4 - 30.0 mg L-1 for VPP. The results show that both methods can be successfully applied for these determinations. Similar error values were obtained with the neural network and PLS methods. The proposed methodology is simple, rapid and can be easily used in quality control laboratories.
Process control and optimization with simple interval calculation method

DEFF Research Database (Denmark)

Pomerantsev, A.; Rodionova, O.; Høskuldsson, Agnar

2006-01-01

Methods of process control and optimization are presented and illustrated with a real world example. The optimization methods are based on the PLS block modeling as well as on the simple interval calculation methods of interval prediction and object status classification. It is proposed to employ the series of expanding PLS/SIC models in order to support the on-line process improvements. This method helps to predict the effect of planned actions on the product quality and thus enables passive quality control. We have also considered an optimization approach that proposes the correcting actions for the quality improvement in the course of production. The latter is an active quality optimization, which takes into account the actual history of the process. The advocated approach is allied to the conventional method of multivariate statistical process control (MSPC) as it also employs the historical process...
A method for fitting regression splines with varying polynomial order in the linear mixed model.

Science.gov (United States)

Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W

2006-02-15

The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Robust Methods for Moderation Analysis with a Two-Level Regression Model.

Science.gov (United States)

Yang, Miao; Yuan, Ke-Hai

2016-01-01

Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
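To make the idea of M-estimation with Huber-type weights concrete, here is a minimal sketch for the simplest possible case, a single location parameter, fitted by iteratively reweighted averaging. It is far simpler than the two-level regression model of the article; the tuning constant 1.345 is a conventional default rather than a value taken from the abstract, and the scale is fixed at 1 as an assumption of the sketch.

```python
def huber_weight(residual, c=1.345):
    """Huber-type weight: 1 inside [-c, c], c/|r| outside."""
    a = abs(residual)
    return 1.0 if a <= c else c / a


def huber_location(values, c=1.345, n_iter=50):
    """Iteratively reweighted estimate of a location parameter.

    The scale is fixed at 1 for simplicity (an assumption of this sketch).
    """
    estimate = sorted(values)[len(values) // 2]  # start from a median-like value
    for _ in range(n_iter):
        weights = [huber_weight(v - estimate, c) for v in values]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one gross outlier
robust = huber_location(sample)
plain_mean = sum(sample) / len(sample)
```

On this toy sample the ordinary mean is pulled far towards the outlier, whereas the reweighted estimate stays near the bulk of the data, because observations with large residuals receive weights well below 1.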
Boosted beta regression.

Directory of Open Access Journals (Sweden)

Matthias Schmid

Full Text Available Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fit a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
Comparison of some biased estimation methods (including ordinary subset regression) in the linear model

Science.gov (United States)

Sidik, S. M.

1975-01-01

Ridge, Marquardt's generalized inverse, shrunken, and principal components estimators are discussed in terms of the objectives of point estimation of parameters, estimation of the predictive regression function, and hypothesis testing. It is found that as the normal equations approach singularity, more consideration must be given to estimable functions of the parameters as opposed to estimation of the full parameter vector; that biased estimators all introduce constraints on the parameter space; that adoption of mean squared error as a criterion of goodness should be independent of the degree of singularity; and that ordinary least-squares subset regression is the best overall method.
Single-electron multiplication statistics as a combination of Poissonian pulse height distributions using constraint regression methods

International Nuclear Information System (INIS)

Ballini, J.-P.; Cazes, P.; Turpin, P.-Y.

1976-01-01

Analysing the histogram of anode pulse amplitudes allows a discussion of the hypothesis that has been proposed to account for the statistical processes of secondary multiplication in a photomultiplier. In an earlier work, good agreement was obtained between experimental and reconstructed spectra, assuming a first dynode distribution including two Poisson distributions of distinct mean values. This first approximation led to a search for a method which could give the weights of several Poisson distributions of distinct mean values. Three methods are briefly presented: classical linear regression, constraint regression (d'Esopo's method), and regression on variables subject to error. The use of these methods gives an approximation of the frequency function which represents the dispersion of the punctual mean gain around the whole first dynode mean gain value. Comparison between this function and the one employed in the Polya distribution allows the statement that the latter is inadequate to describe the statistical process of secondary multiplication. Numerous spectra obtained with two kinds of photomultiplier working under different physical conditions have been analysed. Then two points are discussed: - Does the frequency function represent the dynode structure and the interdynode collection process? - Is the model (the multiplication process of all dynodes but the first one is Poissonian) valid whatever the photomultiplier and the utilization conditions? (Auth.)
Prediction of Main Pipes Break Rates in Water Distribution Systems Using Intelligent and Regression Methods

Directory of Open Access Journals (Sweden)

Massoud Tabesh

2011-07-01

Full Text Available Optimum operation of water distribution networks is one of the priorities of sustainable development of water resources, considering the issues of increasing efficiency and decreasing water losses. One of the key subjects in optimum operational management of water distribution systems is preparing rehabilitation and replacement schemes, predicting pipe break rates and evaluating their reliability. Several approaches to predicting pipe failure rates have been presented in recent years, each of which requires special data sets. Deterministic models based on age, deterministic multi-variable models and stochastic group modeling are examples of solutions which relate pipe break rates to parameters like age, material and diameter. In this paper, besides the mentioned parameters, more factors such as pipe depth and hydraulic pressures are considered as well. Then, using the multi-variable regression method, intelligent approaches (artificial neural network and neuro-fuzzy models) and the evolutionary polynomial regression (EPR) method, pipe burst rates are predicted. To evaluate the results of the different approaches, a case study is carried out in a part of the Mashhad water distribution network. The results show the capability and advantages of the ANN and EPR methods in predicting pipe break rates, in comparison with the neuro-fuzzy and multi-variable regression methods.
Physical structure and genetic expression of the sulfonamide-resistance plasmid pLS80 and its derivatives in Streptococcus pneumoniae and Bacillus subtilis

Energy Technology Data Exchange (ETDEWEB)

Lopez, P.; Espinosa, M.; Lacks, S.A.

1984-01-01

The 10-kb chromosomal fragment of Streptococcus pneumoniae cloned in pLS80 contains the sul-d allele of the pneumococcal gene for dihydropteroate synthase. As a single copy in the chromosome this allele confers resistance to sulfanilamide at 0.2 mg/ml; in the multicopy plasmid it confers resistance to 2.0 mg/ml. The sul-d mutation was mapped by restriction analysis to a 0.4-kb region. A spontaneous deletion beginning approx. 1.5 kb to the right of the sul-d mutation prevented gene function, possibly by removing a promoter. This region could be restored by chromosomal facilitation and be demonstrated in the plasmid by selection for sulfonamide resistance. Under selection for a vector marker, tetracycline resistance, only the deleted plasmid was detectable, apparently as a result of plasmid segregation and the advantageous growth rates of cells with smaller plasmids. When such cells were selected for sulfonamide resistance, the deleted region returned to the plasmid, presumably by equilibration between the chromosome and the plasmid pool, to give a low frequency (approx. 10^-3) of cells resistant to sulfanilamide at 2.0 mg/ml. Models for the mechanisms of chromosomal facilitation and equilibration are proposed. Several derivatives of pLS80 could be transferred to Bacillus subtilis, where they conferred resistance to sulfanilamide at 2 mg/ml, thereby demonstrating cross-species expression of the pneumococcal gene. Transfer of the plasmids to B. subtilis gave rise to large deletions to the left of the sul-d marker, but these deletions did not interfere with the sul-d gene function. Restriction maps of pLS80 and its variously deleted derivatives are presented.
A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

Directory of Open Access Journals (Sweden)

Xiangbing Zhou

2018-04-01

Full Text Available Rapidly growing GPS (Global Positioning System) trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method) is proposed to reduce local information loss of the trajectory and to avoid getting stuck in a local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning) method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM) clustering, which are used to maintain the stability and the robustness of the clustering process; finally, a least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method are validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM), our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.
Development of K-Nearest Neighbour Regression Method in Forecasting River Stream Flow

Directory of Open Access Journals (Sweden)

Mohammad Azmi

2012-07-01

Full Text Available Different statistical, non-statistical and black-box methods have been used in forecasting processes. Among statistical methods, the K-nearest neighbour (K-NN) non-parametric regression method is, owing to its simplicity and mathematical basis, one of the recommended methods for forecasting. In this study, the K-NN method is explained in full. In addition, development and improvement approaches such as best neighbour estimation, data transformation functions, distance functions and a proposed extrapolation method are described. The K-NN method, together with these development approaches, is used in streamflow forecasting for the Zayandeh-Rud Dam upper basin. A comparison of the final results of the classic K-NN method and the modified K-NN (5 neighbours, Range Scaling transformation function, Mahalanobis distance function and the proposed extrapolation method) shows that the modified K-NN improved performance in the criteria of goodness of fit, root mean square error, percentage of volume of error and correlation by 45%, 59% and 17%, respectively. These results confirm the necessity of applying the mentioned approaches to derive more accurate forecasts.
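The classic K-NN regression that the study starts from can be sketched in a few lines: a forecast is the average target value of the k closest historical observations. The sketch below uses plain Euclidean distance and an unweighted mean; it does not include the transformation, Mahalanobis distance or extrapolation refinements described in the abstract, and the data are made up.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Pair every training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical data: each predictor tuple could hold lagged flows; values are illustrative.
train_x = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (8.0, 8.0), (9.0, 9.0)]
train_y = [10.0, 20.0, 30.0, 80.0, 90.0]
forecast = knn_predict(train_x, train_y, query=(2.0, 2.5), k=3)
```

Because the forecast is an average of observed targets, it can never fall outside the range of the training data, which is presumably why an extrapolation step is among the improvements the authors propose.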
Comparison of multinomial logistic regression and logistic regression: which is more efficient in allocating land use?

Science.gov (United States)

Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun

2014-12-01

Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
Identification and characterization of a novel type of replication terminator with bidirectional activity on the Bacillus subtilis theta plasmid pLS20

NARCIS (Netherlands)

Meijer, WJJ; Smith, M; Wake, RG; deBoer, AL; Venema, G; Bron, S

We have sequenced and analysed a 3.1 kb fragment of the 55 kb endogenous Bacillus subtilis plasmid pLS20 containing its replication functions. Just outside the region required for autonomous replication, a segment of 18 bp was identified as being almost identical to part of the major B. subtilis
Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

Science.gov (United States)

Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

2017-06-01

Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p logistic regression model for the classification of risk groups for PTB.
The Occupancy Rate Modeling of Kendari Hotel Room using Mexican Hat Transformation and Partial Least Squares

Directory of Open Access Journals (Sweden)

Margaretha Ohyver

2016-12-01

Full Text Available The Partial Least Squares (PLS) method was developed in 1960 by Herman Wold. The method is particularly suited to constructing a regression model when the independent variables are numerous and highly collinear. PLS can be combined with other methods, one of which is the Continuous Wavelet Transformation (CWT). Because the presence of outliers can lead to a less reliable model, this kind of transformation may be required at the pre-processing stage so that the data are free of noise or outliers. According to the previous study, the Kendari hotel room occupancy rate was affected by an outlier, and the model had a low value of R2. Therefore, this research aimed to obtain a good model by combining the PLS method and the CWT using the Mexican Hat as the mother wavelet. The research concludes that merging PLS and the Mexican Hat transformation results in a better model than the one that combined PLS and the Haar wavelet transformation in the previous study. The research shows that by changing the mother wavelet, the value of R2 can be improved significantly. The result provides information on how to increase the value of R2. A further benefit is that hotel management is informed that the age of the hotel, the maximum rates, the facilities, and the number of rooms should be taken into account to increase the number of visitors.
Time-adaptive quantile regression

DEFF Research Database (Denmark)

Møller, Jan Kloppenborg; Nielsen, Henrik Aalborg; Madsen, Henrik

2008-01-01

An algorithm for time-adaptive quantile regression is presented. The algorithm is based on the simplex algorithm, and the linear optimization formulation of the quantile regression problem is given. The observations have been split to allow a direct use of the simplex algorithm. The simplex method and an updating procedure are combined into a new algorithm for time-adaptive quantile regression, which generates new solutions on the basis of the old solution, leading to savings in computation time. The suggested algorithm is tested against a static quantile regression model on a data set with wind power production, where the models combine splines and quantile regression. The comparison indicates superior performance for the time-adaptive quantile regression in all the performance parameters considered.
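For orientation, the linear optimization formulation of quantile regression that makes the simplex algorithm applicable is usually written as below. This is the standard textbook form, not necessarily the exact notation of the paper; the symbols are assumptions introduced here: X is the design matrix, y the response vector, τ the quantile level in (0,1), β the coefficient vector, and u⁺, u⁻ the positive and negative parts of the residuals.

```latex
\begin{aligned}
\min_{\beta,\,u^{+},\,u^{-}} \quad & \tau\,\mathbf{1}^{\top}u^{+} + (1-\tau)\,\mathbf{1}^{\top}u^{-} \\
\text{subject to} \quad & X\beta + u^{+} - u^{-} = y, \\
& u^{+} \ge 0,\quad u^{-} \ge 0 .
\end{aligned}
```

Splitting each residual into non-negative parts turns the piecewise-linear check loss into a linear objective with linear constraints, so a solution for one time step can serve as the starting basis when observations are added or removed.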

Application of correlation constrained multivariate curve resolution alternating least-squares methods for determination of compounds of interest in biodiesel blends using NIR and UV-visible spectroscopic data.

Science.gov (United States)

de Oliveira, Rodrigo Rocha; de Lima, Kássio Michell Gomes; Tauler, Romà; de Juan, Anna

2014-07-01

This study describes two applications of a variant of the multivariate curve resolution alternating least squares (MCR-ALS) method with a correlation constraint. The first application describes the use of MCR-ALS for the determination of biodiesel concentrations in biodiesel blends using near infrared (NIR) spectroscopic data. In the second application, the proposed method allowed the determination of the synthetic antioxidant N,N'-Di-sec-butyl-p-phenylenediamine (PDA) present in biodiesel mixtures from different vegetable sources using UV-visible spectroscopy. Models based on a well-established multivariate regression algorithm, partial least squares (PLS), were calculated for comparison of the quantification performance in both applications. The correlation constraint has been adapted to handle the presence of batch-to-batch matrix effects due to ageing effects, which might occur when different groups of samples are used to build a calibration model in the first application. Different data set configurations and diverse modes of application of the correlation constraint are explored, and guidelines are given to cope with different types of analytical problems, such as the correction of matrix effects among biodiesel samples, where MCR-ALS outperformed PLS, reducing the relative error of prediction RE (%) from 9.82% to 4.85% in the first application, or the determination of a minor compound with overlapped weak spectroscopic signals, where MCR-ALS gave a higher error (RE (%) = 3.16%) for prediction of PDA compared to PLS (RE (%) = 1.99%), but with the advantage of recovering the related pure spectral profile of analytes and interferences. The obtained results show the potential of the MCR-ALS method with correlation constraint to be adapted to diverse data set configurations and analytical problems related to the determination of biodiesel mixtures and added compounds therein. Copyright © 2014 Elsevier B.V. All rights reserved.
Collision cross section prediction of deprotonated phenolics in a travelling-wave ion mobility spectrometer using molecular descriptors and chemometrics

Energy Technology Data Exchange (ETDEWEB)

Gonzales, Gerard Bryan, E-mail: gerard.gonzales@ugent.be [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium); Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University (Belgium); Department of Applied Biological Science, Faculty of Bioscience Engineering, Ghent University (Belgium); Smagghe, Guy [Laboratory of Agrozoology, Department of Crop Protection, Faculty of Bioscience Engineering, Ghent University (Belgium); Coelus, Sofie; Adriaenssens, Dieter [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium); De Winter, Karel; Desmet, Tom [Center for Industrial Biotechnology and Biocatalysis, Faculty of Bioscience Engineering, Ghent University (Belgium); Raes, Katleen [Department of Applied Biological Science, Faculty of Bioscience Engineering, Ghent University (Belgium); Van Camp, John, E-mail: john.vancamp@ugent.be [Food Chemistry and Human Nutrition (NutriFOODChem), Department of Food Safety and Food Quality, Faculty of Bioscience Engineering, Ghent University (Belgium)

2016-06-14

The combination of ion mobility and mass spectrometry (MS) affords significant improvements over conventional MS/MS, especially in the characterization of isomeric metabolites due to the differences in their collision cross sections (CCS). Experimentally obtained CCS values are typically matched with theoretical CCS values from Trajectory Method (TM) and/or Projection Approximation (PA) calculations. In this paper, predictive models for CCS of deprotonated phenolics were developed using molecular descriptors and chemometric tools, stepwise multiple linear regression (SMLR), principal components regression (PCR), and partial least squares regression (PLS). A total of 102 molecular descriptors were generated and reduced to 28 after employing a feature selection tool, composed of mass, topological descriptors, Jurs descriptors and shadow indices. Therefore, the generated models considered the effects of mass, 3D conformation and partial charge distribution on CCS, which are the main parameters for either TM or PA (only 3D conformation) calculations. All three techniques yielded highly predictive models for both the training (R²_SMLR = 0.9911; R²_PCR = 0.9917; R²_PLS = 0.9918) and validation datasets (R²_SMLR = 0.9489; R²_PCR = 0.9761; R²_PLS = 0.9760). Also, the high cross validated R² values indicate that the generated models are robust and highly predictive (Q²_SMLR = 0.9859; Q²_PCR = 0.9748; Q²_PLS = 0.9760). The predictions were also very comparable to the results from TM calculations using modified mobcal (N2). Most importantly, this method offered a rapid (<10 min) alternative to TM calculations without compromising predictive ability. These methods could therefore be used in routine analysis and could be easily integrated to metabolite identification platforms. - Highlights: • CCS for deprotonated phenolics were measured using TWIMS.
Method validation using weighted linear regression models for quantification of UV filters in water samples.

Science.gov (United States)

da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

2015-01-01

This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L^-1. The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil. Copyright © 2014 Elsevier B.V. All rights reserved.
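A weighted least squares calibration line of the kind described can be computed in closed form. The sketch below is only illustrative: the calibration points are made up, and the 1/x² weighting is one common choice for heteroscedastic calibration data, not necessarily the weighting used by the authors.

```python
def weighted_linear_fit(x, y, w):
    """Closed-form weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    # Weighted means of the predictor and the response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical calibration points (concentration, instrument response).
conc = [10.0, 50.0, 100.0, 500.0, 1000.0]
resp = [21.0, 101.0, 201.0, 1001.0, 2001.0]  # constructed as 1 + 2 * conc
weights = [1.0 / c ** 2 for c in conc]       # assumed 1/x^2 weighting
intercept, slope = weighted_linear_fit(conc, resp, weights)
```

With real data whose variance grows with concentration, weights of this kind keep the high-concentration standards from dominating the fit, which mainly improves accuracy near the lower end of the calibration range.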
Comparative investigation of two different self-organizing map ...

African Journals Online (AJOL)

Purpose: To demonstrate the ability and investigate the performance of two different wavelength selection approaches based on self-organizing map (SOM) technique in partial least-squares (PLS) regression for analysis of pharmaceutical binary mixtures with strongly overlapping spectra. Methods: Two different variable ...
Analyzing Big Data with the Hybrid Interval Regression Methods

Directory of Open Access Journals (Sweden)

Chia-Hui Huang

2014-01-01

Full Text Available Big data is a current trend with significant impacts on information technologies. In big data applications, one of the most pressing issues is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently is a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, the soft margin method is proposed to modify the excursion of the separation margin and to be effective in the gray zone, where the distribution of data and the separation margin between classes become hard to describe.
Dimension Reduction and Discretization in Stochastic Problems by Regression Method

DEFF Research Database (Denmark)

Ditlevsen, Ove Dalager

1996-01-01

The chapter mainly deals with dimension reduction and field discretizations based directly on the concept of linear regression. Several examples of interesting applications in stochastic mechanics are also given. Keywords: Random fields discretization, Linear regression, Stochastic interpolation, ...
Direct and regression methods do not give different estimates of digestible and metabolizable energy of wheat for pigs.

Science.gov (United States)

Bolarinwa, O A; Adeola, O

2012-12-01

Digestible and metabolizable energy contents of feed ingredients for pigs can be determined by direct or indirect methods. There are situations when only the indirect approach is suitable and the regression method is a robust indirect approach. This study was conducted to compare the direct and regression methods for determining the energy value of wheat for pigs. Twenty-four barrows with an average initial BW of 31 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g wheat/kg plus minerals and vitamins (sole wheat) for the direct method, corn (Zea mays)-soybean (Glycine max) meal reference diet (RD), RD + 300 g wheat/kg, and RD + 600 g wheat/kg. The 3 corn-soybean meal diets were used for the regression method and wheat replaced the energy-yielding ingredients, corn and soybean meal, so that the same ratio of corn and soybean meal across the experimental diets was maintained. The wheat used was analyzed to contain 883 g DM, 15.2 g N, and 3.94 Mcal GE/kg. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d total but separate collection of feces and urine. The DE and ME for the sole wheat diet were 3.83 and 3.77 Mcal/kg DM, respectively. Because the sole wheat diet contained 969 g wheat/kg, these translate to 3.95 Mcal DE/kg DM and 3.89 Mcal ME/kg DM. The RD used for the regression approach yielded 4.00 Mcal DE and 3.91 Mcal ME/kg DM diet. Increasing levels of wheat in the RD linearly reduced (P direct method (3.95 and 3.89 Mcal/kg DM) did not differ (0.78 < P < 0.89) from those obtained using the regression method (3.96 and 3.88 Mcal/kg DM).
4D-Fingerprint Categorical QSAR Models for Skin Sensitization Based on Classification Local Lymph Node Assay Measures

Science.gov (United States)

Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.

2008-01-01

Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR) and partial least squares coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Prediction of protein binding sites using physical and chemical descriptors and the support vector machine regression method

International Nuclear Information System (INIS)

Sun Zhong-Hua; Jiang Fan

2010-01-01

In this paper a new continuous variable called core-ratio is defined to describe the probability for a residue to be in a binding site, thereby replacing the previous binary description of the interface residue using 0 and 1. So we can use the support vector machine regression method to fit the core-ratio value and predict the protein binding sites. We also design a new group of physical and chemical descriptors to characterize the binding sites. The new descriptors are more effective, with an averaging procedure used. Our test shows that much better prediction results can be obtained by the support vector regression (SVR) method than by the support vector classification method. (rapid communication)
A Simple Linear Regression Method for Quantitative Trait Loci Linkage Analysis With Censored Observations

OpenAIRE

Anderson, Carl A.; McRae, Allan F.; Visscher, Peter M.

2006-01-01

Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using...
Estimating Penetration Resistance in Agricultural Soils of Ardabil Plain Using Artificial Neural Network and Regression Methods

Directory of Open Access Journals (Sweden)

Gholam Reza Sheykhzadeh

2017-02-01

Full Text Available Introduction: Penetration resistance is one of the criteria for evaluating soil compaction. It correlates with several soil properties such as vehicle trafficability, resistance to root penetration, seedling emergence, and soil compaction by farm machinery. Direct measurement of penetration resistance is time consuming and difficult because of high temporal and spatial variability. Therefore, many different regression and artificial neural network pedotransfer functions have been proposed to estimate penetration resistance from readily available soil variables such as particle size distribution, bulk density (Db) and gravimetric water content (θm). The lands of Ardabil Province are one of the main potato production regions in Iran; thus, obtaining the soil penetration resistance in these regions helps with the management of potato production. The objective of this research was to derive pedotransfer functions by using regression and artificial neural networks to predict penetration resistance from some soil variables in the agricultural soils of the Ardabil plain and to compare the performance of artificial neural networks with regression models. Materials and methods: Disturbed and undisturbed soil samples (n = 105) were systematically taken from 0-10 cm soil depth at intervals of nearly 3000 m in the agricultural lands of the Ardabil plain (lat 38°15' to 38°40' N, long 48°16' to 48°61' E). The contents of sand, silt and clay (hydrometer method), CaCO3 (titration method), bulk density (cylinder method), particle density (Dp) (pycnometer method), organic carbon (wet oxidation method), total porosity (calculated from Db and Dp), and saturated (θs) and field (θf) soil water (gravimetric method) were measured in the laboratory. Mean geometric diameter (dg) and standard deviation (σg) of soil particles were computed using the percentages of sand, silt and clay. Penetration resistance was measured in situ using a cone penetrometer (analog model) at 10
Landslide susceptibility mapping on a global scale using the method of logistic regression

Directory of Open Access Journals (Sweden)

L. Lin

2017-08-01

Full Text Available This paper proposes a statistical model for mapping global landslide susceptibility based on logistic regression. After investigating explanatory factors for landslides in the existing literature, five factors were selected to model landslide susceptibility: relative relief, extreme precipitation, lithology, ground motion and soil moisture. When building the model, 70 % of landslide and nonlandslide points were randomly selected for logistic regression, and the others were used for model validation. To evaluate the accuracy of predictive models, this paper adopts several criteria including a receiver operating characteristic (ROC) curve method. Logistic regression experiments found all five factors to be significant in explaining landslide occurrence on a global scale. During the modeling process, the percentage correct in the confusion matrix of landslide classification was approximately 80 % and the area under the curve (AUC) was nearly 0.87. During the validation process, the above statistics were about 81 % and 0.88, respectively. Such a result indicates that the model has strong robustness and stable performance. This model found that at a global scale, soil moisture can be dominant in the occurrence of landslides and the topographic factor may be secondary.
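The AUC criterion quoted in this abstract can be computed directly from predicted scores without drawing the ROC curve. The sketch below uses the rank-based (Mann-Whitney) definition: the AUC is the fraction of landslide/non-landslide pairs in which the landslide point receives the higher score, with ties counted as one half. The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation points: 1 = landslide, 0 = non-landslide
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # predicted susceptibility from some fitted model
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of points, which is fine for a sketch; a sort-based rank computation would be the usual choice for a global data set.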
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

Science.gov (United States)

Cooper, Paul D.

2010-01-01

A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Petroleomics by electrospray ionization FT-ICR mass spectrometry coupled to partial least squares with variable selection methods: prediction of the total acid number of crude oils.

Science.gov (United States)

Terra, Luciana A; Filgueiras, Paulo R; Tose, Lílian V; Romão, Wanderson; de Souza, Douglas D; de Castro, Eustáquio V R; de Oliveira, Mirela S L; Dias, Júlio C M; Poppi, Ronei J

2014-10-07

Negative-ion mode electrospray ionization, ESI(-), with Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) was coupled to a Partial Least Squares (PLS) regression and variable selection methods to estimate the total acid number (TAN) of Brazilian crude oil samples. Generally, ESI(-)-FT-ICR mass spectra present a power of resolution of ca. 500,000 and a mass accuracy less than 1 ppm, producing a data matrix containing over 5700 variables per sample. These variables correspond to heteroatom-containing species detected as deprotonated molecules, [M - H]⁻ ions, which are identified primarily as naphthenic acids, phenols and carbazole analog species. The TAN values for all samples ranged from 0.06 to 3.61 mg of KOH g⁻¹. To facilitate the spectral interpretation, three methods of variable selection were studied: variable importance in the projection (VIP), interval partial least squares (iPLS) and elimination of uninformative variables (UVE). The UVE method seems to be more appropriate for selecting important variables, reducing the dimension of the variables to 183 and producing a root mean square error of prediction of 0.32 mg of KOH g⁻¹. By reducing the size of the data, it was possible to relate the selected variables with their corresponding molecular formulas, thus identifying the main chemical species responsible for the TAN values.
Development and Validation of a Near-Infrared Spectroscopy Method for the Prediction of Acrylamide Content in French-Fried Potato.

Science.gov (United States)

Adedipe, Oluwatosin E; Johanningsmeier, Suzanne D; Truong, Van-Den; Yencho, G Craig

2016-03-02

This study investigated the ability of near-infrared spectroscopy (NIRS) to predict acrylamide content in French-fried potato. Potato flour spiked with acrylamide (50-8000 μg/kg) was used to determine if acrylamide could be accurately predicted in a potato matrix. French fries produced with various pretreatments and cook times (n = 84) and obtained from quick-service restaurants (n = 64) were used for model development and validation. Acrylamide was quantified using gas chromatography-mass spectrometry, and reflectance spectra (400-2500 nm) of each freeze-dried sample were captured on a Foss XDS Rapid Content Analyzer-NIR spectrometer. Partial least-squares (PLS) discriminant analysis and PLS regression modeling demonstrated that NIRS could accurately detect acrylamide content as low as 50 μg/kg in the model potato matrix. Prediction errors of 135 μg/kg (R² = 0.98) and 255 μg/kg (R² = 0.93) were achieved with the best PLS models for acrylamide prediction in Russet Norkotah French-fried potato and multiple samples of unknown varieties, respectively. The findings indicate that NIRS can be used as a screening tool in potato breeding and potato processing research to reduce acrylamide in the food supply.
Quantitative analysis of red wine tannins using Fourier-transform mid-infrared spectrometry.

Science.gov (United States)

Fernandez, Katherina; Agosin, Eduardo

2007-09-05

Tannin content and composition are critical quality components of red wines. No spectroscopic method assessing these phenols in wine has been described so far. We report here a new method using Fourier transform mid-infrared (FT-MIR) spectroscopy and chemometric techniques for the quantitative analysis of red wine tannins. Calibration models were developed using protein precipitation and phloroglucinolysis as analytical reference methods. After spectra preprocessing, six different predictive partial least-squares (PLS) models were evaluated, including the use of interval selection procedures such as iPLS and CSMWPLS. PLS regression with full-range (650-4000 cm⁻¹), second derivative of the spectra and phloroglucinolysis as the reference method gave the most accurate determination for tannin concentration (RMSEC = 2.6%, RMSEP = 9.4%, r = 0.995). The prediction of the mean degree of polymerization (mDP) of the tannins also gave a reasonable prediction (RMSEC = 6.7%, RMSEP = 10.3%, r = 0.958). These results represent the first step in the development of a spectroscopic methodology for the quantification of several phenolic compounds that are critical for wine quality.
Liquid chromatographic and spectrophotometric methods for the determination of erythromycin stearate and trimethoprim in tablets

Directory of Open Access Journals (Sweden)

Sonia T. Hassib

2011-12-01

Full Text Available Simple, accurate and precise reversed-phase liquid chromatographic (LC) and spectrophotometric methods have been developed and validated for the determination of erythromycin stearate (ERS) and trimethoprim (TMP) in mixture. In the LC method, chromatographic separation was achieved on a Symmetry® Waters C18 column (150 × 4.6 mm, 5 μm) based on isocratic elution using a mobile phase consisting of potassium dihydrogen phosphate buffer (pH 9):acetonitrile:water (25:100:50, v/v/v) at a flow rate of 1.6 ml min⁻¹ with UV detection at 210 nm for ERS and 280 nm for TMP. Besides, two spectrophotometric methods were applied after reaction with perchloric acid (12 M), which gives a colored product with ERS. Then, the spectral interference between the colored product of ERS and TMP was resolved by either ratio spectra derivative spectrophotometry in the first spectrophotometric method or chemometric techniques, namely classical least-squares (CLS), principal component regression (PCR) and partial least-squares regression (PLS), in the second spectrophotometric method. The results were statistically compared using one-way analysis of variance (ANOVA). The methods developed were satisfactorily applied to the analysis of the pharmaceutical preparation containing the two drugs and proved to be specific and accurate for the quality control of the cited drugs in pharmaceutical dosage forms.
Sub-Model Partial Least Squares for Improved Accuracy in Quantitative Laser Induced Breakdown Spectroscopy

Science.gov (United States)

Anderson, R. B.; Clegg, S. M.; Frydenvang, J.

2015-12-01

One of the primary challenges faced by the ChemCam instrument on the Curiosity Mars rover is developing a regression model that can accurately predict the composition of the wide range of target types encountered (basalts, calcium sulfate, feldspar, oxides, etc.). The original calibration used 69 rock standards to train a partial least squares (PLS) model for each major element. By expanding the suite of calibration samples to >400 targets spanning a wider range of compositions, the accuracy of the model was improved, but some targets with "extreme" compositions (e.g. pure minerals) were still poorly predicted. We have therefore developed a simple method, referred to as "submodel PLS", to improve the performance of PLS across a wide range of target compositions. In addition to generating a "full" (0-100 wt.%) PLS model for the element of interest, we also generate several overlapping submodels (e.g. for SiO2, we generate "low" (0-50 wt.%), "mid" (30-70 wt.%), and "high" (60-100 wt.%) models). The submodels are generally more accurate than the "full" model for samples within their range because they are able to adjust for matrix effects that are specific to that range. To predict the composition of an unknown target, we first predict the composition with the submodels and the "full" model. Then, based on the predicted composition from the "full" model, the appropriate submodel prediction can be used (e.g. if the full model predicts a low composition, use the "low" model result, which is likely to be more accurate). For samples with "full" predictions that occur in a region of overlap between submodels, the submodel predictions are "blended" using a simple linear weighted sum. The submodel PLS method shows improvements in most of the major elements predicted by ChemCam and reduces the occurrence of negative predictions for low wt.% targets. Submodel PLS is currently being used in conjunction with ICA regression for the major element compositions of ChemCam data.
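The selection and blending step described in this abstract can be sketched as follows. The range boundaries are the SiO2 example values quoted above (low 0-50, mid 30-70, high 60-100 wt.%). The abstract only says that overlapping predictions are blended with "a simple linear weighted sum", so the linear ramp across each overlap used here is an assumption, and the function name `blend_submodels` is hypothetical.

```python
def blend_submodels(full_pred, low_pred, mid_pred, high_pred):
    """Pick or blend submodel predictions based on the full-model prediction (wt.%)."""
    if full_pred < 30.0:
        # Only the "low" submodel covers this range
        return low_pred
    if full_pred <= 50.0:
        # Overlap of "low" (0-50) and "mid" (30-70): assumed linear ramp
        w_mid = (full_pred - 30.0) / 20.0
        return (1.0 - w_mid) * low_pred + w_mid * mid_pred
    if full_pred < 60.0:
        # Only the "mid" submodel covers this range
        return mid_pred
    if full_pred <= 70.0:
        # Overlap of "mid" (30-70) and "high" (60-100): assumed linear ramp
        w_high = (full_pred - 60.0) / 10.0
        return (1.0 - w_high) * mid_pred + w_high * high_pred
    # Only the "high" submodel covers this range
    return high_pred


# Hypothetical predictions for one target: the full model says 40 wt.%,
# so the result is an equal mix of the "low" and "mid" submodel outputs.
blended = blend_submodels(full_pred=40.0, low_pred=38.0, mid_pred=42.0, high_pred=55.0)
```

The ramp makes the blended prediction continuous as the full-model estimate moves from one submodel's range into the next, which avoids jumps at the range boundaries.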
Prediction of chemical, physical and sensory data from process parameters for frozen cod using multivariate analysis

DEFF Research Database (Denmark)

Bechmann, Iben Ellegaard; Jensen, H.S.; Bøknæs, Niels

1998-01-01

Physical, chemical and sensory quality parameters were determined for 115 cod (Gadus morhua) samples stored under varying frozen storage conditions. Five different process parameters (period of frozen storage, frozen storage temperature, place of catch, season for catching and state of rigor) were...... varied systematically at two levels. The data obtained were evaluated using the multivariate methods, principal component analysis (PCA) and partial least squares (PLS) regression. The PCA models were used to identify which process parameters were actually most important for the quality of the frozen cod....... PLS models that were able to predict the physical, chemical and sensory quality parameters from the process parameters of the frozen raw material were generated. The prediction abilities of the PLS models were good enough to give reasonable results even when the process parameters were characterised...
Simultaneous spectrophotometric determination of heavy metal ions Fe(II), Co(II), Ni(II), Cu(II), and Zn(II) with Br-PADAP in primary reactor coolant system

Energy Technology Data Exchange (ETDEWEB)

Kim, Tae-Hyeong; Yun, Jong-Il [KAIST, Daejeon (Korea, Republic of)

2015-05-15

The performance and integrity of nuclear power plants are highly influenced by the presence of corrosion products. The deposition of corrosion products in the steam generator is one of the main concerns of power plants. The quantification of corrosion products is therefore considered important. In this study, we applied the spectrophotometric method to detect metal ions such as iron, cobalt, nickel, copper, and zinc, which are major elements of the structural materials of the plant. In particular, the chemical complexation of those divalent metal ions with 2-(5-bromo-2-pyridylazo)-5-diethylaminophenol (Br-PADAP) provides high molar absorptivity. For the simultaneous determination of metal ions, a partial least squares (PLS) regression method was applied. In the present work, the complexation of Br-PADAP with divalent metal ions (iron, cobalt, nickel, copper and zinc) was studied. The PLS regression method was successfully applied for simultaneous elemental detection in multi-element systems. These results suggest that the method is well suited to detecting corrosion products in nuclear power plants.

Predicting Taxi-Out Time at Congested Airports with Optimization-Based Support Vector Regression Methods

Directory of Open Access Journals (Sweden)

Guan Lian

2018-01-01

Full Text Available Accurate prediction of taxi-out time is a significant precondition for improving the operationality of the departure process at an airport, as well as for reducing long taxi-out times, congestion, and excessive emission of greenhouse gases. Unfortunately, several of the traditional methods of predicting taxi-out time perform unsatisfactorily at congested airports. This paper describes and tests three of those conventional methods (the Generalized Linear Model, the Softmax Regression Model, and the Artificial Neural Network method) and two improved Support Vector Regression (SVR) approaches based on swarm intelligence algorithm optimization, namely Particle Swarm Optimization (PSO) and the Firefly Algorithm. In order to improve the global searching ability of the Firefly Algorithm, an adaptive step factor and Lévy flight are implemented simultaneously when updating the location function. Six factors are analysed, of which delay is identified as one significant factor at congested airports. Through a series of specific dynamic analyses, a case study of Beijing International Airport (PEK) is tested with historical data. The performance measures show that the two proposed SVR approaches, especially the Improved Firefly Algorithm (IFA) optimization-based SVR method, not only achieve the best modelling measures and accuracy rate compared with the representative forecast models, but also achieve better predictive performance when dealing with abnormal taxi-out time states.
Hybrid ARIMAX quantile regression method for forecasting short term electricity consumption in east java

Science.gov (United States)

Prastuti, M.; Suhartono; Salehah, NA

2018-04-01

The need for energy supply, especially electricity, in Indonesia has been increasing in recent years. Furthermore, high electricity usage at different times of day leads to heteroscedasticity. Estimating the electricity supply that can fulfil the community's needs is very important, but heteroscedasticity often makes electricity forecasting hard. An accurate forecast of electricity consumption is one of the key challenges for energy providers in improving resource and service planning and in taking control actions to balance electricity supply and demand. In this paper, a hybrid ARIMAX Quantile Regression (ARIMAX-QR) approach is proposed to predict short-term electricity consumption in East Java. This method is also compared to time series regression using the RMSE, MAPE, and MdAPE criteria. The data used in this research were half-hourly electricity consumption data for the period September 2015 to April 2016. The results show that the proposed approach can be a competitive alternative for forecasting short-term electricity consumption in East Java. ARIMAX-QR using lag values and dummy variables as predictors yields more accurate predictions for both in-sample and out-of-sample data. Moreover, both the time series regression and ARIMAX-QR methods with lag values added as predictors could accurately capture the patterns in the data. Hence, they produce better predictions than models that do not use additional lag variables.
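The three comparison criteria named in this abstract have standard definitions, sketched below for reference: root mean square error (RMSE), mean absolute percentage error (MAPE) and median absolute percentage error (MdAPE). The function names and the load values are hypothetical; the abstract does not give the authors' data or code.

```python
from math import sqrt
from statistics import median


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def absolute_percentage_errors(actual, predicted):
    """Per-observation |error| as a percentage of the actual value (actual must be nonzero)."""
    return [abs((a - p) / a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    ape = absolute_percentage_errors(actual, predicted)
    return sum(ape) / len(ape)


def mdape(actual, predicted):
    """Median absolute percentage error; less sensitive to a few large misses than MAPE."""
    return median(absolute_percentage_errors(actual, predicted))


# Hypothetical half-hourly loads and forecasts (arbitrary units)
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
scores = (rmse(actual, forecast), mape(actual, forecast), mdape(actual, forecast))
```

Reporting MdAPE alongside MAPE is useful when a few periods are forecast very badly, since the median is not pulled up by those outliers.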
Multivariate Calibration and Model Integrity for Wood Chemistry Using Fourier Transform Infrared Spectroscopy

Directory of Open Access Journals (Sweden)

Chengfeng Zhou

2015-01-01

Full Text Available This research addressed a rapid method to monitor hardwood chemical composition by applying Fourier transform infrared (FT-IR) spectroscopy, with particular interest in model performance for interpretation and prediction. Partial least squares (PLS) and principal components regression (PCR) were chosen as the primary models for comparison. Standard laboratory chemistry methods were employed on a mixed genus/species hardwood sample set to collect the original data. PLS was found to provide better predictive capability, while PCR exhibited a more precise estimate of loading peaks, which suggests that PCR is better for model interpretation of key underlying functional groups. Specifically, when PCR was utilized, an error in peak loading of ±15 cm⁻¹ from the true mean was quantified. Application of the first derivative appeared to assist in improving both PCR and PLS loading precision. Research results identified the wavenumbers important in the prediction of extractives, lignin, cellulose, and hemicellulose and further demonstrated the utility of FT-IR for rapid monitoring of wood chemistry.
Calculation of U, Ra, Th and K contents in uranium ore by multiple linear regression method

International Nuclear Information System (INIS)

Lin Chao; Chen Yingqiang; Zhang Qingwen; Tan Fuwen; Peng Guanghui

1991-01-01

A multiple linear regression method was used to compute γ spectra of uranium ore samples and to calculate contents of U, Ra, Th, and K. In comparison with the inverse matrix method, its advantage is that no standard samples of pure U, Ra, Th and K are needed for obtaining response coefficients.
The regression-calibration method for fitting generalized linear models with additive measurement error

OpenAIRE

James W. Hardin; Henrik Schmeidiche; Raymond J. Carroll

2003-01-01

This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on s...
A simple approach to power and sample size calculations in logistic regression and Cox regression models.

Science.gov (United States)

Vaeth, Michael; Skovlund, Eva

2004-06-15

For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
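The equivalent two-sample construction described in this abstract can be sketched for the logistic case. The sketch reads "the parameters differ by slope times twice the standard deviation" as a log-odds difference of 2·β·σ between two equal groups, and reads "expected number of events unchanged" as keeping the average of the two group probabilities equal to the overall response probability. Both readings, the classical two-proportion sample-size formula and all function names are assumptions made for illustration, not the authors' implementation.

```python
from math import ceil, log, sqrt
from statistics import NormalDist


def logit(p):
    return log(p / (1.0 - p))


def equivalent_two_sample(p_bar, slope, sd_x, tol=1e-12):
    """Find (p1, p2) with mean p_bar and logit(p2) - logit(p1) = 2 * slope * sd_x."""
    d = abs(2.0 * slope * sd_x)
    if d == 0.0:
        return p_bar, p_bar
    # Bisection on the smaller probability; the logit gap shrinks as it approaches p_bar
    lo, hi = max(0.0, 2.0 * p_bar - 1.0), p_bar
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if logit(2.0 * p_bar - mid) - logit(mid) > d:
            lo = mid   # gap too large: move the smaller probability up
        else:
            hi = mid
    p_low = (lo + hi) / 2.0
    p_high = 2.0 * p_bar - p_low
    # Keep the sign convention: a positive slope gives p2 > p1
    return (p_low, p_high) if slope * sd_x > 0 else (p_high, p_low)


def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Classical sample size per group for comparing two proportions (p1 != p2)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    num = z_a * sqrt(2.0 * p_bar * (1.0 - p_bar)) + z_b * sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return ceil((num / (p2 - p1)) ** 2)


# Hypothetical design: overall event probability 0.3, log-odds slope 0.5, covariate SD 1.0
p1, p2 = equivalent_two_sample(p_bar=0.3, slope=0.5, sd_x=1.0)
total_n = 2 * n_per_group(p1, p2)
```

Under these assumptions, `total_n` would be taken as the total sample size for the logistic regression; the abstract notes that the approximation is weaker in small samples with few events and a highly skewed covariate.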
Pharmacological Classification and Activity Evaluation of Furan and Thiophene Amide Derivatives Applying Semi-Empirical ab initio Molecular Modeling Methods

Directory of Open Access Journals (Sweden)

Leszek Bober

2012-05-01

Full Text Available Pharmacological and physicochemical classification of the furan and thiophene amide derivatives by multiple regression analysis and partial least squares (PLS) based on semi-empirical ab initio molecular modeling studies and high-performance liquid chromatography (HPLC) retention data is proposed. Structural parameters obtained from the PCM (Polarizable Continuum Model) method and the literature values of biological activity (antiproliferative for the A431 cells) expressed as LD₅₀ of the examined furan and thiophene derivatives were used to search for relationships. It was tested how variable molecular modeling conditions considered together, with or without HPLC retention data, allow evaluation of the structural recognition of furan and thiophene derivatives with respect to their pharmacological properties.
Gaussian process regression analysis for functional data

CERN Document Server

Shi, Jian Qing

2011-01-01

Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables. Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime
A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design.

Science.gov (United States)

Meaney, Christopher; Moineddin, Rahim

2014-01-24

In biomedical research, response variables are often encountered which have bounded support on the open unit interval--(0,1). Traditionally, researchers have attempted to estimate covariate effects on these types of response data using linear regression. Alternative modelling strategies may include: beta regression, variable-dispersion beta regression, and fractional logit regression models. This study employs a Monte Carlo simulation design to compare the statistical properties of the linear regression model to that of the more novel beta regression, variable-dispersion beta regression, and fractional logit regression models. In the Monte Carlo experiment we assume a simple two sample design. We assume observations are realizations of independent draws from their respective probability models. The randomly simulated draws from the various probability models are chosen to emulate average proportion/percentage/rate differences of pre-specified magnitudes. Following simulation of the experimental data we estimate average proportion/percentage/rate differences. We compare the estimators in terms of bias, variance, type-1 error and power. Estimates of Monte Carlo error associated with these quantities are provided. If response data are beta distributed with constant dispersion parameters across the two samples, then all models are unbiased and have reasonable type-1 error rates and power profiles. If the response data in the two samples have different dispersion parameters, then the simple beta regression model is biased. When the sample size is small (N0 = N1 = 25) linear regression has superior type-1 error rates compared to the other models. Small sample type-1 error rates can be improved in beta regression models using bias correction/reduction methods. In the power experiments, variable-dispersion beta regression and fractional logit regression models have slightly elevated power compared to linear regression models. Similar results were observed if the
Quantile regression theory and applications

CERN Document Server

Davino, Cristina; Vistocco, Domenico

2013-01-01

A guide to the implementation and interpretation of Quantile Regression models. This book explores the theory and numerous applications of quantile regression, offering empirical data analysis as well as the software tools to implement the methods. The main focus of this book is to provide the reader with a comprehensive description of the main issues concerning quantile regression; these include basic modeling, geometrical interpretation, estimation and inference for quantile regression, as well as issues on validity of the model, diagnostic tools. Each methodological aspect is explored and
Kernel PLS Estimation of Single-trial Event-related Potentials

Science.gov (United States)

Rosipal, Roman; Trejo, Leonard J.

2004-01-01

Nonlinear kernel partial least squares (KPLS) regression is a novel smoothing approach to nonparametric regression curve fitting. We have developed a KPLS approach to the estimation of single-trial event-related potentials (ERPs). For improved accuracy of estimation, we also developed a local KPLS method for situations in which there exists prior knowledge about the approximate latency of individual ERP components. To assess the utility of the KPLS approach, we compared non-local KPLS and local KPLS smoothing with other nonparametric signal processing and smoothing methods. In particular, we examined wavelet denoising, smoothing splines, and localized smoothing splines. We applied these methods to the estimation of simulated mixtures of human ERPs and ongoing electroencephalogram (EEG) activity using a dipole simulator (BESA). In this scenario we considered ongoing EEG to represent spatially and temporally correlated noise added to the ERPs. This simulation provided a reasonable but simplified model of real-world ERP measurements. For estimation of the simulated single-trial ERPs, local KPLS provided a level of accuracy that was comparable with or better than the other methods. We also applied the local KPLS method to the estimation of human ERPs recorded in an experiment on cognitive fatigue. For these data, the local KPLS method provided a clear improvement in visualization of single-trial ERPs as well as their averages. The local KPLS method may serve as a new alternative to the estimation of single-trial ERPs and improvement of ERP averages.
Regression of environmental noise in LIGO data

International Nuclear Information System (INIS)

Tiwari, V; Klimenko, S; Mitselmakher, G; Necula, V; Drago, M; Prodi, G; Frolov, V; Yakushin, I; Re, V; Salemi, F; Vedovato, G

2015-01-01

We address the problem of noise regression in the output of gravitational-wave (GW) interferometers, using data from the physical environmental monitors (PEM). The objective of the regression analysis is to predict environmental noise in the GW channel from the PEM measurements. One of the most promising regression methods is based on the construction of Wiener–Kolmogorov (WK) filters. Using this method, the seismic noise cancellation from the LIGO GW channel has already been performed. In the presented approach the WK method has been extended, incorporating banks of Wiener filters in the time–frequency domain, multi-channel analysis and regulation schemes, which greatly enhance the versatility of the regression analysis. Also we present the first results on regression of the bi-coherent noise in the LIGO data. (paper)
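The core idea of witness-channel noise regression can be sketched in a few lines. The following is a minimal time-domain illustration only, assuming a single witness channel and a short FIR filter estimated by least squares; it is not the time-frequency filter-bank method of the paper, and the function name `fir_wiener_regress`, the coupling `h_true` and the simulated channels are all hypothetical.

```python
import numpy as np

def fir_wiener_regress(target, witness, n_taps):
    """Estimate an FIR filter that predicts `target` from `witness`,
    and return the filter together with the cleaned target."""
    n = len(target)
    # Row t of X holds witness[t], witness[t-1], ..., witness[t-n_taps+1].
    X = np.column_stack(
        [witness[n_taps - 1 - k : n - k] for k in range(n_taps)]
    )
    y = target[n_taps - 1 :]
    h = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares Wiener solution
    return h, y - X @ h                       # filter taps, residual channel

# Simulated data (assumption): white witness noise leaking into the target.
rng = np.random.default_rng(0)
n = 5000
witness = rng.standard_normal(n)             # environmental monitor channel
h_true = np.array([0.5, -0.3, 0.2])          # hypothetical coupling
coupled = np.convolve(witness, h_true)[:n]
target = 0.1 * rng.standard_normal(n) + coupled

h_est, cleaned = fir_wiener_regress(target, witness, n_taps=3)
print(h_est, target.var(), cleaned.var())
```

The residual `cleaned` is the target channel with the predicted environmental contribution subtracted; in this toy setup its variance should drop to roughly that of the uncoupled noise.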
Development and validation of a Partial Least Squares-Discriminant Analysis (PLS-DA) model based on the determination of ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs) in hair for the diagnosis of chronic alcohol abuse.

Science.gov (United States)

Alladio, E; Giacomelli, L; Biosa, G; Corcia, D Di; Gerace, E; Salomone, A; Vincenti, M

2018-01-01

The chronic intake of an excessive amount of alcohol is currently ascertained by determining the concentration of direct alcohol metabolites in the hair samples of the alleged abusers, including ethyl glucuronide (EtG) and, less frequently, fatty acid ethyl esters (FAEEs). Indirect blood biomarkers of alcohol abuse are still determined to support hair EtG results and diagnose a consequent liver impairment. In the present study, the supporting role of hair FAEEs is compared with indirect blood biomarkers with respect to the contexts in which hair EtG interpretation is uncertain. Receiver Operating Characteristics (ROC) curves and multivariate Principal Component Analysis (PCA) demonstrated much stronger correlation of EtG results with FAEEs than with any single indirect biomarker or their combinations. Partial Least Squares Discriminant Analysis (PLS-DA) models based on hair EtG and FAEEs were developed to maximize the biomarkers' information content on a multivariate background. The final PLS-DA model yielded 100% correct classification on a training/evaluation dataset of 155 subjects, including both chronic alcohol abusers and social drinkers. Then, the PLS-DA model was validated on an external dataset of 81 individuals, providing optimal discrimination ability between chronic alcohol abusers and social drinkers in terms of specificity and sensitivity. The PLS-DA scores obtained for each subject, with respect to the PLS-DA model threshold that separates the probabilistic distributions for the two classes, furnished a likelihood ratio value, which in turn conveys the strength of the experimental data support to the classification decision, within a Bayesian logic. Typical boundary real cases from daily work are discussed, too. Copyright © 2017 Elsevier B.V. All rights reserved.
Thermoluminescence dating of Chinese porcelain using a regression method of saturating exponential in pre-dose technique

International Nuclear Information System (INIS)

Wang Weida; Xia Junding; Zhou Zhixin; Leung, P.L.

2001-01-01

Thermoluminescence (TL) dating using a regression method of saturating exponential in the pre-dose technique is described. Twenty-three porcelain samples from past dynasties of China were dated by this method. The results show that the TL ages are in reasonable agreement with archaeological dates within a standard deviation of 27%. Such an error can be accepted in porcelain dating.
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

Science.gov (United States)

Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

2013-08-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX Laplace and SuperMix Gaussian quadrature, perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
[Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

Science.gov (United States)

Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

2018-04-01

Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both of these factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation coefficient-based method for evaluating the significance of hydrological dependence, based on the auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the significance degree of dependence into no variability, weak variability, mid variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficient at each order of the series, we found that the correlation coefficient is mainly determined by the magnitude of the auto-correlation coefficients from the first order to the p-th order, which clarifies the theoretical basis of this method. With the first-order and second-order auto-regression models as examples, the reasonability of the deduced formula was verified through Monte Carlo experiments on the relationship between the correlation coefficient and the auto-correlation coefficient. This method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in the hydrological process.
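A small Monte Carlo check illustrates the first-order case. For a stationary AR(1) series x_t = φ·x_{t-1} + ε_t, the dependence component is φ·x_{t-1}, and its correlation with x_t works out to |φ|, i.e. the magnitude of the lag-1 auto-correlation. The sketch below assumes Gaussian innovations and uses a hypothetical helper name; the classification thresholds used by the authors are not given in the abstract and are not reproduced here.

```python
import numpy as np

def ar1_dependence_correlation(phi, n=20000, seed=0):
    """Simulate an AR(1) series and return the sample correlation between
    the series and its dependence component phi * x[t-1]."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    dependence = phi * x[:-1]          # deterministic AR(1) part of x[1:]
    return np.corrcoef(x[1:], dependence)[0, 1]

for phi in (0.2, 0.6, -0.8):
    print(phi, ar1_dependence_correlation(phi))
```

Each printed correlation should lie close to |φ|, up to sampling error.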
A novel relational regularization feature selection method for joint regression and classification in AD diagnosis.

Science.gov (United States)

Zhu, Xiaofeng; Suk, Heung-Il; Wang, Li; Lee, Seong-Whan; Shen, Dinggang

2017-05-01

In this paper, we focus on joint regression and classification for Alzheimer's disease diagnosis and propose a new feature selection method by embedding the relational information inherent in the observations into a sparse multi-task learning framework. Specifically, the relational information includes three kinds of relationships (feature-feature, response-response, and sample-sample relations), which preserve three kinds of similarity: among the features, the response variables, and the samples, respectively. To conduct feature selection, we first formulate the objective function by imposing these three relational characteristics along with an ℓ2,1-norm regularization term, and further propose a computationally efficient algorithm to optimize the proposed objective function. With the dimension-reduced data, we train two support vector regression models to predict the clinical scores of ADAS-Cog and MMSE, respectively, and also a support vector classification model to determine the clinical label. We conducted extensive experiments on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset to validate the effectiveness of the proposed method. Our experimental results showed the efficacy of the proposed method in enhancing the performances of both clinical scores prediction and disease status identification, compared to the state-of-the-art methods. Copyright © 2015 Elsevier B.V. All rights reserved.
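For readers unfamiliar with the ℓ2,1-norm, it is the sum of the Euclidean norms of the rows of a weight matrix, which is why penalizing it tends to zero out whole rows. The snippet below is a minimal sketch, assuming rows index features and columns index tasks (responses); the matrix `W` is made up for illustration.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean (l2) norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

# Hypothetical weight matrix: 3 features (rows) x 2 tasks (columns).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],   # a feature dropped for every task
              [1.0, 0.0]])
print(l21_norm(W))  # row norms are 5, 0 and 1
```

A row of zeros contributes nothing to the penalty, so a feature is discarded jointly across all tasks rather than task by task.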
Applied regression analysis: a research tool

CERN Document Server

Pantula, Sastry; Dickey, David

1998-01-01

Least squares estimation, when used appropriately, is a powerful research tool. A deeper understanding of the regression concepts is essential for achieving optimal benefits from a least squares analysis. This book builds on the fundamentals of statistical methods and provides appropriate concepts that will allow a scientist to use least squares as an effective research tool. Applied Regression Analysis is aimed at the scientist who wishes to gain a working knowledge of regression analysis. The basic purpose of this book is to develop an understanding of least squares and related statistical methods without becoming excessively mathematical. It is the outgrowth of more than 30 years of consulting experience with scientists and many years of teaching an applied regression course to graduate students. Applied Regression Analysis serves as an excellent text for a service course on regression for non-statisticians and as a reference for researchers. It also provides a bridge between a two-semester introduction to...
Housing price forecastability: A factor analysis

DEFF Research Database (Denmark)

Møller, Stig Vinther; Bork, Lasse

2017-01-01

We examine U.S. housing price forecastability using principal component analysis (PCA), partial least squares (PLS), and sparse PLS (SPLS). We incorporate information from a large panel of 128 economic time series and show that macroeconomic fundamentals have strong predictive power for future...... movements in housing prices. We find that (S)PLS models systematically dominate PCA models. (S)PLS models also generate significant out-of-sample predictive power over and above the predictive power contained by the price-rent ratio, autoregressive benchmarks, and regression models based on small datasets....
DATA MINING METHODS FOR OMICS AND KNOWLEDGE OF CRUDE MEDICINAL PLANTS TOWARD BIG DATA BIOLOGY

Directory of Open Access Journals (Sweden)

Farit M. Afendi

2013-01-01

Molecular biological data has rapidly increased with the recent progress of the Omics fields, e.g., genomics, transcriptomics, proteomics and metabolomics, which necessitates the development of databases and methods for efficient storage, retrieval, integration and analysis of massive data. The present study reviews the usage of KNApSAcK Family DB in metabolomics and related areas, discusses several statistical methods for handling multivariate data and shows their application on Indonesian blended herbal medicines (Jamu) as a case study. Exploration using Biplot reveals that many plants are rarely utilized while some plants are highly utilized toward specific efficacy. Furthermore, the ingredients of Jamu formulas are modeled using Partial Least Squares Discriminant Analysis (PLS-DA) in order to predict their efficacy. The plants used in each Jamu medicine served as the predictors, whereas the efficacy of each Jamu provided the responses. This model produces 71.6% correct classification in predicting efficacy. A permutation test is then used to determine plants that serve as main ingredients in Jamu formulas by evaluating the significance of the PLS-DA coefficients. Next, in order to explain the role of plants that serve as main ingredients in Jamu medicines, information on the pharmacological activity of the plants is added to the predictor block. Then an N-PLS-DA model, a multiway version of PLS-DA, is utilized to handle the three-dimensional array of the predictor block. The resulting N-PLS-DA model reveals that the effects of some pharmacological activities are specific to certain efficacies while other activities are spread across many efficacies. The mathematical modeling introduced in the present study can be utilized in global analysis of big data aimed at revealing the underlying biology.

Validated univariate and multivariate spectrophotometric methods for the determination of pharmaceuticals mixture in complex wastewater

Science.gov (United States)

Riad, Safaa M.; Salem, Hesham; Elbalkiny, Heba T.; Khattab, Fatma I.

2015-04-01

Five accurate, precise, and sensitive univariate and multivariate spectrophotometric methods were developed for the simultaneous determination of a ternary mixture containing Trimethoprim (TMP), Sulphamethoxazole (SMZ) and Oxytetracycline (OTC) in wastewater samples collected from different sites, either production wastewater or livestock wastewater, after their solid phase extraction using OASIS HLB cartridges. In the univariate methods, OTC was determined at its λmax 355.7 nm (0D), while TMP and SMZ were determined by three different univariate methods. Method (A) is based on the successive spectrophotometric resolution technique (SSRT). The technique starts with the ratio subtraction method followed by the ratio difference method for determination of TMP and SMZ. Method (B) is the successive derivative ratio technique (SDR). Method (C) is mean centering of the ratio spectra (MCR). The developed multivariate methods are principal component regression (PCR) and partial least squares (PLS). The specificity of the developed methods is investigated by analyzing laboratory prepared mixtures containing different ratios of the three drugs. The obtained results are statistically compared with those obtained by the official methods, showing no significant difference with respect to accuracy and precision at p = 0.05.
Power system state estimation using an iteratively reweighted least squares method for sequential L1-regression

Energy Technology Data Exchange (ETDEWEB)

Jabr, R.A. [Electrical, Computer and Communication Engineering Department, Notre Dame University, P.O. Box 72, Zouk Mikhael, Zouk Mosbeh (Lebanon)

2006-02-15

This paper presents an implementation of the least absolute value (LAV) power system state estimator based on obtaining a sequence of solutions to the L1-regression problem using an iteratively reweighted least squares (IRLS_L1) method. The proposed implementation avoids reformulating the regression problem into standard linear programming (LP) form and consequently does not require the use of common methods of LP, such as those based on the simplex method or interior-point methods. It is shown that the IRLS_L1 method is equivalent to solving a sequence of linear weighted least squares (LS) problems. Thus, its implementation presents little additional effort since the sparse LS solver is common to existing LS state estimators. Studies on the termination criteria of the IRLS_L1 method have been carried out to determine a procedure for which the proposed estimator is more computationally efficient than a previously proposed non-linear iteratively reweighted least squares (IRLS) estimator. Indeed, it is revealed that the proposed method is a generalization of the previously reported IRLS estimator, but is based on more rigorous theory. (author)
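The equivalence between L1-regression and a sequence of weighted least squares problems can be illustrated on a small linear model. This is a generic textbook-style sketch, not the paper's state estimator: it assumes a linear measurement model, dense matrices, weights 1/|r_i| floored at a small `eps`, and a simple step-size stopping rule, all of which are illustrative choices.

```python
import numpy as np

def irls_l1(A, b, n_iter=50, eps=1e-8, tol=1e-10):
    """Approximate argmin_x sum_i |(A x - b)_i| by iteratively
    reweighted least squares."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]        # ordinary LS start
    for _ in range(n_iter):
        r = A @ x - b
        w = 1.0 / np.maximum(np.abs(r), eps)        # weights 1/|r_i|, floored
        sw = np.sqrt(w)
        # Weighted LS: minimize sum_i w_i * (A x - b)_i ** 2
        x_new = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x

# Toy measurement set with one gross error (assumption for illustration).
A = np.column_stack([np.ones(40), np.linspace(0.0, 1.0, 40)])
x_true = np.array([1.0, 2.0])
b = A @ x_true
b[5] += 50.0                                        # bad measurement

x_l1 = irls_l1(A, b)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print("L1 estimate:", x_l1)
print("LS estimate:", x_ls)
```

In this example the least absolute value estimate essentially ignores the corrupted measurement, whereas the ordinary least squares estimate is pulled away from the true parameters.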
Vector regression introduced

Directory of Open Access Journals (Sweden)

Mok Tik

2014-06-01

This study formulates regression of vector data that will enable statistical analysis of various geodetic phenomena such as polar motion, ocean currents, typhoon/hurricane tracking, crustal deformations, and precursory earthquake signals. The observed vector variable of an event (dependent vector variable) is expressed as a function of a number of hypothesized phenomena realized also as vector variables (independent vector variables) and/or scalar variables that are likely to impact the dependent vector variable. The proposed representation has the unique property of solving the coefficients of independent vector variables (explanatory variables) also as vectors, hence it supersedes multivariate multiple regression models, in which the unknown coefficients are scalar quantities. For the solution, complex numbers are used to represent vector information, and the method of least squares is deployed to estimate the vector model parameters after transforming the complex vector regression model into a real vector regression model through isomorphism. Various operational statistics for testing the predictive significance of the estimated vector parameter coefficients are also derived. A simple numerical example demonstrates the use of the proposed vector regression analysis in modeling typhoon paths.
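The complex-number representation and its real-valued counterpart can be sketched as follows. This is an illustrative toy model only, assuming one independent vector variable, 2-D vectors encoded as east + i·north, and simulated data; the variable names and coefficient values are hypothetical, and the paper's test statistics are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Independent vector variable encoded as a complex number (east + i*north).
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a_true, b_true = 0.5 - 1.0j, 1.2 + 0.4j              # hypothetical vector coefficients
noise = 0.05 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
w = a_true + b_true * z + noise                      # dependent vector variable

# Complex least squares: w ~ a + b * z
A = np.column_stack([np.ones(n, dtype=complex), z])
coef_complex = np.linalg.lstsq(A, w, rcond=None)[0]

# Equivalent real model: a complex coefficient p + iq acts on (x, y)
# as the 2x2 matrix [[p, -q], [q, p]].
top = np.column_stack([np.ones(n), np.zeros(n), z.real, -z.imag])
bottom = np.column_stack([np.zeros(n), np.ones(n), z.imag, z.real])
A_real = np.vstack([top, bottom])
w_real = np.concatenate([w.real, w.imag])
a_r, a_i, b_r, b_i = np.linalg.lstsq(A_real, w_real, rcond=None)[0]
coef_real = np.array([a_r + 1j * a_i, b_r + 1j * b_i])

print(coef_complex)
print(coef_real)
```

Both formulations minimize the same sum of squared residuals, so the two coefficient estimates should agree to numerical precision.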
On Solving Lq-Penalized Regressions

Directory of Open Access Journals (Sweden)

Tracy Zhou Wu

2007-01-01

Lq-penalized regression arises in multidimensional statistical modelling where all or part of the regression coefficients are penalized to achieve both accuracy and parsimony of statistical models. There is often substantial computational difficulty except for the quadratic penalty case. The difficulty is partly due to the nonsmoothness of the objective function inherited from the use of the absolute value. We propose a new solution method for the general Lq-penalized regression problem based on space transformation and thus efficient optimization algorithms. The new method has immediate applications in statistics, notably in penalized spline smoothing problems. In particular, the LASSO problem is shown to be polynomial time solvable. Numerical studies show promise of our approach.
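For orientation, a standard way to write the Lq-penalized least squares problem is shown below. The notation (response y, design matrix X, coefficients β, penalty weight λ, penalized index set P) is an assumption made for illustration and is not taken from the paper.

```latex
\min_{\beta \in \mathbb{R}^{p}} \;
  \lVert y - X\beta \rVert_2^2
  \;+\; \lambda \sum_{j \in P} \lvert \beta_j \rvert^{q},
\qquad \lambda \ge 0,\; q > 0
```

Setting q = 2 gives the quadratic (ridge) penalty, which is smooth, while q = 1 gives the LASSO, whose absolute value terms make the objective nonsmooth at zero.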
Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

Science.gov (United States)

Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

2015-01-01

Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), which are often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
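The abstract reports accuracy as a mean fold error but does not state the formula. One common definition in pharmacokinetic prediction work is the geometric mean of the fold differences between predicted and observed values; the sketch below assumes that definition, and the function name and sample values are hypothetical.

```python
import numpy as np

def mean_fold_error(predicted, observed):
    """Geometric mean fold error: 10 ** mean(|log10(predicted / observed)|).
    A value of 1 means perfect prediction; 2 means predictions are off
    by a factor of two on average (in either direction)."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(10 ** np.mean(np.abs(np.log10(predicted / observed))))

# Hypothetical Vss values (L/kg): one over-predicted, one under-predicted
# by a factor of two.
print(mean_fold_error([2.0, 0.5], [1.0, 1.0]))
```

Because the error is computed on the log scale, over- and under-prediction by the same factor are penalized equally.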
Analysis of some methods for reduced rank Gaussian process regression

DEFF Research Database (Denmark)

Quinonero-Candela, J.; Rasmussen, Carl Edward

2005-01-01

While there is strong motivation for using Gaussian Processes (GPs) due to their excellent performance in regression and classification problems, their computational complexity makes them impractical when the size of the training set exceeds a few thousand cases. This has motivated the recent...... proliferation of a number of cost-effective approximations to GPs, both for classification and for regression. In this paper we analyze one popular approximation to GPs for regression: the reduced rank approximation. While generally GPs are equivalent to infinite linear models, we show that Reduced Rank...... Gaussian Processes (RRGPs) are equivalent to finite sparse linear models. We also introduce the concept of degenerate GPs and show that they correspond to inappropriate priors. We show how to modify the RRGP to prevent it from being degenerate at test time. Training RRGPs consists both in learning...
The Bacillus subtilis Conjugative Plasmid pLS20 Encodes Two Ribbon-Helix-Helix Type Auxiliary Relaxosome Proteins That Are Essential for Conjugation

Directory of Open Access Journals (Sweden)

Andrés Miguel-Arribas

2017-11-01

Bacterial conjugation is the process by which a conjugative element (CE) is transferred horizontally from a donor to a recipient cell via a connecting pore. One of the first steps in the conjugation process is the formation of a nucleoprotein complex at the origin of transfer (oriT), where one of the components of the nucleoprotein complex, the relaxase, introduces a site- and strand-specific nick to initiate the transfer of a single DNA strand into the recipient cell. In most cases, the nucleoprotein complex involves, besides the relaxase, one or more additional proteins, named auxiliary proteins, which are encoded by the CE and/or the host. The conjugative plasmid pLS20 replicates in the Gram-positive Firmicute bacterium Bacillus subtilis. We have recently identified the relaxase gene and the oriT of pLS20, which are separated by a region of almost 1 kb. Here we show that this region contains two auxiliary genes that we name aux1LS20 and aux2LS20, and which we show are essential for conjugation. Both Aux1LS20 and Aux2LS20 are predicted to contain a Ribbon-Helix-Helix DNA binding motif near their N-terminus. Analyses of the purified proteins show that Aux1LS20 and Aux2LS20 form tetramers and hexamers in solution, respectively, and that they both bind preferentially to oriTLS20, although with different characteristics and specificities. In silico analyses revealed that genes encoding homologs of Aux1LS20 and/or Aux2LS20 are located upstream of almost 400 relaxase genes of the RelLS20 family (MOBL) of relaxases. Thus, Aux1LS20 and Aux2LS20 of pLS20 constitute the founding members of the first two families of auxiliary proteins described for CEs of Gram-positive origin.
The Bacillus subtilis Conjugative Plasmid pLS20 Encodes Two Ribbon-Helix-Helix Type Auxiliary Relaxosome Proteins That Are Essential for Conjugation.

Science.gov (United States)

Miguel-Arribas, Andrés; Hao, Jian-An; Luque-Ortega, Juan R; Ramachandran, Gayetri; Val-Calvo, Jorge; Gago-Córdoba, César; González-Álvarez, Daniel; Abia, David; Alfonso, Carlos; Wu, Ling J; Meijer, Wilfried J J

2017-01-01

Bacterial conjugation is the process by which a conjugative element (CE) is transferred horizontally from a donor to a recipient cell via a connecting pore. One of the first steps in the conjugation process is the formation of a nucleoprotein complex at the origin of transfer (oriT), where one of the components of the nucleoprotein complex, the relaxase, introduces a site- and strand-specific nick to initiate the transfer of a single DNA strand into the recipient cell. In most cases, the nucleoprotein complex involves, besides the relaxase, one or more additional proteins, named auxiliary proteins, which are encoded by the CE and/or the host. The conjugative plasmid pLS20 replicates in the Gram-positive Firmicute bacterium Bacillus subtilis. We have recently identified the relaxase gene and the oriT of pLS20, which are separated by a region of almost 1 kb. Here we show that this region contains two auxiliary genes that we name aux1LS20 and aux2LS20, and which we show are essential for conjugation. Both Aux1LS20 and Aux2LS20 are predicted to contain a Ribbon-Helix-Helix DNA binding motif near their N-terminus. Analyses of the purified proteins show that Aux1LS20 and Aux2LS20 form tetramers and hexamers in solution, respectively, and that they both bind preferentially to oriTLS20, although with different characteristics and specificities. In silico analyses revealed that genes encoding homologs of Aux1LS20 and/or Aux2LS20 are located upstream of almost 400 relaxase genes of the RelLS20 family (MOBL) of relaxases. Thus, Aux1LS20 and Aux2LS20 of pLS20 constitute the founding members of the first two families of auxiliary proteins described for CEs of Gram-positive origin.
Laser-induced Breakdown spectroscopy quantitative analysis method via adaptive analytical line selection and relevance vector machine regression model

International Nuclear Information System (INIS)

Yang, Jianhong; Yi, Cancan; Xu, Jinwu; Ma, Xianghong

2015-01-01

A new LIBS quantitative analysis method based on adaptive analytical line selection and a Relevance Vector Machine (RVM) regression model is proposed. First, a scheme for adaptively selecting analytical lines is put forward in order to overcome the drawback of high dependency on a priori knowledge. The candidate analytical lines are automatically selected based on the built-in characteristics of spectral lines, such as spectral intensity, wavelength and width at half height. The analytical lines which will be used as input variables of the regression model are determined adaptively according to the samples for both training and testing. Second, a LIBS quantitative analysis method based on RVM is presented. The intensities of analytical lines and the elemental concentrations of certified standard samples are used to train the RVM regression model. The predicted elemental concentrations are given in the form of a confidence interval of a probabilistic distribution, which is helpful for evaluating the uncertainty contained in the measured spectra. Chromium concentration analysis experiments on 23 certified standard high-alloy steel samples have been carried out. The multiple correlation coefficient of the prediction was up to 98.85%, and the average relative error of the prediction was 4.01%. The experimental results showed that the proposed LIBS quantitative analysis method achieved better prediction accuracy and better modeling robustness compared with the methods based on partial least squares regression, artificial neural network and standard support vector machine. - Highlights: • Both training and testing samples are considered for analytical lines selection. • The analytical lines are auto-selected based on the built-in characteristics of spectral lines. • The new method can achieve better prediction accuracy and modeling robustness. • Model predictions are given with confidence interval of probabilistic distribution.
Principal component regression analysis with SPSS.

Science.gov (United States)

Liu, R X; Kuang, J; Gong, Q; Hou, X L

2003-06-01

The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. It uses an example to describe how to carry out principal component regression analysis with SPSS 10.0, including all the calculation steps of principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance caused by multicollinearity. Carrying out the analysis with SPSS makes it simpler, faster and accurate.
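The basic principle of principal component regression can be sketched outside SPSS as well. The following is a minimal NumPy illustration, not the paper's SPSS procedure: it assumes centered (not standardized) predictors, keeps the leading components by singular value, and uses simulated collinear data; the function name `pcr_fit` is hypothetical.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: regress y on the leading principal
    component scores of X, then map the fit back to the original variables."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal directions from the SVD of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T
    T = Xc @ V                                   # component scores
    gamma = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    beta = V @ gamma                             # coefficients on original X
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Simulated data with strong multicollinearity: x3 is almost x1 + x2.
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
x3 = x1 + x2 + 0.01 * rng.standard_normal(n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 - x2 + 0.1 * rng.standard_normal(n)

intercept2, beta2 = pcr_fit(X, y, n_components=2)
print(intercept2, beta2)
```

Dropping the smallest-variance component removes the nearly degenerate direction that makes ordinary least squares coefficients unstable; keeping all components reproduces the ordinary least squares fit.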
Using Reflectance Spectroscopy and Artificial Neural Network to Assess Water Infiltration Rate into the Soil Profile

Directory of Open Access Journals (Sweden)

Naftali Goldshleger

2012-01-01

We explored the effect of raindrop energy on both water infiltration into soil and the soil's NIR-SWIR spectral reflectance (1200–2400 nm). Seven soils with different physical and morphological properties from Israel and the US were subjected to an artificial rainstorm. The spectral properties of the crust formed on the soil surface were analyzed using an artificial neural network (ANN). Results were compared to a study with the same population in which partial least-squares (PLS) regression was applied. It was concluded that both models (PLS regression and ANN) are generic as they are based on properties that correlate with the physical crust, such as clay content, water content and organic matter. Nonetheless, better results for the connection between infiltration rate and spectral properties were achieved with the non-linear ANN technique in terms of statistical values (RMSE of 17.3% for PLS regression and 10% for ANN). Furthermore, although both models were run at the selected wavelengths and their accuracy was assessed with an independent external group of samples, no pre-processing procedure was applied to the reflectance data when using ANN. As the relationship between infiltration rate and soil reflectance is not linear, ANN methods have the advantage for examining this relationship when many soils are being analyzed.
Sparse canonical methods for biological data integration: application to a cross-platform study

Directory of Open Access Journals (Sweden)

Robert-Granié Christèle

2009-01-01

Full Text Available Abstract Background: In the context of systems biology, few sparse approaches have been proposed so far to integrate several data sets. It is however an important and fundamental issue that will be widely encountered in post genomic studies, when simultaneously analyzing transcriptomics, proteomics and metabolomics data using different platforms, so as to understand the mutual interactions between the different data sets. In this high dimensional setting, variable selection is crucial to give interpretable results. We focus on a sparse Partial Least Squares approach (sPLS) to handle two-block data sets, where the relationship between the two types of variables is known to be symmetric. Sparse PLS has been developed either for a regression or a canonical correlation framework and includes a built-in procedure to select variables while integrating data. To illustrate the canonical mode approach, we analyzed the NCI60 data sets, where two different platforms (cDNA and Affymetrix chips) were used to study the transcriptome of sixty cancer cell lines. Results: We compare the results obtained with two other sparse or related canonical correlation approaches: CCA with Elastic Net penalization (CCA-EN) and Co-Inertia Analysis (CIA). The latter does not include a built-in procedure for variable selection and requires a two-step analysis. We stress the lack of statistical criteria to evaluate canonical correlation methods, which makes biological interpretation absolutely necessary to compare the different gene selections. We also propose comprehensive graphical representations of both samples and variables to facilitate the interpretation of the results. Conclusion: sPLS and CCA-EN selected highly relevant genes and complementary findings from the two data sets, which enabled a detailed understanding of the molecular characteristics of several groups of cell lines. These two approaches were found to bring similar results, although they highlighted the same
Differentiating regressed melanoma from regressed lichenoid keratosis.

Science.gov (United States)

Chan, Aegean H; Shulman, Kenneth J; Lee, Bonnie A

2017-04-01

Distinguishing regressed lichen planus-like keratosis (LPLK) from regressed melanoma can be difficult on histopathologic examination, potentially resulting in mismanagement of patients. We aimed to identify histopathologic features by which regressed melanoma can be differentiated from regressed LPLK. Twenty actively inflamed LPLK, 12 LPLK with regression and 15 melanomas with regression were compared and evaluated by hematoxylin and eosin staining as well as Melan-A, microphthalmia transcription factor (MiTF) and cytokeratin (AE1/AE3) immunostaining. (1) A total of 40% of regressed melanomas showed complete or near complete loss of melanocytes within the epidermis with Melan-A and MiTF immunostaining, while 8% of regressed LPLK exhibited this finding. (2) Necrotic keratinocytes were seen in the epidermis in 33% regressed melanomas as opposed to all of the regressed LPLK. (3) A dense infiltrate of melanophages in the papillary dermis was seen in 40% of regressed melanomas, a feature not seen in regressed LPLK. In summary, our findings suggest that a complete or near complete loss of melanocytes within the epidermis strongly favors a regressed melanoma over a regressed LPLK. In addition, necrotic epidermal keratinocytes and the presence of a dense band-like distribution of dermal melanophages can be helpful in differentiating these lesions. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Determination of main fruits in adulterated nectars by ATR-FTIR spectroscopy combined with multivariate calibration and variable selection methods.

Science.gov (United States)

Miaw, Carolina Sheng Whei; Assis, Camila; Silva, Alessandro Rangel Carolino Sales; Cunha, Maria Luísa; Sena, Marcelo Martins; de Souza, Scheilla Vitorino Carvalho

2018-07-15

Grape, orange, peach and passion fruit nectars were formulated and adulterated by dilution with syrup, apple and cashew juices at 10 levels for each adulterant. Attenuated total reflectance Fourier transform mid infrared (ATR-FTIR) spectra were obtained. Partial least squares (PLS) multivariate calibration models allied to different variable selection methods, such as interval partial least squares (iPLS), ordered predictors selection (OPS) and genetic algorithm (GA), were used to quantify the main fruits. PLS improved by iPLS-OPS variable selection showed the highest predictive capacity to quantify the main fruit contents. The selected variables in the final models varied from 72 to 100; the root mean square errors of prediction were estimated from 0.5 to 2.6%; the correlation coefficients of prediction ranged from 0.948 to 0.990; and, the mean relative errors of prediction varied from 3.0 to 6.7%. All of the developed models were validated. Copyright © 2018 Elsevier Ltd. All rights reserved.
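The figures of merit quoted in the abstract above (root mean square error of prediction, correlation coefficient of prediction, mean relative error of prediction) can be computed as in the short sketch below. The function names and the four reference/predicted values are hypothetical and only illustrate the usual definitions; the paper may define its statistics slightly differently.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def mean_relative_error(y_true, y_pred):
    """Mean absolute relative error, in percent of the reference value."""
    n = len(y_true)
    return 100.0 * sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / n

# Hypothetical reference and predicted fruit contents (% m/m).
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]

error_rmsep = rmsep(reference, predicted)
error_r = pearson_r(reference, predicted)
error_mre = mean_relative_error(reference, predicted)
```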
Determination of Commercials Cooking Oils and Fats Using Chemometrics Methods

International Nuclear Information System (INIS)

Azwan Mat Lazim; Mohd Zuli Jaafar; Phang Wei Shong, P.W.; Suzereen Jamil

2013-01-01

In this study, chemometric methods were used to determine oil quality. The samples were olive oil, sunflower oil and butter from two different brands, each in two conditions: fresh or fried. Titration, a conventional method, was used to determine free fatty acid content (FFA), iodine value (IV) and peroxide value (PV). Twelve samples were then analyzed and their FTIR spectra were measured at 4000-400 cm⁻¹. Computer simulation was used to process the data based on pattern recognition, which was optimized by principal component analysis (PCA) and partial least squares (PLS). The PCA model was used to distinguish the properties of fresh and fried oil. The PLS model was used to predict values for the validation test in comparison with the conventional results. The validation value for fresh oil was 0.90, which indicated that the chemometric method was in agreement with the conventional method. (author)
Performance and separation occurrence of binary probit regression estimator using maximum likelihood method and Firths approach under different sample size

Science.gov (United States)

Lusiana, Evellin Dewi

2017-12-01

The parameters of the binary probit regression model are commonly estimated by the Maximum Likelihood Estimation (MLE) method. However, the MLE method is limited when the binary data contain separation. Separation is the condition in which one or several independent variables exactly group the categories of the binary response. It causes the MLE estimators to become non-convergent, so that they cannot be used in modeling. One way to resolve separation is to use Firth's approach instead. This research has two aims: first, to identify the chance of separation occurring in the binary probit regression model under the MLE method and Firth's approach; second, to compare the performance of the binary probit regression estimators obtained by the MLE method and Firth's approach using the RMSE criterion. Both are carried out by simulation under different sample sizes. The results showed that the chance of separation occurring with the MLE method for small sample sizes is higher than with Firth's approach. For larger sample sizes, the probability decreased and was nearly identical between the MLE method and Firth's approach. Meanwhile, Firth's estimators have smaller RMSE than the MLEs, especially for smaller sample sizes, whereas for larger sample sizes the RMSEs are not much different. This means that Firth's estimators outperformed the MLE estimators.
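To make the notion of separation concrete, here is a small sketch that checks whether a single predictor completely separates a 0/1 response. It is a simplification chosen for illustration: the function name is hypothetical, it looks at one variable at a time, and it does not detect quasi-complete separation (ties at the boundary) or separation by a linear combination of several predictors, which the study's simulations may well include.

```python
def completely_separated(x, y):
    """Return True if one threshold on x splits the 0/1 responses perfectly."""
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        # Only one response category was observed; there is no split to test.
        return False
    return max(x0) < min(x1) or max(x1) < min(x0)

# All zeros lie below all ones: the likelihood keeps increasing as the
# slope grows, so maximum likelihood estimates do not converge.
separated = completely_separated([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])

# Responses overlap along x: finite maximum likelihood estimates exist.
overlapping = completely_separated([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

A check like this could be run on each simulated data set to count how often separation occurs at a given sample size.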
Regression calibration with more surrogates than mismeasured variables

KAUST Repository

Kipnis, Victor

2012-06-29

In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
Regression calibration with more surrogates than mismeasured variables

KAUST Repository

Kipnis, Victor; Midthune, Douglas; Freedman, Laurence S.; Carroll, Raymond J.

2012-01-01

In a recent paper (Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference 2007; 137: 449-461), the authors discussed fitting logistic regression models when a scalar main explanatory variable is measured with error by several surrogates, that is, a situation with more surrogates than variables measured with error. They compared two methods of adjusting for measurement error using a regression calibration approximate model as if it were exact. One is the standard regression calibration approach consisting of substituting an estimated conditional expectation of the true covariate given observed data in the logistic regression. The other is a novel two-stage approach when the logistic regression is fitted to multiple surrogates, and then a linear combination of estimated slopes is formed as the estimate of interest. Applying estimated asymptotic variances for both methods in a single data set with some sensitivity analysis, the authors asserted superiority of their two-stage approach. We investigate this claim in some detail. A troubling aspect of the proposed two-stage method is that, unlike standard regression calibration and a natural form of maximum likelihood, the resulting estimates are not invariant to reparameterization of nuisance parameters in the model. We show, however, that, under the regression calibration approximation, the two-stage method is asymptotically equivalent to a maximum likelihood formulation, and is therefore in theory superior to standard regression calibration. However, our extensive finite-sample simulations in the practically important parameter space where the regression calibration model provides a good approximation failed to uncover such superiority of the two-stage method. We also discuss extensions to different data structures.
Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method

Science.gov (United States)

Asavaskulkiet, Krissada

2018-04-01

In this paper, we propose a new face hallucination technique that reconstructs face images in HSV color space with a semi-orthogonal multilinear principal component analysis (SO-MPCA) method. This novel hallucination technique can operate directly on tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from the FERET database to test our hallucination approach, which yields high-quality hallucinated color faces. The experimental results clearly demonstrate that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.
Regression with Sparse Approximations of Data

DEFF Research Database (Denmark)

Noorzad, Pardis; Sturm, Bob L.

2012-01-01

We propose sparse approximation weighted regression (SPARROW), a method for local estimation of the regression function that uses sparse approximation with a dictionary of measurements. SPARROW estimates the regression function at a point with a linear combination of a few regressands selected...... by a sparse approximation of the point in terms of the regressors. We show SPARROW can be considered a variant of k-nearest neighbors regression (k-NNR), and more generally, local polynomial kernel regression. Unlike k-NNR, however, SPARROW can adapt the number of regressors to use based...
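For orientation, the baseline that the abstract compares against, k-nearest neighbors regression, can be sketched in a few lines. This is plain k-NNR with an unweighted average and a fixed k, not SPARROW itself; the function name and the toy data are assumptions for the example.

```python
import math

def knn_regress(X_train, y_train, x_query, k):
    """Predict at x_query as the mean response of the k nearest training points."""
    by_distance = sorted(
        (math.dist(x, x_query), y) for x, y in zip(X_train, y_train)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k

# Toy one-dimensional data following y = x**2.
X_train = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_train = [0.0, 1.0, 4.0, 9.0]

# The two nearest regressors to 1.2 are 1.0 and 2.0, so the estimate is
# the average of their regressands, 1.0 and 4.0.
estimate = knn_regress(X_train, y_train, (1.2,), k=2)
```

Here k is fixed in advance, whereas the abstract states that SPARROW can adapt the number of regressors it uses.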

Two and three way spectrophotometric-assisted multivariate determination of linezolid in the presence of its alkaline and oxidative degradation products and application to pharmaceutical formulation

Science.gov (United States)

Hegazy, Maha Abd El-Monem; Eissa, Maya Shaaban; Abd El-Sattar, Osama Ibrahim; Abd El-Kawy, Mohammad

2014-07-01

Linezolid (LIN) is determined in the presence of its alkaline (ALK) and oxidative (OXD) degradation products, without preliminary separation, by ultraviolet spectrophotometry using two-way chemometric methods, principal component regression (PCR) and partial least-squares (PLS), and three-way chemometric methods, parallel factor analysis (PARAFAC) and multi-way partial least squares (N-PLS). A training set of mixtures containing LIN, ALK and OXD was prepared in the concentration ranges of 12-18, 2.4-3.6 and 1.2-1.8 μg mL⁻¹, respectively, according to a multilevel multifactor experimental design. The multivariate calibrations were obtained by measuring the zero-order absorbance from 220 to 320 nm using the training set. The multivariate methods were validated by analyzing synthetic mixtures. The capability of the chemometric methods to analyze real samples was evaluated by determining LIN in its pharmaceutical preparation, with satisfactory results. The accuracy of the methods, evaluated through the root mean square error of prediction (RMSEP), was 0.058, 0.026, 0.101 and 0.026 for LIN using PCR, PLS, PARAFAC and N-PLS, respectively. Protolytic equilibria of LIN and its degradation products were evaluated using the corresponding absorption spectra-pH data obtained with PARAFAC. The obtained pKa values of LIN, ALK and OXD are 5.70, 8.90 and 6.15, respectively. The results obtained were statistically compared with those of a reported HPLC method, and there was no significant difference between the proposed methods and the reported method regarding both accuracy and precision.
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

Science.gov (United States)

Kim, Yoonsang; Emery, Sherry

2013-01-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Psychometric properties of the Preschool Language Scale (PLS-3) in Colombians

Directory of Open Access Journals (Sweden)

Rita Flórez Romero

2013-01-01

Full Text Available Objective. This study sought to characterize the psychometric properties of the Preschool Language Scale – 3 (PLS-3) in a sample of 477 Colombian children aged four to seven years from the city of Bogotá D. C. Method. To this end, analyses of discrimination coefficients, difficulty coefficients and the tetrachoric correlation matrix were carried out. Results. Appropriate levels of reliability and high sensitivity to the development of the children's language comprehension and production were found, as well as a low difficulty index for some items. Conclusion. These results are discussed in light of discrimination indices in tests of typical and atypical language development.
A method for nonlinear exponential regression analysis

Science.gov (United States)

Junkin, B. G.

1971-01-01

A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
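The procedure described in the abstract above can be sketched as follows for a single-term decay model y = a·exp(b·t). The model form, the function name, the stopping rule and the test data are assumptions for illustration; the report itself may treat a more general exponential model.

```python
import numpy as np

def fit_exponential(t, y, tol=1e-10, max_iter=50):
    """Fit y = a * exp(b * t) by Gauss-Newton iteration (requires y > 0)."""
    # Initial nominal estimates from a straight-line fit to ln(y) versus t.
    slope, intercept = np.polyfit(t, np.log(y), 1)
    a, b = np.exp(intercept), slope
    for _ in range(max_iter):
        e = np.exp(b * t)
        residual = y - a * e
        # First-order Taylor expansion of the model around (a, b):
        # the columns are the partial derivatives with respect to a and b.
        J = np.column_stack([e, a * t * e])
        delta, *_ = np.linalg.lstsq(J, residual, rcond=None)
        a, b = a + delta[0], b + delta[1]
        if np.linalg.norm(delta) < tol:   # predetermined stopping criterion
            break
    return a, b

# Synthetic decay data (hypothetical): a = 3, b = -0.7 with a 1% ripple.
t = np.linspace(0.0, 5.0, 20)
y = 3.0 * np.exp(-0.7 * t) * (1.0 + 0.01 * np.cos(5.0 * t))

a_hat, b_hat = fit_exponential(t, y)
```

The log-linear start matters in practice: Gauss-Newton relies on the linearization being accurate, so a poor initial estimate can make the corrections diverge.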
Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha

2014-12-08

Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
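As background for the abstract above, the classical reduced-rank regression estimator can be sketched as below: fit ordinary least squares, then project the coefficient matrix onto the leading right singular vectors of the fitted values. This sketch assumes an identity error covariance and applies no sparsity penalty, so it is the textbook starting point rather than the penalized method the authors propose; the function name and simulated data are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression with identity error covariance."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)       # (p, q) OLS fit
    # Leading right singular vectors of the fitted values span the
    # response directions that are kept.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T                                    # (q, rank)
    return B_ols @ V_r @ V_r.T                           # rank-constrained

# Simulated multi-response data with a true coefficient matrix of rank 2.
rng = np.random.default_rng(1)
n, p, q, r = 100, 6, 4, 2
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
Y = X @ B_true + 0.1 * rng.normal(size=(n, q))

B_rr = reduced_rank_regression(X, Y, rank=r)
```

The rank constraint cuts the number of free parameters from p·q to roughly r·(p + q − r), which is the parsimony the abstract refers to.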
Sparse reduced-rank regression with covariance estimation

KAUST Repository

Chen, Lisha; Huang, Jianhua Z.

2014-01-01

Improving the predicting performance of the multiple response regression compared with separate linear regressions is a challenging question. On the one hand, it is desirable to seek model parsimony when facing a large number of parameters. On the other hand, for certain applications it is necessary to take into account the general covariance structure for the errors of the regression model. We assume a reduced-rank regression model and work with the likelihood function with general error covariance to achieve both objectives. In addition we propose to select relevant variables for reduced-rank regression by using a sparsity-inducing penalty, and to estimate the error covariance matrix simultaneously by using a similar penalty on the precision matrix. We develop a numerical algorithm to solve the penalized regression problem. In a simulation study and real data analysis, the new method is compared with two recent methods for multivariate regression and exhibits competitive performance in prediction and variable selection.
Statistical learning method in regression analysis of simulated positron spectral data

International Nuclear Information System (INIS)

Avdic, S. Dz.

2005-01-01

Positron lifetime spectroscopy is a non-destructive tool for detection of radiation induced defects in nuclear reactor materials. This work concerns the applicability of the support vector machines (SVM) method for the input data compression in the neural network analysis of positron lifetime spectra. It has been demonstrated that the SVM technique can be successfully applied to regression analysis of positron spectra. A substantial data compression of about 50 % and 8 % of the whole training set with two and three spectral components respectively has been achieved, together with a high accuracy of the spectra approximation. However, some parameters in the SVM approach, such as the insensitivity zone ε and the penalty parameter C, have to be chosen carefully to obtain a good performance. (author)
The crux of the method: assumptions in ordinary least squares and logistic regression.

Science.gov (United States)

Long, Rebecca G

2008-10-01

Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Regression and direct methods do not give different estimates of digestible and metabolizable energy values of barley, sorghum, and wheat for pigs.

Science.gov (United States)

Bolarinwa, O A; Adeola, O

2016-02-01

Direct or indirect methods can be used to determine the DE and ME of feed ingredients for pigs. In situations when only the indirect approach is suitable, the regression method presents a robust indirect approach. Three experiments were conducted to compare the direct and regression methods for determining the DE and ME values of barley, sorghum, and wheat for pigs. In each experiment, 24 barrows with an average initial BW of 31, 32, and 33 kg were assigned to 4 diets in a randomized complete block design. The 4 diets consisted of 969 g barley, sorghum, or wheat/kg plus minerals and vitamins for the direct method; a corn-soybean meal reference diet (RD); the RD + 300 g barley, sorghum, or wheat/kg; and the RD + 600 g barley, sorghum, or wheat/kg. The 3 corn-soybean meal diets were used for the regression method. Each diet was fed to 6 barrows in individual metabolism crates for a 5-d acclimation followed by a 5-d period of total but separate collection of feces and urine in each experiment. Graded substitution of barley or wheat, but not sorghum, into the RD linearly reduced ( direct method-derived DE and ME for barley were 3,669 and 3,593 kcal/kg DM, respectively. The regressions of barley contribution to DE and ME in kilocalories against the quantity of barley DMI in kilograms generated 3,746 kcal DE/kg DM and 3,647 kcal ME/kg DM. The DE and ME for sorghum by the direct method were 4,097 and 4,042 kcal/kg DM, respectively; the corresponding regression-derived estimates were 4,145 and 4,066 kcal/kg DM. Using the direct method, energy values for wheat were 3,953 kcal DE/kg DM and 3,889 kcal ME/kg DM. The regressions of wheat contribution to DE and ME in kilocalories against the quantity of wheat DMI in kilograms generated 3,960 kcal DE/kg DM and 3,874 kcal ME/kg DM. The DE and ME of barley using the direct method were not different (0.3 direct method-derived DE and ME of sorghum were not different (0.5 direct method- and regression method-derived DE (3,953 and 3
Learning a Nonnegative Sparse Graph for Linear Regression.

Science.gov (United States)

Fang, Xiaozhao; Xu, Yong; Li, Xuelong; Lai, Zhihui; Wong, Wai Keung

2015-09-01

Previous graph-based semisupervised learning (G-SSL) methods have the following drawbacks: 1) they usually predefine the graph structure and then use it to perform label prediction, which cannot guarantee an overall optimum and 2) they only focus on the label prediction or the graph structure construction but are not competent in handling new samples. To this end, a novel nonnegative sparse graph (NNSG) learning method was first proposed. Then, both the label prediction and projection learning were integrated into linear regression. Finally, the linear regression and graph structure learning were unified within the same framework to overcome these two drawbacks. Therefore, a novel method, named learning a NNSG for linear regression was presented, in which the linear regression and graph learning were simultaneously performed to guarantee an overall optimum. In the learning process, the label information can be accurately propagated via the graph structure so that the linear regression can learn a discriminative projection to better fit sample labels and accurately classify new samples. An effective algorithm was designed to solve the corresponding optimization problem with fast convergence. Furthermore, NNSG provides a unified perceptiveness for a number of graph-based learning methods and linear regression methods. The experimental results showed that NNSG can obtain very high classification accuracy and greatly outperforms conventional G-SSL methods, especially some conventional graph construction methods.
[Prediction of total nitrogen and alkali hydrolysable nitrogen content in loess using hyperspectral data based on correlation analysis and partial least squares regression].

Science.gov (United States)

Liu, Xiu-ying; Wang, Li; Chang, Qing-rui; Wang, Xiao-xing; Shang, Yan

2015-07-01

Wuqi County of Shaanxi Province, where the vegetation recovering measures have been carried out for years, was taken as the study area. A total of 100 loess samples from 24 different profiles were collected. Total nitrogen (TN) and alkali hydrolysable nitrogen (AHN) contents of the soil samples were analyzed, and the soil samples were scanned in the visible/near-infrared (VNIR) region of 350-2500 nm in the laboratory. The calibration models were developed between TN and AHN contents and VNIR values based on correlation analysis (CA) and partial least squares regression (PLS). Independent samples validated the calibration models. The results indicated that the optimum model for predicting TN of loess was established by using first derivative of reflectance. The best model for predicting AHN of loess was established by using normal derivative spectra. The optimum TN model could effectively predict TN in loess from 0 to 40 cm, but the optimum AHN model could only roughly predict AHN at the same depth. This study provided a good method for rapidly predicting TN of loess where vegetation recovering measures have been adopted, but prediction of AHN needs to be further studied.
Methods of Detecting Outliers in A Regression Analysis Model ...

African Journals Online (AJOL)

PROF. O. E. OSUAGWU

2013-06-01

Jun 1, 2013 ... especially true in observational studies .... Simple linear regression and multiple ... The simple linear ..... Grubbs,F.E (1950): Sample Criteria for Testing Outlying observations: Annals of ... In experimental design, the Relative.
Assessing the reliability of the borderline regression method as a standard setting procedure for objective structured clinical examination

Directory of Open Access Journals (Sweden)

Sara Mortaz Hejri

2013-01-01

Full Text Available Background: One of the methods used for standard setting is the borderline regression method (BRM). This study aims to assess the reliability of BRM when the pass-fail standard in an objective structured clinical examination (OSCE) was calculated by averaging the BRM standards obtained for each station separately. Materials and Methods: In nine stations of the OSCE with direct observation the examiners gave each student a checklist score and a global score. Using a linear regression model for each station, we calculated the checklist score cut-off on the regression equation for the global scale cut-off set at 2. The OSCE pass-fail standard was defined as the average of all stations' standards. To determine the reliability, the root mean square error (RMSE) was calculated. The R2 coefficient and the inter-grade discrimination were calculated to assess the quality of OSCE. Results: The mean total test score was 60.78. The OSCE pass-fail standard and its RMSE were 47.37 and 0.55, respectively. The R2 coefficients ranged from 0.44 to 0.79. The inter-grade discrimination score varied greatly among stations. Conclusion: The RMSE of the standard was very small, indicating that BRM is a reliable method of setting standards for OSCE, which has the advantage of providing data for quality assurance.
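The station-level calculation described in the abstract above can be sketched as follows: regress checklist scores on global scores, read off the predicted checklist score at the borderline global rating of 2, and average the station cut-offs. The function name and the two small stations of scores are made up for illustration and are not the study's data.

```python
def borderline_regression_cutoff(global_scores, checklist_scores, borderline=2.0):
    """Checklist cut-off predicted by a simple linear regression at the borderline grade."""
    n = len(global_scores)
    mean_g = sum(global_scores) / n
    mean_c = sum(checklist_scores) / n
    sxx = sum((g - mean_g) ** 2 for g in global_scores)
    sxy = sum(
        (g - mean_g) * (c - mean_c)
        for g, c in zip(global_scores, checklist_scores)
    )
    slope = sxy / sxx
    intercept = mean_c - slope * mean_g
    return intercept + slope * borderline

# Hypothetical stations: global ratings on a 1-5 scale and checklist scores.
stations = [
    ([1, 2, 3, 4, 5], [30.0, 40.0, 50.0, 60.0, 70.0]),
    ([1, 2, 3, 4, 5], [20.0, 35.0, 50.0, 65.0, 80.0]),
]

station_cutoffs = [borderline_regression_cutoff(g, c) for g, c in stations]

# The examination standard is the average of the station standards.
osce_standard = sum(station_cutoffs) / len(station_cutoffs)
```

With these made-up scores the station cut-offs are 40 and 35, giving an overall standard of 37.5.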
Postharvest monitoring of organic potato (cv. Anuschka) during hot-air drying using visible-NIR hyperspectral imaging.

Science.gov (United States)

Moscetti, Roberto; Sturm, Barbara; Crichton, Stuart Oj; Amjad, Waseem; Massantini, Riccardo

2018-05-01

The potential of hyperspectral imaging (500-1010 nm) was evaluated for monitoring of the quality of potato slices (var. Anuschka) of 5, 7 and 9 mm thickness subjected to air drying at 50 °C. The study investigated three different feature selection methods for the prediction of dry basis moisture content and colour of potato slices using partial least squares regression (PLS). The feature selection strategies tested include interval PLS regression (iPLS), and differences and ratios between raw reflectance values for each possible pair of wavelengths (R[λ 1 ]-R[λ 2 ] and R[λ 1 ]:R[λ 2 ], respectively). Moreover, the combination of spectral and spatial domains was tested. Excellent results were obtained using the iPLS algorithm. However, features from both datasets of raw reflectance differences and ratios represent suitable alternatives for development of low-complexity prediction models. Finally, the dry basis moisture content was predicted with high accuracy by combining spectral data (i.e. R[511 nm]-R[994 nm]) and spatial domain (i.e. relative area shrinkage of slice). Modelling the data acquired during drying through hyperspectral imaging can provide useful information concerning the chemical and physicochemical changes of the product. With all this information, the proposed approach lays the foundations for a more efficient smart dryer that can be designed and its process optimized for drying of potato slices. © 2017 Society of Chemical Industry.
Linear regression in astronomy. II

Science.gov (United States)

Feigelson, Eric D.; Babu, Gutti J.

1992-01-01

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
A validated near-infrared spectroscopic method for methanol detection in biodiesel

Science.gov (United States)

Paul, Andrea; Bräuer, Bastian; Nieuwenkamp, Gerard; Ent, Hugo; Bremser, Wolfram

2016-06-01

Biodiesel quality control is a relevant issue as biodiesel properties influence diesel engine performance and integrity. Within the European metrology research program (EMRP) ENG09 project ‘Metrology for Biofuels’, an on-line/at-site suitable near-infrared spectroscopy (NIRS) method has been developed in parallel with an improved EN14110 headspace gas chromatography (GC) analysis method for methanol in biodiesel. Both methods have been optimized for a methanol content of 0.2 mass% as this represents the maximum limit of methanol content in FAME according to EN 14214:2009. The NIRS method is based on a mobile NIR spectrometer equipped with a fiber-optic coupled probe. Due to the high volatility of methanol, a tailored air-tight adaptor was constructed to prevent methanol evaporation during measurement. The methanol content of biodiesel was determined from evaluation of NIRS spectra by partial least squares regression (PLS). Both GC analysis and NIRS exhibited a significant dependence on biodiesel feedstock. The NIRS method is applicable to a content range of 0.1% (m/m) to 0.4% (m/m) of methanol with uncertainties at around 6% relative for the different feedstocks. A direct comparison of headspace GC and NIRS for samples of FAMEs yielded that the results of both methods are fully compatible within their stated uncertainties.
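As a rough illustration of how a PLS calibration maps spectra to a concentration, the sketch below implements a single-component PLS1 model (one NIPALS step) in plain Python. The abstract does not state the number of latent variables, the preprocessing, or the software used, so the one-component choice, the function names, and the tiny absorbance table are all assumptions made for illustration.

```python
from math import sqrt

def pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, coef)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    yc = [yi - y_mean for yi in y]                              # centred response
    # Weight vector: direction in X space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]
    # Scores, then regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coef = [wj * q for wj in w]  # regression vector in the original X space
    return x_mean, y_mean, coef

def pls1_predict(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, coef = model
    return y_mean + sum((xj - mj) * cj for xj, mj, cj in zip(x, x_mean, coef))

# Hypothetical, perfectly collinear "absorbances" at two wavelengths
# and methanol contents in % (m/m).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [0.1, 0.2, 0.3, 0.4]
model = pls1_one_component(X, y)
print(pls1_predict(model, [2.5, 5.0]))
```

The collinear columns are deliberate: ordinary least squares has no unique solution for such data, whereas the latent-variable projection still yields a usable regression vector.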
A validated near-infrared spectroscopic method for methanol detection in biodiesel

International Nuclear Information System (INIS)

Paul, Andrea; Bräuer, Bastian; Bremser, Wolfram; Nieuwenkamp, Gerard; Ent, Hugo

2016-01-01

Biodiesel quality control is a relevant issue as biodiesel properties influence diesel engine performance and integrity. Within the European metrology research program (EMRP) ENG09 project ‘Metrology for Biofuels’, an on-line/at-site suitable near-infrared spectroscopy (NIRS) method has been developed in parallel with an improved EN14110 headspace gas chromatography (GC) analysis method for methanol in biodiesel. Both methods have been optimized for a methanol content of 0.2 mass% as this represents the maximum limit of methanol content in FAME according to EN 14214:2009. The NIRS method is based on a mobile NIR spectrometer equipped with a fiber-optic coupled probe. Due to the high volatility of methanol, a tailored air-tight adaptor was constructed to prevent methanol evaporation during measurement. The methanol content of biodiesel was determined from evaluation of NIRS spectra by partial least squares regression (PLS). Both GC analysis and NIRS exhibited a significant dependence on biodiesel feedstock. The NIRS method is applicable to a content range of 0.1% (m/m) to 0.4% (m/m) of methanol with uncertainties at around 6% relative for the different feedstocks. A direct comparison of headspace GC and NIRS for samples of FAMEs yielded that the results of both methods are fully compatible within their stated uncertainties. (paper)
Regression Analysis by Example. 5th Edition

Science.gov (United States)

Chatterjee, Samprit; Hadi, Ali S.

2012-01-01

Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…
Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

Science.gov (United States)

Prahutama, Alan; Sudarno

2018-05-01

The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. A high infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One regression model that can be used to analyze the relationship between a dependent variable Y in the form of discrete data and an independent variable X is the Poisson regression model. Regression models used for data with a discrete dependent variable include, among others, Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable that has the most influence on the infant mortality rate is the average breastfeeding (X9).
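The AIC comparison mentioned in this abstract can be sketched for the simplest case without covariates. The example assumes Consul's generalized Poisson distribution, P(Y=y) = θ(θ+λy)^(y-1) e^(-θ-λy) / y!, which reduces to the Poisson distribution when λ = 0, and it fits λ by a coarse grid search with the model mean θ/(1-λ) tied to the sample mean. The parameterization, the grid search, and the counts are illustrative assumptions; the study itself fits regression models with predictors X1 to X9.

```python
from math import lgamma, log

def loglik_poisson(ys, mu):
    """Poisson log-likelihood with mean mu."""
    return sum(y * log(mu) - mu - lgamma(y + 1) for y in ys)

def loglik_gen_poisson(ys, theta, lam):
    """Log-likelihood of Consul's generalized Poisson (theta > 0, 0 <= lam < 1)."""
    return sum(
        log(theta) + (y - 1) * log(theta + lam * y) - theta - lam * y - lgamma(y + 1)
        for y in ys
    )

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * loglik

def fit_gen_poisson_grid(ys, steps=200):
    """Coarse grid search over lam, with theta chosen so the model mean
    theta / (1 - lam) equals the sample mean (an assumption of this sketch)."""
    ybar = sum(ys) / len(ys)
    best = None
    for k in range(steps):
        lam = k / steps
        theta = ybar * (1 - lam)
        ll = loglik_gen_poisson(ys, theta, lam)
        if best is None or ll > best[0]:
            best = (ll, theta, lam)
    return best

counts = [0, 1, 1, 2, 2, 3, 5, 9, 12, 0]  # hypothetical overdispersed counts
mu_hat = sum(counts) / len(counts)         # Poisson maximum-likelihood estimate
aic_poisson = aic(loglik_poisson(counts, mu_hat), 1)
ll_gp, theta_hat, lam_hat = fit_gen_poisson_grid(counts)
aic_gen_poisson = aic(ll_gp, 2)
print(aic_poisson, aic_gen_poisson)        # the model with the smaller AIC is preferred
```

Because the grid includes λ = 0, the generalized model's log-likelihood can never fall below the Poisson one; the AIC penalty of one extra parameter decides whether the added dispersion parameter is worthwhile.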
Determination of alcohol and extract concentration in beer samples using a combined method of near-infrared (NIR) spectroscopy and refractometry.

Science.gov (United States)

Castritius, Stefan; Kron, Alexander; Schäfer, Thomas; Rädle, Matthias; Harms, Diedrich

2010-12-22

A new approach of combination of near-infrared (NIR) spectroscopy and refractometry was developed in this work to determine the concentration of alcohol and real extract in various beer samples. A partial least-squares (PLS) regression, as multivariate calibration method, was used to evaluate the correlation between the data of spectroscopy/refractometry and alcohol/extract concentration. This multivariate combination of spectroscopy and refractometry enhanced the precision in the determination of alcohol, compared to single spectroscopy measurements, due to the effect of high extract concentration on the spectral data, especially of nonalcoholic beer samples. For NIR calibration, two mathematical pretreatments (first-order derivation and linear baseline correction) were applied to eliminate light scattering effects. A sample grouping of the refractometry data was also applied to increase the accuracy of the determined concentration. The root mean squared errors of validation (RMSEV) of the validation process concerning alcohol and extract concentration were 0.23 Mas% (method A), 0.12 Mas% (method B), and 0.19 Mas% (method C) and 0.11 Mas% (method A), 0.11 Mas% (method B), and 0.11 Mas% (method C), respectively.

Logistic regression applied to natural hazards: rare event logistic regression with replications

Science.gov (United States)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Quantitative measurements of fly ash, slag, and cement in limestone-based blends by Fourier transform infrared-attenuated total reflectance method

International Nuclear Information System (INIS)

Rebagay, T.V.; Dodd, D.A.; Claghorn, R.D.; Voogd, J.A.

1991-02-01

The disposal of the low-level radioactive liquids involves mixing the liquid waste with pozzolanic blend to form grout. Since the long-term performance of the grout depends on the composition of the blend, a rapid and reliable quantitative method to monitor blend compositions is needed. Earlier studies by Westinghouse Hanford Company demonstrated the utility of a Fourier transform infrared-attenuated total reflectance method for the analysis of cement blends. A sequential spectral subtraction technique was used to analyze the blend; however, its reproducibility depends on the operator's skill in performing spectral subtractions. A partial-least-squares (PLS) algorithm has replaced spectral subtraction. The PLS method is a statistical quantitative method suitable for analysis of multicomponent systems. Calibration blends are prepared by mixing the blend components in various proportions following a carefully designed calibration model. For the model, limestone content ranges from 30-50 wt%; blast furnace slag from 18-38 wt%; fly ash from 18-38 wt%; and cement from 0-16 wt%. Use of the large concentration range will enhance the chance that the calibration will be useful when target concentrations change. The ability of the PLS method to predict limestone, slag, fly ash, and cement values in test blends was assessed. The prediction step of the PLS algorithm required only a few seconds to analyze the test spectra. The best and worst results for each component of the blends calculated by this method are shown in tables. The standard error of prediction of the true value is <2 wt% for limestone, <4 wt% for both fly ash and blast furnace slag, and <10 wt% for cement. 2 refs., 8 figs., 7 tabs
Testing discontinuities in nonparametric regression

KAUST Repository

Dai, Wenlin

2017-01-19

In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Testing discontinuities in nonparametric regression

KAUST Repository

Dai, Wenlin; Zhou, Yuejin; Tong, Tiejun

2017-01-01

In nonparametric regression, it is often needed to detect whether there are jump discontinuities in the mean function. In this paper, we revisit the difference-based method in [13 H.-G. Müller and U. Stadtmüller, Discontinuous versus smooth regression, Ann. Stat. 27 (1999), pp. 299–337. doi: 10.1214/aos/1018031100
Estimating HIES Data through Ratio and Regression Methods for Different Sampling Designs

Directory of Open Access Journals (Sweden)

Faqir Muhammad

2007-01-01

In this study, different sampling designs are compared using the HIES data of North West Frontier Province (NWFP) for 2001-02 and 1998-99, collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators has also been considered using bootstrap and jackknife. A two-stage stratified random sample design is adopted by HIES. In the first stage, enumeration blocks and villages are treated as the Primary Sampling Units (PSU). The sample PSUs are selected with probability proportional to size. Secondary Sampling Units (SSU), i.e., households, are selected by systematic sampling with a random start. They have used a single study variable. We have compared the HIES technique with some other designs, which are: stratified simple random sampling, stratified systematic sampling, stratified ranked set sampling, and stratified two-phase sampling. Ratio and regression methods were applied with two study variables, which are income (y) and household size (x). Jackknife and bootstrap are used for variance replication. Simple random sampling with sample sizes of 462 to 561 gave moderate variances by both jackknife and bootstrap. By applying systematic sampling, we obtained moderate variance with a sample size of 467. In jackknife with systematic sampling, we obtained a variance of the regression estimator greater than that of the ratio estimator for sample sizes of 467 to 631. At a sample size of 952, the variance of the ratio estimator becomes greater than that of the regression estimator. The most efficient design turns out to be ranked set sampling compared with the other designs. Ranked set sampling with jackknife and bootstrap gives minimum variance even with the smallest sample size (467). Two-phase sampling gave poor performance. Multi-stage sampling applied by HIES gave large variances, especially if used with a single study variable.
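For readers unfamiliar with the two estimators compared in this abstract, the sketch below shows the textbook ratio and regression estimators of a population mean together with a delete-one jackknife variance. It assumes simple random sampling and a known population mean of the auxiliary variable; the stratified and ranked-set designs of the study are not reproduced, and the function names and data are hypothetical.

```python
from statistics import mean

def ratio_estimate(x, y, x_pop_mean):
    """Ratio estimator of the population mean of y: (ybar / xbar) * Xbar."""
    return mean(y) / mean(x) * x_pop_mean

def regression_estimate(x, y, x_pop_mean):
    """Regression estimator: ybar + b * (Xbar - xbar), b = least-squares slope."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my + b * (x_pop_mean - mx)

def jackknife_variance(estimator, x, y, x_pop_mean):
    """Delete-one jackknife variance: (n-1)/n * sum((theta_i - theta_bar)^2)."""
    n = len(x)
    reps = [
        estimator(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x_pop_mean)
        for i in range(n)
    ]
    rep_mean = mean(reps)
    return (n - 1) / n * sum((r - rep_mean) ** 2 for r in reps)

# Hypothetical sample: household size (x) and income (y), with y = 10 * x.
x = [2.0, 4.0, 6.0, 8.0]
y = [20.0, 40.0, 60.0, 80.0]
X_BAR = 6.0  # assumed known population mean household size
print(ratio_estimate(x, y, X_BAR), regression_estimate(x, y, X_BAR))
print(jackknife_variance(ratio_estimate, x, y, X_BAR))
```

In this artificial sample income is exactly proportional to household size, so both estimators agree and the jackknife variance is essentially zero; with real survey data the two estimators and their variances differ, which is what the study compares.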
Logistic Regression and Path Analysis Method to Analyze Factors influencing Students’ Achievement

Science.gov (United States)

Noeryanti, N.; Suryowati, K.; Setyawan, Y.; Aulia, R. R.

2018-04-01

Students' academic achievement cannot be separated from the influence of two kinds of factors, namely internal and external factors. The internal factors consist of intelligence (X1), health (X2), interest (X3), and motivation of students (X4). The external factors consist of family environment (X5), school environment (X6), and society environment (X7). The subjects of this research are eighth grade students of the school year 2016/2017 at SMPN 1 Jiwan Madiun, sampled by simple random sampling. Primary data are obtained by distributing questionnaires. The method used in this study is binary logistic regression analysis, which aims to identify the internal and external factors that affect students' achievement and their trends. Path analysis was used to determine the factors that influence students' achievement directly, indirectly or in total. Based on the results of binary logistic regression, the variables that affect students' achievement are interest and motivation. Based on the results obtained by path analysis, the factors that have a direct impact on students' achievement are students' interest (59%) and students' motivation (27%), while the factors that have indirect influences on students' achievement are family environment (97%) and school environment (37).
QSAR Study of Insecticides of Phthalamide Derivatives Using Multiple Linear Regression and Artificial Neural Network Methods

Directory of Open Access Journals (Sweden)

Adi Syahputra

2014-03-01

Quantitative structure activity relationship (QSAR) for 21 insecticides of phthalamides containing hydrazone (PCH) was studied using multiple linear regression (MLR), principal component regression (PCR) and artificial neural network (ANN). Five descriptors were included in the model for MLR and ANN analysis, and five latent variables obtained from principal component analysis (PCA) were used in PCR analysis. Calculation of descriptors was performed using the semi-empirical PM6 method. ANN analysis was found to be a superior statistical technique compared to the other methods and gave a good correlation between descriptors and activity (r2 = 0.84). Based on the obtained model, we have successfully designed some new insecticides with higher predicted activity than those of previously synthesized compounds, e.g. 2-(decalinecarbamoyl-5-chloro-N’-((5-methylthiophen-2-ylmethylene benzohydrazide, 2-(decalinecarbamoyl-5-chloro-N’-((thiophen-2-yl-methylene benzohydrazide and 2-(decaline carbamoyl-N’-(4-fluorobenzylidene-5-chlorobenzohydrazide with predicted log LC50 of 1.640, 1.672, and 1.769 respectively.
Improving ASTER GDEM Accuracy Using Land Use-Based Linear Regression Methods: A Case Study of Lianyungang, East China

Directory of Open Access Journals (Sweden)

Xiaoyan Yang

2018-04-01

The Advanced Spaceborne Thermal-Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) is important to a wide range of geographical and environmental studies. Its accuracy, to some extent associated with land-use types reflecting topography, vegetation coverage, and human activities, impacts the results and conclusions of these studies. In order to improve the accuracy of ASTER GDEM prior to its application, we investigated ASTER GDEM errors based on individual land-use types and proposed two linear regression calibration methods, one considering only land use-specific errors and the other considering the impact of both land-use and topography. Our calibration methods were tested on the coastal prefectural city of Lianyungang in eastern China. Results indicate that (1) ASTER GDEM is highly accurate for rice, wheat, grass and mining lands but less accurate for scenic, garden, wood and bare lands; (2) despite improvements in ASTER GDEM2 accuracy, multiple linear regression calibration requires more data (topography) and a relatively complex calibration process; (3) simple linear regression calibration proves a practicable and simplified means to systematically investigate and improve the impact of land-use on ASTER GDEM accuracy. Our method is applicable to areas with detailed land-use data based on highly accurate field-based point-elevation measurements.
Meta-Modeling by Symbolic Regression and Pareto Simulated Annealing

NARCIS (Netherlands)

Stinstra, E.; Rennen, G.; Teeuwen, G.J.A.

2006-01-01

The subject of this paper is a new approach to Symbolic Regression. Other publications on Symbolic Regression use Genetic Programming. This paper describes an alternative method based on Pareto Simulated Annealing. Our method is based on linear regression for the estimation of constants. Interval
A dynamic particle filter-support vector regression method for reliability prediction

International Nuclear Information System (INIS)

Wei, Zhao; Tao, Tao; ZhuoShu, Ding; Zio, Enrico

2013-01-01

Support vector regression (SVR) has been applied to time series prediction and some works have demonstrated the feasibility of its use to forecast system reliability. For accuracy of reliability forecasting, the selection of SVR's parameters is important. The existing research works on SVR's parameters selection divide the example dataset into training and test subsets, and tune the parameters on the training data. However, these fixed parameters can lead to poor prediction capabilities if the data of the test subset differ significantly from those of training. Differently, the novel method proposed in this paper uses particle filtering to estimate the SVR model parameters according to the whole measurement sequence up to the last observation instance. By treating the SVR training model as the observation equation of a particle filter, our method allows updating the SVR model parameters dynamically when a new observation comes. Because of the adaptability of the parameters to dynamic data pattern, the new PF–SVR method has superior prediction performance over that of standard SVR. Four application results show that PF–SVR is more robust than SVR to the decrease of the number of training data and the change of initial SVR parameter values. Also, even if there are trends in the test data different from those in the training data, the method can capture the changes, correct the SVR parameters and obtain good predictions. -- Highlights: •A dynamic PF–SVR method is proposed to predict the system reliability. •The method can adjust the SVR parameters according to the change of data. •The method is robust to the size of training data and initial parameter values. •Some cases based on both artificial and real data are studied. •PF–SVR shows superior prediction performance over standard SVR
Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

Science.gov (United States)

Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

2015-01-01

Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.
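The comparison indicators named in this abstract (sensitivity, specificity, percentage of correct predictions, and area under the ROC curve) can be computed from predicted probabilities as in the sketch below. The AUC is obtained from its rank interpretation: the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The labels, scores, and the 0.5 threshold are hypothetical.

```python
def confusion_rates(y_true, scores, threshold=0.5):
    """Return (sensitivity, specificity, accuracy) at the given threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = unwanted pregnancy) and model probabilities.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(confusion_rates(y_true, scores))
print(auc(y_true, scores))
```

The same two functions can be applied to the predicted probabilities of any of the three classifiers, which is what makes the AUC a convenient common yardstick.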
Short term load forecasting technique based on the seasonal exponential adjustment method and the regression model

International Nuclear Information System (INIS)

Wu, Jie; Wang, Jianzhou; Lu, Haiyan; Dong, Yao; Lu, Xiaoxiao

2013-01-01

Highlights: ► The seasonal and trend items of the data series are forecasted separately. ► Seasonal item in the data series is verified by the Kendall τ correlation testing. ► Different regression models are applied to the trend item forecasting. ► We examine the superiority of the combined models by the quartile value comparison. ► Paired-sample T test is utilized to confirm the superiority of the combined models. - Abstract: For an energy-limited economy system, it is crucial to forecast load demand accurately. This paper is devoted to a 1-week-ahead daily load forecasting approach in which load demand series are predicted by employing the information of earlier days that are similar to the forecast day. As in many nonlinear systems, seasonal and trend items coexist in load demand datasets. In this paper, the existence of the seasonal item in the load demand data series is first verified according to the Kendall τ correlation testing method. Then, in the belief that forecasting the seasonal item and the trend item separately would improve the forecasting accuracy, hybrid models combining the seasonal exponential adjustment method (SEAM) with regression methods are proposed, where SEAM and the regression models are employed for seasonal and trend item forecasting, respectively. Comparisons of the quartile values as well as the mean absolute percentage error values demonstrate that this forecasting technique can significantly improve the accuracy across the eleven different models applied to the trend item forecasting. The superior performance of this separate forecasting technique is further confirmed by the paired-sample T tests.
Multivariate estimation of the limit of detection by orthogonal partial least squares in temperature-modulated MOX sensors.

Science.gov (United States)

Burgués, Javier; Marco, Santiago

2018-08-17

Metal oxide semiconductor (MOX) sensors are usually temperature-modulated and calibrated with multivariate models such as partial least squares (PLS) to increase the inherent low selectivity of this technology. The multivariate sensor response patterns exhibit heteroscedastic and correlated noise, which suggests that maximum likelihood methods should outperform PLS. One contribution of this paper is the comparison between PLS and maximum likelihood principal components regression (MLPCR) in MOX sensors. PLS is often criticized for its lack of interpretability when the model complexity increases beyond the chemical rank of the problem. This happens in MOX sensors due to cross-sensitivities to interferences, such as temperature or humidity, and non-linearity. Additionally, the estimation of fundamental figures of merit, such as the limit of detection (LOD), is still not standardized in multivariate models. Orthogonalization methods, such as orthogonal projection to latent structures (O-PLS), have been successfully applied in other fields to reduce the complexity of PLS models. In this work, we propose a LOD estimation method based on applying the well-accepted univariate LOD formulas to the scores of the first component of an orthogonal PLS model. The resulting LOD is compared to the multivariate LOD range derived from error-propagation. The methodology is applied to data extracted from temperature-modulated MOX sensors (FIS SB-500-12 and Figaro TGS 3870-A04), aiming at the detection of low concentrations of carbon monoxide in the presence of uncontrolled humidity (chemical noise). We found that PLS models were simpler and more accurate than MLPCR models. Average LOD values of 0.79 ppm (FIS) and 1.06 ppm (Figaro) were found using the approach described in this paper. These values were contained within the LOD ranges obtained with the error-propagation approach. The mean LOD increased to 1.13 ppm (FIS) and 1.59 ppm (Figaro) when considering validation samples.
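The idea of applying a univariate LOD formula to a single score vector can be sketched as follows. The abstract does not say which univariate formula the authors used, so this example assumes the common calibration-line rule LOD = 3.3 · s / |b|, where b is the slope of the scores regressed on concentration and s is the residual standard deviation. The score values stand in for first-component O-PLS scores and are hypothetical, as are the function names.

```python
from math import sqrt
from statistics import mean

def calibration_line(conc, score):
    """Least-squares line score = a + b * conc; returns (a, b, residual SD)."""
    mc, ms = mean(conc), mean(score)
    sxx = sum((c - mc) ** 2 for c in conc)
    sxy = sum((c - mc) * (t - ms) for c, t in zip(conc, score))
    b = sxy / sxx
    a = ms - b * mc
    ss_res = sum((t - (a + b * c)) ** 2 for c, t in zip(conc, score))
    s_res = sqrt(ss_res / (len(conc) - 2))  # two fitted parameters
    return a, b, s_res

def limit_of_detection(conc, score, k=3.3):
    """Assumed univariate rule: LOD = k * residual SD / |slope|."""
    _, b, s_res = calibration_line(conc, score)
    return k * s_res / abs(b)

# Hypothetical CO concentrations (ppm) and first-component scores.
conc = [0.0, 1.0, 2.0, 3.0]
score = [0.1, 0.9, 2.1, 2.9]
print(limit_of_detection(conc, score))
```

Working on one score vector keeps the calculation univariate, which is the appeal of the orthogonalized model: the predictive information is concentrated in a single component.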
Multiple regression and beyond an introduction to multiple regression and structural equation modeling

CERN Document Server

Keith, Timothy Z

2014-01-01

Multiple Regression and Beyond offers a conceptually oriented introduction to multiple regression (MR) analysis and structural equation modeling (SEM), along with analyses that flow naturally from those methods. By focusing on the concepts and purposes of MR and related methods, rather than the derivation and calculation of formulae, this book introduces material to students more clearly, and in a less threatening way. In addition to illuminating content necessary for coursework, the accessibility of this approach means students are more likely to be able to conduct research using MR or SEM--and more likely to use the methods wisely. Covers both MR and SEM, while explaining their relevance to one another. Also includes path analysis, confirmatory factor analysis, and latent growth modeling. Figures and tables throughout provide examples and illustrate key concepts and techniques. For additional resources, please visit: http://tzkeith.com/.
Short-term load forecasting with increment regression tree

Energy Technology Data Exchange (ETDEWEB)

Yang, Jingfei; Stenzel, Juergen [Darmstadt University of Technology, Darmstadt 64283 (Germany)]

2006-06-15

This paper presents a new regression tree method for short-term load forecasting. Both increment and non-increment tree are built according to the historical data to provide the data space partition and input variable selection. Support vector machine is employed to the samples of regression tree nodes for further fine regression. Results of different tree nodes are integrated through weighted average method to obtain the comprehensive forecasting result. The effectiveness of the proposed method is demonstrated through its application to an actual system. (author)
Improvement of the thermo-mechanical position stability of the beam position monitor in the PLS-II

Science.gov (United States)

Ha, Taekyun; Hong, Mansu; Kwon, Hyuckchae; Han, Hongsik; Park, Chongdo

2016-09-01

In the storage ring of the Pohang Light Source-II (PLS-II), we reduced the mechanical displacement of the electron-beam position monitors (e-BPMs) that is caused by heating during e-beam storage. The BPM pickup itself must be kept stable to sub-micrometer precision in order for a stable photon beam to be provided to beamlines because the orbit feedback system is programmed to make the electron beam pass through the center of the BPM. Thermal deformation of the vacuum chambers on which the BPM pickups are mounted is inevitable when the electron beam current is changed by an unintended beam abort. We reduced this deformation by improving the vacuum chamber support and by enhancing the water cooling. We report a thermo-mechanical analysis and displacement measurements for the BPM pickups after improvements.
Regression models of reactor diagnostic signals

International Nuclear Information System (INIS)

Vavrin, J.

1989-01-01

The application of an autoregression model, as the simplest regression model of diagnostic signals, is described for the experimental analysis of diagnostic systems and for in-service monitoring of normal and anomalous conditions and their diagnostics. A diagnostic method using a regression-type diagnostic database and regression spectral diagnostics is described. The diagnostics of neutron noise signals from anomalous modes in the experimental fuel assembly of a reactor is also described. (author)
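A minimal sketch of an autoregression model used as a diagnostic reference is given below. It assumes a first-order model x_t = φ·x_(t-1) + e_t fitted by least squares to a zero-mean baseline signal, and it uses the residual variance of a new signal under the baseline coefficient as a simple anomaly indicator. The model order, the indicator, and the signals are illustrative assumptions and are not details from the report.

```python
def fit_ar1(x):
    """Least-squares AR(1) coefficient for a zero-mean signal x."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

def residual_variance(x, phi):
    """Mean squared one-step prediction error of x under coefficient phi."""
    resid = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    return sum(r * r for r in resid) / len(resid)

# Hypothetical baseline signal that follows x_t = 0.5 * x_(t-1) exactly.
baseline = [1.0, 0.5, 0.25, 0.125]
phi = fit_ar1(baseline)

# Hypothetical new signal with different dynamics.
new_signal = [1.0, -0.8, 0.6, -0.5]
print(phi, residual_variance(baseline, phi), residual_variance(new_signal, phi))
```

A residual variance well above the baseline level suggests that the new signal no longer follows the reference model; a practical system would use higher model orders and measured noise records.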
Logistic regression applied to natural hazards: rare event logistic regression with replications

Directory of Open Access Journals (Sweden)

M. Guns

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Predicting Charging Time of Battery Electric Vehicles Based on Regression and Time-Series Methods: A Case Study of Beijing

Directory of Open Access Journals (Sweden)

Jun Bi

2018-04-01

Battery electric vehicles (BEVs) reduce energy consumption and air pollution as compared with conventional vehicles. However, the limited driving range and potentially long charging time of BEVs create new problems. Accurate charging time prediction of BEVs helps drivers determine travel plans and alleviates their range anxiety during trips. This study proposed a combined model for charging time prediction based on regression and time-series methods according to the actual data from BEVs operating in Beijing, China. After data analysis, a regression model was established by considering the charged amount for charging time prediction. Furthermore, a time-series method was adopted to calibrate the regression model, which significantly improved the fitting accuracy of the model. The parameters of the model were determined by using the actual data. Verification results confirmed the accuracy of the model and showed that the model errors were small. The proposed model can accurately depict the charging time characteristics of BEVs in Beijing.
Impact of regression methods on improved effects of soil structure on soil water retention estimates

Science.gov (United States)

Nguyen, Phuong Minh; De Pue, Jan; Le, Khoa Van; Cornelis, Wim

2015-06-01

Increasing the accuracy of pedotransfer functions (PTFs), an indirect method for predicting non-readily available soil features such as soil water retention characteristics (SWRC), is of crucial importance for large scale agro-hydrological modeling. Adding significant predictors (i.e., soil structure) and implementing more flexible regression algorithms are among the main strategies for PTF improvement. The aim of this study was to investigate whether the improvement from categorical soil structure information in estimating soil-water content at various matric potentials, which has been reported in the literature, could be consistently captured by regression techniques other than the usually applied linear regression. Two data mining techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (kNN), which have recently been introduced as promising tools for PTF development, were used to test whether the incorporation of soil structure improves PTF accuracy in a context of rather limited training data. The results show that incorporating descriptive soil structure information (massive, structured and structureless) as a grouping criterion can improve the accuracy of PTFs derived by the SVM approach in the matric potential range of -6 to -33 kPa (average RMSE decreased by up to 0.005 m³ m⁻³ after grouping, depending on matric potential). The improvement was primarily attributed to the better performance of SVM-PTFs calibrated on structureless soils. No improvement was obtained with the kNN technique, at least not in our study, in which the data set became limited in size after grouping. Since the regression technique affects how much the qualitative soil structure information improves the estimates, selecting a proper technique will help to maximize the combined influence of flexible regression algorithms and soil structure information on PTF accuracy.

Logistic regression models

CERN Document Server

Hilbe, Joseph M

2009-01-01

This book really does cover everything you ever wanted to know about logistic regression … with updates available on the author's website. Hilbe, a former national athletics champion, philosopher, and expert in astronomy, is a master at explaining statistical concepts and methods. Readers familiar with his other expository work will know what to expect: great clarity. The book provides considerable detail about all facets of logistic regression. No step of an argument is omitted so that the book will meet the needs of the reader who likes to see everything spelt out, while a person familiar with some of the topics has the option to skip "obvious" sections. The material has been thoroughly road-tested through classroom and web-based teaching. … The focus is on helping the reader to learn and understand logistic regression. The audience is not just students meeting the topic for the first time, but also experienced users. I believe the book really does meet the author's goal … . -Annette J. Dobson, Biometric...
Retro-regression--another important multivariate regression improvement.

Science.gov (United States)

Randić, M

2001-01-01

We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.
Modified Regression Correlation Coefficient for Poisson Regression Model

Science.gov (United States)

Kaengthong, Nattacha; Domthong, Uthumporn

2017-09-01

This study examines widely used indicators of the predictive power of the Generalized Linear Model (GLM), which often have some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This measure of predictive power is defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)], where the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional one in the case of two or more independent variables, with multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
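The traditional coefficient described above can be sketched numerically. The following is a minimal illustration, not the authors' modified coefficient: it assumes the "relationship" between Y and E(Y|X) is read as a Pearson correlation, fits a log-link Poisson regression by Newton's method, and uses simulated data. The function names `fit_poisson` and `regression_correlation` are hypothetical.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Fit a log-link Poisson regression by Newton's method (assumed setup)."""
    Xd = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean() + 1e-12)            # start from the null model
    for _ in range(n_iter):
        mu = np.exp(Xd @ beta)                    # current fitted means
        grad = Xd.T @ (y - mu)                    # score vector
        hess = Xd.T @ (Xd * mu[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def regression_correlation(X, y, beta):
    """Pearson correlation between Y and the fitted E(Y|X)."""
    mu = np.exp(np.column_stack([np.ones(len(y)), X]) @ beta)
    return float(np.corrcoef(y, mu)[0, 1])

# Simulated data (illustrative only): two predictors, Poisson response.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 0] - 0.4 * X[:, 1]))

beta_hat = fit_poisson(X, y)
r = regression_correlation(X, y, beta_hat)
print("coefficients:", beta_hat, "regression correlation:", r)
```

Values of `r` closer to 1 indicate that the fitted means track the observed counts more closely; how the modified coefficient differs is not stated in the abstract and is not reproduced here.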
Rapid discrimination of geographical origin and evaluation of antioxidant activity of Salvia miltiorrhiza var. alba by Fourier transform near infrared spectroscopy.

Science.gov (United States)

Duan, Xiaoju; Zhang, Danlu; Nie, Lei; Zang, Hengchang

2014-03-25

Radix Salvia miltiorrhiza Bge. var. alba C.Y. Wu and H.W. Li and Radix S. miltiorrhiza belong to the same genus. S. miltiorrhiza var. alba has a unique effectiveness for thromboangiitis in addition to the therapeutic efficacy of S. miltiorrhiza. It exhibits antioxidant activity (AA), while its quality and efficacy also vary with geographic location. Therefore, a rapid and nondestructive method based on Fourier transform near infrared spectroscopy (FT-NIRS) was developed for discrimination of geographical origin and evaluation of AA of S. miltiorrhiza var. alba. The discrimination of geographical origin was achieved by using discriminant analysis and the accuracy was 100%. Partial least squares (PLS) regression was employed to establish the model for evaluation of AA by NIRS. The spectral regions were selected by the interval PLS (i-PLS) method. Different pretreatment methods were compared for spectral pre-processing. The final optimal results of the PLS model showed that the correlation coefficients in the calibration set (Rc) and the prediction set (Rp), the root mean square error of prediction (RMSEP) and the residual prediction deviation (RPD) were 0.974, 0.950, 0.163 mg mL⁻¹ and 2.66, respectively. The results demonstrated that NIRS combined with chemometric methods could be a rapid and nondestructive tool to discriminate geographical origin and evaluate AA of S. miltiorrhiza var. alba. The developed NIRS method might have a potential application to high-throughput screening of a great number of raw S. miltiorrhiza var. alba samples for AA. Copyright © 2013 Elsevier B.V. All rights reserved.
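The figures of merit quoted above can be illustrated with a short sketch. It assumes the common chemometrics definitions (RMSEP as the root mean squared difference between reference and predicted values, RPD as the sample standard deviation of the reference values divided by RMSEP); the abstract does not state which variant the authors used, and the numbers below are made up for illustration.

```python
import numpy as np

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))

def rpd(y_ref, y_pred):
    """Residual prediction deviation: SD of reference values / RMSEP (assumed definition)."""
    return float(np.std(np.asarray(y_ref, float), ddof=1) / rmsep(y_ref, y_pred))

def corr(y_ref, y_pred):
    """Correlation coefficient between reference and predicted values."""
    return float(np.corrcoef(y_ref, y_pred)[0, 1])

# Hypothetical prediction-set values, not data from the study.
y_ref = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

print("RMSEP:", rmsep(y_ref, y_pred))
print("RPD:  ", rpd(y_ref, y_pred))
print("Rp:   ", corr(y_ref, y_pred))
```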
Whole-genome regression and prediction methods applied to plant and animal breeding

NARCIS (Netherlands)

Los Campos, De G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.

2013-01-01

Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of
Study on feasibility of determination of glucosamine content of fermentation process using a micro NIR spectrometer.

Science.gov (United States)

Sun, Zhongyu; Li, Can; Li, Lian; Nie, Lei; Dong, Qin; Li, Danyang; Gao, Lingling; Zang, Hengchang

2018-08-05

N-acetyl-d-glucosamine (GlcNAc) is a microbial fermentation product, and NIR spectroscopy is an effective process analytical technology (PAT) tool for detecting the key quality attribute: the GlcNAc content. Meanwhile, the design of NIR spectrometers is trending toward miniaturization, portability and low cost. The aim of this study was to explore the use of a portable micro NIR spectrometer in the fermentation process. First, an FT-NIR spectrometer and a Micro-NIR 1700 spectrometer were compared using simulated fermentation process solutions. The Rc², Rp², RMSECV and RMSEP of the optimal FT-NIR and Micro-NIR 1700 models were 0.999, 0.999, 3.226 g/L, 1.388 g/L and 0.999, 0.999, 1.821 g/L, 0.967 g/L, respectively. Passing-Bablok regression and paired t-test results showed no significant differences between the two instruments. The Micro-NIR 1700 was then selected for the practical fermentation process, and 135 samples from 10 batches were collected. Spectral pretreatment methods and variable selection methods (BiPLS, FiPLS, MWPLS and CARS-PLS) for PLS modeling were discussed. The Rc², Rp², RMSECV and RMSEP of the optimal GlcNAc content PLS model of the practical fermentation process were 0.994, 0.995, 2.792 g/L and 1.946 g/L. The results provide a useful reference for the application of the Micro-NIR spectrometer. To some extent, it could provide theoretical support in guiding microbial fermentation or the further assessment of bioprocesses. Copyright © 2018. Published by Elsevier B.V.
Comparative ANNs with Different Input Layers and GA-PLS Study for Simultaneous Spectrofluorimetric Determination of Melatonin and Pyridoxine HCl in the Presence of Melatonin’s Main Impurity

Directory of Open Access Journals (Sweden)

Amer M. Alanazi

2013-01-01

Melatonin (MLT) has many health implications; it is therefore important to develop specific analytical methods for the determination of MLT in the presence of its main impurity, N-{2-[1-({3-[2-(acetylamino)ethyl]-5-methoxy-1H-indol-2-yl}methyl)-5-methoxy-1H-indol-3-yl]ethyl}acetamide (DMLT), and pyridoxine HCl (PNH) as a co-formulated drug. This work describes four simple, sensitive, and reliable multivariate calibration methods, namely artificial neural networks preceded by a genetic algorithm (GA-ANN), principal component analysis (PCA-ANN) and wavelet transform procedures (WT-ANN), as well as partial least squares preceded by a genetic algorithm (GA-PLS), for the spectrofluorimetric determination of MLT and PNH in the presence of DMLT. The analytical performance of the proposed methods was statistically validated with respect to linearity, accuracy, precision and specificity. The proposed methods were successfully applied to the assay of MLT in laboratory-prepared mixtures containing up to 15% of DMLT and in commercial MLT tablets, with recoveries of no less than 99.00%. No interference was observed from common pharmaceutical additives, and the results compared favorably with those obtained by a reference method.
Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

Directory of Open Access Journals (Sweden)

Guoqi Qian

2016-01-01

Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning, and it is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with an analysis of a real data set on RGB cell marking in neuroscience to illustrate and interpret the method.
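The alternation between partitioning and regression described above can be sketched as follows. This is a minimal least-squares illustration under stated assumptions (the number of clusters k is given, initial labels are random, and each point is reassigned to the hyperplane with the smallest squared residual); it is not the authors' procedure, which also covers robust estimation and selection of k. The function name `regression_clustering` and the simulated data are hypothetical.

```python
import numpy as np

def regression_clustering(X, y, k, n_iter=100, seed=0):
    """Alternate between fitting one least-squares hyperplane per cluster
    and reassigning each point to its best-fitting hyperplane."""
    rng = np.random.default_rng(seed)
    Xd = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    labels = rng.integers(k, size=len(y))           # random initial partition
    coefs = np.zeros((k, Xd.shape[1]))
    for _ in range(n_iter):
        # Regression step: refit every cluster that has enough points.
        for j in range(k):
            mask = labels == j
            if mask.sum() >= Xd.shape[1]:
                coefs[j] = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)[0]
        # Partition step: move each point to the hyperplane with least squared residual.
        sq_resid = (y[:, None] - Xd @ coefs.T) ** 2
        new_labels = sq_resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, coefs

# Simulated data (illustrative only): two crossing regression lines.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=200)
group = rng.integers(2, size=200)
y = np.where(group == 0, 1 + 2 * x, -1 - 2 * x) + rng.normal(scale=0.1, size=200)

labels, coefs = regression_clustering(x[:, None], y, k=2)

# Compare the clustered fit with a single pooled regression.
Xd = np.column_stack([np.ones(200), x])
sse_clustered = np.sum((y - np.sum(Xd * coefs[labels], axis=1)) ** 2)
pooled = np.linalg.lstsq(Xd, y, rcond=None)[0]
sse_pooled = np.sum((y - Xd @ pooled) ** 2)
print("clustered SSE:", sse_clustered, "pooled SSE:", sse_pooled)
```

Like k-means, this alternation only reaches a local optimum, so in practice several random starts would normally be compared.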
Quantile Regression With Measurement Error

KAUST Repository

Wei, Ying

2009-08-27

Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure. © 2009 American Statistical Association.
Introduction to the use of regression models in epidemiology.

Science.gov (United States)

Bender, Ralf

2009-01-01

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Determination of vegetable oils and fats adulterants in diesel oil by high performance liquid chromatography and multivariate methods.

Science.gov (United States)

Brandão, Luiz Filipe Paiva; Braga, Jez Willian Batista; Suarez, Paulo Anselmo Ziani

2012-02-17

The current legislation requires the mandatory addition of biodiesel to all Brazilian road diesel oil A (pure diesel) marketed in the country and bans the addition of vegetable oils to this type of diesel. However, cases of irregular addition of vegetable oils directly to the diesel oil may occur, mainly due to the lower cost of these raw materials compared to the final product, biodiesel. In Brazil, the situation is even more critical since the country is one of the largest producers of oleaginous products in the world, especially soybean, and also has an extensive road network dependent on diesel. Therefore, alternatives to control the quality of diesel have become increasingly necessary. This study proposes an analytical methodology for quality control of diesel intended to identify and determine adulteration with oils and even fats of vegetable origin. This methodology is based on the detection, identification and quantification of triacylglycerols (the main constituents of vegetable oils and fats) in diesel by reversed-phase high performance liquid chromatography with UV detection at 205 nm, associated with multivariate methods. Six different types of oils and fats were studied (soybean, frying oil, corn, cotton, palm oil and babassu) and two methods were developed for data analysis. The first one, based on principal component analysis (PCA), nearest neighbor classification (KNN) and univariate regression, was used for samples adulterated with a single type of oil or fat. In the second method, partial least squares regression (PLS) was used for the cases where the adulterants were mixtures of up to three types of oils or fats. In the first method, PCA and KNN correctly classified 17 out of 18 validation samples according to the type of oil or fat present. The concentrations estimated for adulterants showed good agreement with the reference values, with root mean square errors of prediction (RMSEP) ranging between 0.10 and 0.22% (v/v). The PLS method was
Partial least squares path modeling basic concepts, methodological issues and applications

CERN Document Server

Noonan, Richard

2017-01-01

This edited book presents the recent developments in partial least squares-path modeling (PLS-PM) and provides a comprehensive overview of the current state of the most advanced research related to PLS-PM. The first section of this book emphasizes the basic concepts and extensions of the PLS-PM method. The second section discusses the methodological issues that are the focus of the recent development of the PLS-PM method. The third part discusses the real world application of the PLS-PM method in various disciplines. The contributions from expert authors in the field of PLS focus on topics such as the factor-based PLS-PM, the perfect match between a model and a mode, quantile composite-based path modeling (QC-PM), ordinal consistent partial least squares (OrdPLSc), non-symmetrical composite-based path modeling (NSCPM), modern view for mediation analysis in PLS-PM, a multi-method approach for identifying and treating unobserved heterogeneity, multigroup analysis (PLS-MGA), the assessment of the common method b...
A comparative study on generating simulated Landsat NDVI images using data fusion and regression method-the case of the Korean Peninsula.

Science.gov (United States)

Lee, Mi Hee; Lee, Soo Bong; Eo, Yang Dam; Kim, Sun Woong; Woo, Jung-Hun; Han, Soo Hee

2017-07-01

Landsat optical images have enough spatial and spectral resolution to analyze vegetation growth characteristics. However, clouds and water vapor often degrade the image quality, which limits the availability of usable images for time series measurement of vegetation vitality. To overcome this shortcoming, simulated images are used as an alternative. In this study, the weighted average method, the spatial and temporal adaptive reflectance fusion model (STARFM) method, and the multilinear regression analysis method were tested to produce simulated Landsat normalized difference vegetation index (NDVI) images of the Korean Peninsula. The test results showed that the weighted average method produced the images most similar to the actual images, provided that images were available within 1 month before and after the target date. The STARFM method gives good results when the input image date is close to the target date. Careful regional and seasonal consideration is required in selecting input images. During the summer season, clouds make it very difficult to get images close enough to the target date. Multilinear regression analysis gives meaningful results even when the input image date is not so close to the target date. Average R² values for the weighted average method, STARFM, and multilinear regression analysis were 0.741, 0.70, and 0.61, respectively.
Forecasting exchange rates: a robust regression approach

OpenAIRE

Preminger, Arie; Franck, Raphael

2005-01-01

The least squares estimation method, like other ordinary estimation methods for regression models, can be severely affected by a small number of outliers and thus provide poor out-of-sample forecasts. This paper suggests a robust regression approach, based on the S-estimation method, to construct forecasting models that are less sensitive to data contamination by outliers. A robust linear autoregressive (RAR) model and a robust neural network (RNN) model are estimated to study the predictabil...
Classification and quantitation of milk powder by near-infrared spectroscopy and mutual information-based variable selection and partial least squares

Science.gov (United States)

Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong

2018-01-01

Milk is among the most popular nutrient sources worldwide and is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands and of determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiment. One contains 179 samples of four brands for classification and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-square discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-square regression (PLSR) model constructed from the selected subset of 260 variables significantly outperforms the full-spectrum model. It seems that the combination of NIR spectroscopy, MRMR and PLS-DA or PLSR is a powerful tool for classifying different brands of milk and determining the protein content.
Two-Stage Method Based on Local Polynomial Fitting for a Linear Heteroscedastic Regression Model and Its Application in Economics

Directory of Open Access Journals (Sweden)

Liyun Su

2012-01-01

We introduce the extension of local polynomial fitting to the linear heteroscedastic regression model. First, local polynomial fitting is applied to estimate the heteroscedastic function; then the coefficients of the regression model are obtained by the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Because local polynomial estimation is a nonparametric technique, we do not need to know the form of the heteroscedastic function, so the estimation precision can be improved when that function is unknown. Furthermore, we focus on the comparison of parameters and reach an optimal fit. We also verify the asymptotic normality of the parameters based on numerical simulations. Finally, the approach is applied to a case in economics, which indicates that our method is effective in finite-sample situations.
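The two stages described above can be sketched as follows. This is a minimal illustration under several assumptions not taken from the paper: a single covariate, a local linear (degree 1) fit with a Gaussian kernel, squared ordinary least squares residuals as the raw variance signal, and a fixed bandwidth `h`. The function names and the simulated data are hypothetical.

```python
import numpy as np

def local_linear(x0, x, z, h):
    """Local linear estimate of E[z | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)            # kernel weights
    A = np.column_stack([np.ones_like(x), x - x0])    # local design matrix
    WA = A * w[:, None]
    coef = np.linalg.solve(A.T @ WA, WA.T @ z)        # weighted least squares
    return coef[0]                                    # intercept = fitted value at x0

def two_stage_fit(x, y, h=0.3):
    X = np.column_stack([np.ones_like(x), x])
    # Preliminary OLS fit to obtain residuals.
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    resid2 = (y - X @ beta_ols) ** 2
    # Stage 1: smooth the squared residuals to estimate the variance function.
    var_hat = np.array([local_linear(x0, x, resid2, h) for x0 in x])
    var_hat = np.maximum(var_hat, 1e-3 * resid2.mean())   # guard against non-positive estimates
    # Stage 2: generalized (here, weighted) least squares with weights 1 / variance.
    Xw = X / var_hat[:, None]
    beta_gls = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta_ols, beta_gls, var_hat

# Simulated data (illustrative only): noise standard deviation grows with x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=400)
y = 1 + 2 * x + (0.2 + 0.5 * x) * rng.normal(size=400)

beta_ols, beta_gls, var_hat = two_stage_fit(x, y)
print("OLS:", beta_ols, "GLS:", beta_gls)
```

Because the weights come from a smoothed estimate rather than a parametric variance model, no prior test or assumed form for the heteroscedasticity is needed, which is the point the abstract emphasizes.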
Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

Science.gov (United States)

Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

2009-01-01

Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation.

Science.gov (United States)

Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J

2013-12-15

The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. Extracting the essential oil from wood requires cutting down the tree, so the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Squares Discriminant Analysis (PLS-DA) as a means to classify the essential oil extracted from different parts (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The threshold value that separates the classes was calculated from the distribution of the training samples, in such a way that the classes are divided with the lowest probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, and predictive accuracy and efficiency of 100%. These results give an overall view of the behavior of the model, but do not give information about individual samples; for this reason, the confidence interval for the classification of each sample was also calculated using the bootstrap resampling technique. The methodology developed has the potential to be an alternative to standard procedures used for oil analysis, and it can be employed as a screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.
Linear regression

CERN Document Server

Olive, David J

2017-01-01

This text covers both multiple linear regression and some experimental design models. The text uses the response plot to visualize the model and to detect outliers, does not assume that the error distribution has a known parametric distribution, develops prediction intervals that work when the error distribution is unknown, suggests bootstrap hypothesis tests that may be useful for inference after variable selection, and develops prediction regions and large sample theory for the multivariate linear regression model that has m response variables. A relationship between multivariate prediction regions and confidence regions provides a simple way to bootstrap confidence regions. These confidence regions often provide a practical method for testing hypotheses. There is also a chapter on generalized linear models and generalized additive models. There are many R functions to produce response and residual plots, to simulate prediction intervals and hypothesis tests, to detect outliers, and to choose response trans...
Robust mislabel logistic regression without modeling mislabel probabilities.

Science.gov (United States)

Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

2018-03-01

Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

Model-based Quantile Regression for Discrete Data

KAUST Repository

Padellini, Tullia; Rue, Haavard

2018-01-01

Quantile regression is a class of methods devoted to the modelling of conditional quantiles. In a Bayesian framework quantile regression has typically been carried out exploiting the Asymmetric Laplace Distribution as a working likelihood. Despite
Comparison of exact, Efron and Breslow parameter approach method on hazard ratio and stratified Cox regression model

Science.gov (United States)

Fatekurohman, Mohamat; Nurmala, Nita; Anggraeni, Dian

2018-04-01

The lungs are the most important organ of the respiratory system. Lung disorders are varied and include pneumonia, emphysema, tuberculosis and lung cancer, of which lung cancer is the most harmful. The aim of this research is therefore to apply survival analysis to identify the factors affecting the survival of lung cancer patients, comparing the exact, Efron and Breslow parameter approach methods for the hazard ratio and the stratified Cox regression model. The data are based on the medical records of lung cancer patients at Jember Paru-paru hospital, East Java, Indonesia, in 2016. The candidate factors are sex, age, hemoglobin, leukocytes, erythrocytes, blood sedimentation rate, therapy status, general condition and body weight. The results show that the exact method for the stratified Cox regression model performs better than the others, and that patient survival is affected by age and general condition.
Automatic variable selection method and a comparison for quantitative analysis in laser-induced breakdown spectroscopy

Science.gov (United States)

Duan, Fajie; Fu, Xiao; Jiang, Jiajia; Huang, Tingting; Ma, Ling; Zhang, Cong

2018-05-01

In this work, an automatic variable selection method for quantitative analysis of soil samples using laser-induced breakdown spectroscopy (LIBS) is proposed, which is based on full spectrum correction (FSC) and modified iterative predictor weighting-partial least squares (mIPW-PLS). The method features automatic selection without manual intervention. To illustrate the feasibility and effectiveness of the method, a comparison with the genetic algorithm (GA) and the successive projections algorithm (SPA) was carried out for the detection of different elements (copper, barium and chromium) in soil. The experimental results showed that all three methods could accomplish variable selection effectively, among which FSC-mIPW-PLS required significantly shorter computation time (approximately 12 s for 40,000 initial variables) than the others. Moreover, improved quantification models were obtained with the variable selection approaches. The root mean square errors of prediction (RMSEP) of models using the new method were 27.47 (copper), 37.15 (barium) and 39.70 (chromium) mg/kg, which is comparable to the prediction performance of GA and SPA.
Variable selection in multivariate calibration based on clustering of variable concept.

Science.gov (United States)

Farrokhnia, Maryam; Karimi, Sadegh

2016-01-01

Recently we proposed a new variable selection algorithm for classification problems, based on the clustering of variables concept (CLoVA). Here the same idea is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters via clustering algorithms. Then, the spectral data of each cluster are subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, synergy clustering of variables (sCLoVA-PLS), which uses combinations of clusters, is proposed as an efficient modification of the CLoVA algorithm. The statistical parameters obtained indicate that variable clustering can separate the useful part of the data from the redundant parts, so that a stable model can be built on the informative clusters. Copyright © 2015 Elsevier B.V. All rights reserved.
Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery

International Nuclear Information System (INIS)

Hu, Chao; Jain, Gaurav; Zhang, Puqiang; Schmidt, Craig; Gomadam, Parthasarathy; Gorka, Tom

2014-01-01

Highlights: • We develop a data-driven method for the battery capacity estimation. • Five charge-related features that are indicative of the capacity are defined. • The kNN regression model captures the dependency of the capacity on the features. • Results with 10 years’ continuous cycling data verify the effectiveness of the method. - Abstract: The reliability of lithium-ion (Li-ion) rechargeable batteries used in implantable medical devices has been recognized as highly important by a broad range of stakeholders, including medical device manufacturers, regulatory agencies, physicians, and patients. To ensure that Li-ion batteries in these devices operate reliably, it is important to be able to assess the battery health condition by estimating the battery capacity over the life-time. This paper presents a data-driven method for estimating the capacity of a Li-ion battery based on the charge voltage and current curves. The contributions of this paper are three-fold: (i) the definition of five characteristic features of the charge curves that are indicative of the capacity, (ii) the development of a non-linear kernel regression model, based on k-nearest neighbor (kNN) regression, that captures the complex dependency of the capacity on the five features, and (iii) the adaptation of particle swarm optimization (PSO) to find the optimal combination of feature weights for creating a kNN regression model that minimizes the cross validation (CV) error in the capacity estimation. Verification with 10 years’ continuous cycling data suggests that the proposed method is able to accurately estimate the capacity of a Li-ion battery throughout the whole life-time.
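The kNN regression step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_predict`, the toy feature vectors, the feature weights and the choice of k are all hypothetical, and the PSO search that the paper uses to tune the weights against cross-validation error is not reproduced.

```python
import math

def knn_predict(train_x, train_y, query, weights, k=3):
    """Feature-weighted kNN regression: mean target of the k nearest samples.

    The distance is a weighted Euclidean distance, so a weight of 0 makes
    the corresponding feature irrelevant to the neighbour search.
    """
    dists = []
    for x, y in zip(train_x, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)))
        dists.append((d, y))
    dists.sort(key=lambda pair: pair[0])  # stable sort on distance only
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data (hypothetical): two features per sample, one target value each.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0], [1.0, 5.0]]
train_y = [10.0, 20.0, 30.0, 40.0]
query = [0.9, 0.0]

# Ignoring the second feature changes which samples count as neighbours.
pred_first_feature_only = knn_predict(train_x, train_y, query, [1.0, 0.0], k=2)
pred_equal_weights = knn_predict(train_x, train_y, query, [1.0, 1.0], k=2)
```

The two calls return different predictions for the same query, which is the reason a weight search such as PSO can reduce the cross-validation error.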
Saturation recovery EPR spin-labeling method for quantification of lipids in biological membrane domains.

Science.gov (United States)

Mainali, Laxman; Camenisch, Theodore G; Hyde, James S; Subczynski, Witold K

2017-12-01

The presence of integral membrane proteins induces the formation of distinct domains in the lipid bilayer portion of biological membranes. Qualitative application of both continuous wave (CW) and saturation recovery (SR) electron paramagnetic resonance (EPR) spin-labeling methods allowed discrimination of the bulk, boundary, and trapped lipid domains. A recently developed method, which is based on the CW EPR spectra of phospholipid (PL) and cholesterol (Chol) analog spin labels, allows evaluation of the relative amount of PLs (% of total PLs) in the boundary plus trapped lipid domain and the relative amount of Chol (% of total Chol) in the trapped lipid domain [M. Raguz, L. Mainali, W. J. O'Brien, and W. K. Subczynski (2015), Exp. Eye Res., 140:179-186]. Here, a new method is presented that, based on SR EPR spin-labeling, allows quantitative evaluation of the relative amounts of PLs and Chol in the trapped lipid domain of intact membranes. This new method complements the existing one, allowing acquisition of more detailed information about the distribution of lipids between domains in intact membranes. The methodological transition of the SR EPR spin-labeling approach from qualitative to quantitative is demonstrated. The abilities of this method are illustrated for intact cortical and nuclear fiber cell plasma membranes from porcine eye lenses. Statistical analysis (Student's t-test) of the data allowed determination of the separations of mean values above which differences can be treated as statistically significant (P ≤ 0.05) and can be attributed to sources other than preparation/technique.
Evaluating the Performance of Polynomial Regression Method with Different Parameters during Color Characterization

Directory of Open Access Journals (Sweden)

Bangyong Sun

2014-01-01

The polynomial regression method is employed to calculate the relationship between device color space and CIE color space for color characterization, and the performance of different expressions with specific parameters is evaluated. First, the polynomial equation for color conversion is established and the computation of the polynomial coefficients is analysed. Then different forms of polynomial equations are used to calculate the CIE color values of RGB and CMYK inputs, and the corresponding color errors are compared. Finally, an optimal polynomial expression is obtained by analysing several related parameters during color conversion, including the number of polynomial terms, the degree of the polynomial terms, the selection of CIE visual spaces, and the linearization.
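To make "number of polynomial terms" and "degree of polynomial terms" concrete, the sketch below expands a device color vector into the monomials that a polynomial regression would fit against CIE values. The function name `polynomial_terms` and the inclusion of a constant term are assumptions for illustration; the abstract does not state which term sets the paper compares.

```python
from itertools import combinations_with_replacement

def polynomial_terms(channels, degree=2):
    """Return all monomials of the channel values up to `degree`.

    The constant term comes first, then degree 1, then degree 2, and so on.
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(channels, d):
            value = 1.0
            for c in combo:
                value *= c
            terms.append(value)
    return terms

# Three channels (R, G, B), degree 2:
# 1, R, G, B, R*R, R*G, R*B, G*G, G*B, B*B
rgb_terms = polynomial_terms([2, 3, 5], degree=2)

# The term count grows quickly with degree and with the number of channels.
n_rgb_deg3 = len(polynomial_terms([2, 3, 5], degree=3))
n_cmyk_deg2 = len(polynomial_terms([1, 2, 3, 4], degree=2))
```

Each CIE coordinate would then be modelled as a linear combination of these terms, with the coefficients estimated by least squares from measured color patches.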
Fast-HPLC Fingerprinting to Discriminate Olive Oil from Other Edible Vegetable Oils by Multivariate Classification Methods.

Science.gov (United States)

Jiménez-Carvelo, Ana M; González-Casado, Antonio; Pérez-Castaño, Estefanía; Cuadros-Rodríguez, Luis

2017-03-01

A new analytical method for the differentiation of olive oil from other vegetable oils using reversed-phase LC and applying chemometric techniques was developed. A 3 cm short column was used to obtain the chromatographic fingerprint of the methyl-transesterified fraction of each vegetable oil. The chromatographic analysis took only 4 min. The multivariate classification methods used were k-nearest neighbors, partial least-squares (PLS) discriminant analysis, one-class PLS, support vector machine classification, and soft independent modeling of class analogies. The discrimination of olive oil from other vegetable edible oils was evaluated by several classification quality metrics. Several strategies for the classification of the olive oil were used: one input-class, two input-class, and pseudo two input-class.
Dual Regression

OpenAIRE

Spady, Richard; Stouli, Sami

2012-01-01

We propose dual regression as an alternative to the quantile regression process for the global estimation of conditional distribution functions under minimal assumptions. Dual regression provides all the interpretational power of the quantile regression process while avoiding the need for repairing the intersecting conditional quantile surfaces that quantile regression often produces in practice. Our approach introduces a mathematical programming characterization of conditional distribution f...
Evaluation of Parallel Level Sets and Bowsher's Method as Segmentation-Free Anatomical Priors for Time-of-Flight PET Reconstruction.

Science.gov (United States)

Schramm, Georg; Holler, Martin; Rezaei, Ahmadreza; Vunckx, Kathleen; Knoll, Florian; Bredies, Kristian; Boada, Fernando; Nuyts, Johan

2018-02-01

In this article, we evaluate Parallel Level Sets (PLS) and Bowsher's method as segmentation-free anatomical priors for regularized brain positron emission tomography (PET) reconstruction. We derive the proximity operators for two PLS priors and use the EM-TV algorithm in combination with the first order primal-dual algorithm by Chambolle and Pock to solve the non-smooth optimization problem for PET reconstruction with PLS regularization. In addition, we compare the performance of two PLS versions against the symmetric and asymmetric Bowsher priors with quadratic and relative difference penalty function. For this aim, we first evaluate reconstructions of 30 noise realizations of simulated PET data derived from a real hybrid positron emission tomography/magnetic resonance imaging (PET/MR) acquisition in terms of regional bias and noise. Second, we evaluate reconstructions of a real brain PET/MR data set acquired on a GE Signa time-of-flight PET/MR in a similar way. The reconstructions of simulated and real 3D PET/MR data show that all priors were superior to post-smoothed maximum likelihood expectation maximization with ordered subsets (OSEM) in terms of bias-noise characteristics in different regions of interest where the PET uptake follows anatomical boundaries. Our implementation of the asymmetric Bowsher prior showed slightly superior performance compared with the two versions of PLS and the symmetric Bowsher prior. At very high regularization weights, all investigated anatomical priors suffer from the transfer of non-shared gradients.
Influence diagnostics in meta-regression model.

Science.gov (United States)

Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

2017-09-01

This paper studies influence diagnostics in the meta-regression model, including case deletion diagnostics and local influence analysis. We derive the subset deletion formulae for the estimation of the regression coefficients and the heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are each considered in deriving the results. Internal and external residual and leverage measures are defined. Local influence analysis based on the case-weights perturbation scheme, the responses perturbation scheme, the covariate perturbation scheme, and the within-variance perturbation scheme is explored. We introduce a method that simultaneously perturbs responses, covariates, and within-variances to obtain the local influence measure, which has the advantage of being able to compare the influence magnitude of influential studies across different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.
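For orientation, the DerSimonian and Laird estimate of the heterogeneity variance can be sketched for the simplest case, a meta-analysis with no covariates (an intercept-only meta-regression). This is the standard moment estimator, not the paper's deletion formulae; the function name and the toy numbers are hypothetical.

```python
def dersimonian_laird_tau2(effects, variances):
    """Moment estimate of the between-study variance (intercept-only case).

    effects:   per-study effect estimates
    variances: per-study within-study variances
    """
    weights = [1.0 / v for v in variances]           # inverse-variance weights
    sw = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    c = sw - sum(w * w for w in weights) / sw
    k = len(effects)
    return max(0.0, (q - (k - 1)) / c)               # truncated at zero

tau2 = dersimonian_laird_tau2([0.0, 1.0, 2.0], [0.5, 0.5, 0.5])
tau2_homogeneous = dersimonian_laird_tau2([1.0, 1.0, 1.0], [0.5, 0.2, 0.1])
```

A case deletion diagnostic in this setting would recompute the estimate with one study (or a subset) removed and compare it with the full-data value.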
Determination of Free Fatty Acid by FT-NIR Spectroscopy in Esterification Reaction for Biodiesel Production

Directory of Open Access Journals (Sweden)

Djéssica Tatiana Raspe

2013-01-01

This work reports the use of FT-NIR spectroscopy coupled with multivariate calibration to determine the percentage of free fatty acids (FFA) in samples obtained by the esterification of FFA in vegetable oils. The calibration matrix consisted of samples of the reaction medium from the esterification of oleic acid in soybean oil, with oleic acid contents of 0.3 to 40 wt% obtained under different experimental conditions, and the model was built with partial least squares (PLS) regression. The efficiency of the method was tested by predicting the FFA content in esterification reactions of oleic acid in soybean oil catalysed by KSF clay and by the commercial resin Amberlyst 15, both in batch mode. Good correlations were observed between the FT-NIR/PLS method and the reference method (AOCS). The results confirm that FT-NIR spectroscopy, in combination with multivariate calibration, is a promising technique for monitoring the esterification reaction in biodiesel production.
Influence of smoothing of X-ray spectra on parameters of calibration model

International Nuclear Information System (INIS)

Antoniak, W.; Urbanski, P.; Kowalska, E.

1998-01-01

Parameters of the calibration model before and after smoothing of X-ray spectra have been investigated. The calibration model was calculated using a multivariate procedure, namely partial least squares (PLS) regression. The investigations were performed on six sets of various standards used for the calibration of instruments based on the X-ray fluorescence principle. Three smoothing methods were compared: regression splines, Savitzky-Golay and discrete Fourier transform. The calculations were performed using the MATLAB software package and some home-made programs. (author)
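Of the three smoothing methods named above, Savitzky-Golay is the easiest to show in a few lines. The sketch below uses the standard 5-point quadratic convolution coefficients; the window length, the polynomial order and the function name `savgol_smooth5` are assumptions, since the abstract does not give the settings used.

```python
# Standard Savitzky-Golay coefficients for a 5-point window and a
# quadratic (or cubic) local fit; they sum to the normaliser 35.
SG_COEFFS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG_NORM = 35.0

def savgol_smooth5(y):
    """Smooth interior points of y; the two points at each edge are left as-is."""
    out = [float(v) for v in y]
    for i in range(2, len(y) - 2):
        acc = 0.0
        for j, c in enumerate(SG_COEFFS):
            acc += c * y[i + j - 2]
        out[i] = acc / SG_NORM
    return out

# A single-channel spike is spread over its neighbours and reduced in height.
spike = savgol_smooth5([0, 0, 0, 35, 0, 0, 0])

# A quadratic signal passes through unchanged, because the local fit is quadratic.
parabola = [float(i * i) for i in range(8)]
smoothed_parabola = savgol_smooth5(parabola)
```

Because smoothing changes peak heights and widths, it also changes the spectra that enter the PLS calibration, which is the effect the record investigates.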
Resting-state functional magnetic resonance imaging: the impact of regression analysis.

Science.gov (United States)

Yeh, Chia-Jung; Tseng, Yu-Sheng; Lin, Yi-Ru; Tsai, Shang-Yueh; Huang, Teng-Yi

2015-01-01

To investigate the impact of regression methods on resting-state functional magnetic resonance imaging (rsfMRI). During rsfMRI preprocessing, regression analysis is considered effective for reducing the interference of physiological noise on the signal time course. However, it is unclear whether the regression method benefits rsfMRI analysis. Twenty volunteers (10 men and 10 women; aged 23.4 ± 1.5 years) participated in the experiments. We used node analysis and functional connectivity mapping to assess the brain default mode network by using five combinations of regression methods. The results show that regressing the global mean plays a major role in the preprocessing steps. When a global regression method is applied, the values of functional connectivity are significantly lower (P ≤ .01) than those calculated without a global regression. This step increases inter-subject variation and produces anticorrelated brain areas. rsfMRI data processed using regression should be interpreted carefully. The significance of the anticorrelated brain areas produced by global signal removal is unclear. Copyright © 2014 by the American Society of Neuroimaging.
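Global signal regression, the step this abstract singles out, can be sketched as an ordinary least squares fit of each voxel time course on the mean time course over all voxels, keeping only the residual. The function name `regress_out_global` and the toy data are hypothetical; real pipelines typically include further nuisance regressors.

```python
def regress_out_global(voxels):
    """Remove the global mean signal from each voxel time course.

    voxels: list of equal-length time courses. Each one is regressed on the
    global mean time course (with an intercept) and the residual is returned.
    """
    n_vox, n_t = len(voxels), len(voxels[0])
    g = [sum(v[t] for v in voxels) / n_vox for t in range(n_t)]
    g_mean = sum(g) / n_t
    gc = [x - g_mean for x in g]                      # centred global signal
    g_ss = sum(x * x for x in gc)
    residuals = []
    for v in voxels:
        v_mean = sum(v) / n_t
        beta = sum((a - v_mean) * b for a, b in zip(v, gc)) / g_ss
        residuals.append([a - v_mean - beta * b for a, b in zip(v, gc)])
    return residuals

voxels = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 4.0, 3.0],
    [0.0, 3.0, 2.0, 5.0],
]
residuals = regress_out_global(voxels)
# Sum of the residuals across voxels at each time point.
column_sums = [sum(r[t] for r in residuals) for t in range(4)]
```

In this sketch the residuals sum to zero across voxels at every time point, so some residual time courses are pushed toward negative correlation with others. That algebraic property is one commonly given reason for caution when interpreting anticorrelated areas after global regression.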
Time-of-flight secondary ion mass spectrometry of a range of coal samples: a chemometrics (PCA, cluster, and PLS) analysis

Energy Technology Data Exchange (ETDEWEB)

Lei Pei; Guilin Jiang; Bonnie J. Tyler; Larry L. Baxter; Matthew R. Linford [Brigham Young University, Provo, UT (United States). Department of Chemistry and Biochemistry

2008-03-15

This paper documents time-of-flight secondary ion mass spectrometry (ToF-SIMS) analyses of 34 different coal samples. In many cases, the inorganic Na⁺, Al⁺, Si⁺, and K⁺ ions dominate the spectra, eclipsing the organic peaks. A scores plot of principal component 1 (PC1) versus principal component 2 (PC2) in a principal components analysis (PCA) effectively separates the coal spectra into a triangular pattern, where the different vertices of this pattern come from (i) spectra that have a strong inorganic signature that is dominated by Na⁺, (ii) spectra that have a strong inorganic signature that is dominated by Al⁺, Si⁺, and K⁺, and (iii) spectra that have a strong organic signature. Loadings plots of PC1 and PC2 confirm these observations. The spectra with the more prominent inorganic signatures come from samples with higher ash contents. Cluster analysis with the K-means algorithm was also applied to the data. The progressive clustering revealed in the dendrogram correlates extremely well with the clustering of the data points found in the scores plot of PC1 versus PC2 from the PCA. In addition, this clustering often correlates with properties of the coal samples, as measured by traditional analyses. Partial least-squares (PLS), which included the use of interval PLS and a genetic algorithm for variable selection, shows a good correlation between ToF-SIMS spectra and some of the properties measured by traditional means. Thus, ToF-SIMS appears to be a promising technique for the analysis of this important fuel. 33 refs., 9 figs., 5 tabs.
Complex regression Doppler optical coherence tomography

Science.gov (United States)

Elahi, Sahar; Gu, Shi; Thrane, Lars; Rollins, Andrew M.; Jenkins, Michael W.

2018-04-01

We introduce a new method to measure Doppler shifts more accurately and extend the dynamic range of Doppler optical coherence tomography (OCT). The two-point estimate of the conventional Doppler method is replaced with a regression that is applied to high-density B-scans in polar coordinates. We built a high-speed OCT system using a 1.68-MHz Fourier domain mode locked laser to acquire high-density B-scans (16,000 A-lines) at high enough frame rates (~100 fps) to accurately capture the dynamics of the beating embryonic heart. Flow phantom experiments confirm that the complex regression lowers the minimum detectable velocity from 12.25 mm/s to 374 μm/s, whereas the maximum velocity of 400 mm/s is measured without phase wrapping. Complex regression Doppler OCT also demonstrates higher accuracy and precision compared with the conventional method, particularly when signal-to-noise ratio is low. The extended dynamic range allows monitoring of blood flow over several stages of development in embryos without adjusting the imaging parameters. In addition, applying complex averaging recovers hidden features in structural images.
A Novel and Effective Multivariate Method for Compositional Analysis using Laser Induced Breakdown Spectroscopy

International Nuclear Information System (INIS)

Wang, W; Qi, H; Ayhan, B; Kwan, C; Vance, S

2014-01-01

Compositional analysis is important for interrogating spectral samples in the direct analysis of materials in agriculture, environment, archaeology, and other fields. In this paper, multivariate analysis (MVA) techniques are coupled with laser induced breakdown spectroscopy (LIBS) to estimate quantitative elemental compositions and determine the type of the sample. In particular, we present a new multivariate analysis method for composition analysis, referred to as spectral unmixing. The LIBS spectrum of a testing sample is considered as a linear mixture of several constituent signatures that correspond to various chemical elements. The signature library is derived from regression analysis using training samples or is set up manually with information from an elemental LIBS spectral database. A calibration step is used to make all the signatures in the library homogeneous with the testing sample, so as to avoid inhomogeneous signatures that might be caused by different sampling conditions. To demonstrate the feasibility of the proposed method, we compare it with the traditional partial least squares (PLS) method and the univariate method using a standard soil data set with elemental concentrations measured a priori. The experimental results show that the proposed method holds great potential for reliable and effective elemental concentration estimation.
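The linear mixture model in this abstract can be illustrated with two signatures and an unconstrained least squares fit. This is only a sketch under stated assumptions: the function name `unmix_two`, the toy signatures and the lack of constraints (such as non-negativity) are choices made here, and the paper's calibration step is not reproduced.

```python
def unmix_two(spectrum, sig_a, sig_b):
    """Least squares abundances (a, b) so that a*sig_a + b*sig_b fits spectrum.

    Solves the 2x2 normal equations directly; the signatures must not be
    proportional to each other, otherwise the determinant is zero.
    """
    aa = sum(x * x for x in sig_a)
    bb = sum(x * x for x in sig_b)
    ab = sum(x * y for x, y in zip(sig_a, sig_b))
    a_s = sum(x * s for x, s in zip(sig_a, spectrum))
    b_s = sum(x * s for x, s in zip(sig_b, spectrum))
    det = aa * bb - ab * ab
    return ((a_s * bb - b_s * ab) / det, (b_s * aa - a_s * ab) / det)

# Hypothetical element signatures sampled at four wavelengths.
sig_a = [1.0, 0.0, 2.0, 0.0]
sig_b = [0.0, 1.0, 1.0, 3.0]
# A noise-free mixture with abundances 0.3 and 0.7.
mixture = [0.3 * a + 0.7 * b for a, b in zip(sig_a, sig_b)]
abund_a, abund_b = unmix_two(mixture, sig_a, sig_b)
```

With more than two signatures the same normal equations would be solved with a general linear solver, and the recovered abundances would then be mapped to concentrations.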
Prediction of valid acidity in intact apples with Fourier transform near infrared spectroscopy

OpenAIRE

Liu, Yan-de; Ying, Yi-bin; Fu, Xia-ping

2005-01-01

To develop nondestructive acidity prediction for intact Fuji apples, the potential of Fourier transform near infrared (FT-NIR) method with fiber optics in interactance mode was investigated. Interactance in the 800 nm to 2619 nm region was measured for intact apples, harvested from early to late maturity stages. Spectral data were analyzed by two multivariate calibration techniques including partial least squares (PLS) and principal component regression (PCR) methods. A total of 120 Fuji appl...
Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis

NARCIS (Netherlands)

Eekhout, I.; Wiel, M.A. van de; Heymans, M.W.

2017-01-01

Background. Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels
Linear Regression Analysis

CERN Document Server

Seber, George A F

2012-01-01

Concise, mathematically clear, and comprehensive treatment of the subject. • Expanded coverage of diagnostics and methods of model fitting. • Requires no specialized knowledge beyond a good grasp of matrix algebra and some acquaintance with straight-line regression and simple analysis of variance models. • More than 200 problems throughout the book plus outline solutions for the exercises. • This revision has been extensively class-tested.
